CN112489643A - Conversion method, conversion table generation device and computer storage medium - Google Patents

Publication number
CN112489643A
Authority
CN
China
Prior art keywords
conversion
key value
primary key
candidate
intention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011165943.1A
Other languages
Chinese (zh)
Inventor
尹江荣
何伟宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Midea Group Co Ltd
Guangdong Midea White Goods Technology Innovation Center Co Ltd
Original Assignee
Midea Group Co Ltd
Guangdong Midea White Goods Technology Innovation Center Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Midea Group Co Ltd, Guangdong Midea White Goods Technology Innovation Center Co Ltd filed Critical Midea Group Co Ltd
Priority to CN202011165943.1A priority Critical patent/CN112489643A/en
Publication of CN112489643A publication Critical patent/CN112489643A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223: Execution procedure of a spoken command

Abstract

The application discloses a conversion method, a conversion table generation device and a computer storage medium. The conversion method comprises the following steps: obtaining semantics to be converted; calculating a primary key value corresponding to the semantics to be converted; and obtaining, from a conversion table, the intention to be converted corresponding to the semantics to be converted by using the primary key value as an index. The method does not need to compare speech character strings one by one, is easy to operate, and improves efficiency.

Description

Conversion method, conversion table generation device and computer storage medium
Technical Field
The present application relates to the field of speech recognition, and in particular, to a conversion method, a conversion table generation device, and a computer storage medium.
Background
The artificial intelligence voice industry is becoming increasingly mature, and speech recognition rates now exceed 90%. For voice control to be applied across industries, the semantics recognized from speech must be converted into intentions.
In offline mode, semantics are converted into intentions by comparing speech character strings. For example, a speech processing device may be able to recognize several semantics, such as: a, "turn on the air conditioner"; b, "start the air conditioner"; c, "help me turn on the air conditioner"; and d, "raise the air conditioner by one degree". When the speech recognition engine recognizes the semantics "raise the air conditioner by one degree", the device must compare it with a, b, c and d one by one, which is inefficient.
Disclosure of Invention
The application provides a conversion method, a conversion table generation device and a computer storage medium, which aim to solve the problem of low efficiency caused by comparison of voice character strings one by one in the prior art.
In order to solve the above technical problem, the present application provides a method for converting speech into an intention, which includes: obtaining semantics to be converted; calculating a primary key value corresponding to the semantic to be converted; and acquiring the intention to be converted corresponding to the semantics to be converted from a conversion table by taking the primary key value as an index.
The obtaining of the to-be-converted intention corresponding to the to-be-converted semantic from a conversion table by using the primary key value as an index includes:
obtaining a conversion pair corresponding to the primary key value from the conversion table by taking the primary key value as an index, wherein the conversion table comprises a plurality of conversion pairs, and each conversion pair comprises a secondary key value and a candidate intention which are mutually related;
judging whether the secondary key value in the conversion pair corresponding to the primary key value is a preset value, wherein the preset value is different from any index value used for querying conversion pairs in the conversion table;
and in response to a judgment result that the secondary key value is the preset numerical value, taking the candidate intention in the conversion pair corresponding to the primary key value as the intention to be converted.
Wherein each conversion pair further comprises candidate semantics;
the obtaining of the to-be-converted intention corresponding to the to-be-converted semantic from a conversion table by taking the primary key value as an index comprises the following steps:
in response to a judgment result that the secondary key value is not the preset value, detecting the similarity between the semantics to be converted and the candidate semantics in the conversion pair corresponding to the primary key value;
and in response to a detection result that the similarity is greater than or equal to a preset threshold, taking the candidate intention in the conversion pair corresponding to the primary key value as the intention to be converted.
The obtaining of the to-be-converted intention corresponding to the to-be-converted semantic from a conversion table by using the primary key value as an index includes:
obtaining a conversion pair corresponding to the primary key value from the conversion table by taking the primary key value as an index, wherein the conversion table comprises a plurality of conversion pairs, and each conversion pair comprises a candidate semantic meaning and a candidate intention which are mutually related;
and determining the intention to be converted according to the similarity between the candidate semantics in the conversion pair corresponding to the primary key value and the semantics to be recognized.
The determining of the intention to be converted according to the similarity between the candidate semantics in the conversion pair corresponding to the primary key value and the semantics to be recognized includes:
judging whether the primary key value corresponds to the unique conversion pair or not;
responding to a judgment result that the primary key value corresponds to the unique conversion pair, and taking the candidate intention in the conversion pair corresponding to the primary key value as the intention to be converted;
and in response to the judgment result of at least two conversion pairs corresponding to the primary key value, taking the candidate intention corresponding to the candidate semantic with the highest similarity to the semantic to be recognized as the intention to be converted.
In order to solve the above technical problem, the present application provides a method for generating a speech-to-intention conversion table, including: obtaining a plurality of conversion pairs, wherein each conversion pair comprises candidate semantics and a candidate intention that are associated with each other; calculating the primary key value corresponding to the candidate semantics in each conversion pair; and storing each conversion pair at the corresponding position of the conversion table by using the primary key value as an index.
Wherein the storing the conversion pair to the corresponding position of the conversion table by taking the primary key value as an index comprises: and responding to the judgment result of at least two conversion pairs corresponding to the primary key value, storing one conversion pair corresponding to the primary key value in a position corresponding to the primary key value in the conversion table, and storing the rest conversion pairs corresponding to the primary key value in idle positions of the conversion table, so as to further associate the at least two conversion pairs with secondary key values respectively.
Wherein the storing the conversion pair to the corresponding position of the conversion table by taking the primary key value as an index comprises: and associating the primary key value with at least two conversion pairs corresponding to the primary key value so as to allow the primary key value to be utilized to index to the at least two conversion pairs simultaneously.
In order to solve the above technical problem, the present application provides a speech processing apparatus comprising a processor and a memory, the memory storing therein a computer program, the processor being configured to execute the computer program to implement the above conversion method and the above generation method of a speech to intention conversion table.
In order to solve the above technical problem, the present application provides a computer storage medium in which a computer program is stored, the computer program implementing the above conversion method and the above generation method of a speech to intention conversion table when executed.
The method for converting the voice to the intention comprises the steps of obtaining a semantic meaning to be converted; calculating a primary key value corresponding to the semantic to be converted; and acquiring the intention to be converted corresponding to the semantics to be converted from a conversion table by taking the primary key value as an index. Therefore, the primary key value is used as an index to query the conversion table, the intention to be converted corresponding to the semantics to be converted can be obtained, comparison of voice character strings one by one is not needed, operation is easy, and efficiency is improved.
Drawings
FIG. 1 is a flow chart illustrating an embodiment of a method for converting speech into intent;
FIG. 2 is a flowchart illustrating an embodiment of a method for generating a speech-to-intention conversion table according to the present application;
FIG. 3 is a flowchart illustrating an embodiment of step S103 of FIG. 1;
FIG. 4 is a flowchart illustrating an embodiment of step S302 of FIG. 3;
FIG. 5 is a flowchart illustrating an embodiment of step S203 in FIG. 2;
FIG. 6 is a schematic flow chart of another embodiment of step S103 in FIG. 1;
FIG. 7 is a schematic structural diagram of an embodiment of a speech processing apparatus according to the present application;
FIG. 8 is a schematic structural diagram of an embodiment of a computer storage medium according to the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the following describes a method, an apparatus, and a computer storage medium for processing audio data provided in the present application in further detail with reference to the accompanying drawings and the detailed description.
The method for converting speech into intention is applied to a voice processing device that responds to speech. Voice processing devices include, but are not limited to, household appliances, mobile phones, computers, robots and the like. Taking the field of home appliances as an example, the voice processing device may be a home appliance with a speech recognition function; for example, a living room may contain voice processing devices such as an air conditioner, a television and a refrigerator. Taking an air conditioner as an example, the air conditioner is preset with a plurality of semantics and the intentions corresponding to them. In the prior approach, when the air conditioner receives speech from a user, it recognizes the semantics corresponding to the speech and compares them with the preset semantics one by one, so lookup is inefficient. The method of the present application is easy to operate and improves efficiency.
Referring to fig. 1, the method for converting speech into intention of the present embodiment includes the following steps.
S101: Acquire the semantics to be converted.
The voice processing device is provided with a voice sensor, such as a microphone, through which it obtains a voice signal. Alternatively, the voice processing device may be communicatively connected to another device that has a voice sensor, such as a sound pickup, and obtain the voice signal through that device. The voice processing device then recognizes the voice signal to acquire the semantics to be converted.
The voice processing device may be provided with an Automatic Speech Recognition (ASR) engine, which may output the recognition result of the voice signal in pinyin form. For example, if the voice signal is "turn on air conditioner", the semantics to be converted acquired by the voice processing device may be "da3 kai1 kong1 tiao2".
S102: Calculate a primary key value corresponding to the semantics to be converted.
The voice processing device calculates a primary key value corresponding to the semantics to be converted; the primary key value is a hash key. A hash table is a data structure that is accessed directly by key value: a record is reached by mapping its key value to a position in the table, which speeds up lookup. The voice processing device calculates the primary key value corresponding to the semantics to be converted through the following algorithm:
[Figure BDA0002745776110000051: primary key value algorithm; image not reproduced in this text]
Taking the semantics to be converted "da3 kai1 kong1 tiao2" as an example, the length of the hash array (hashsize) is set to 255. The characters of "da3 kai1 kong1 tiao2" are first accumulated by exclusive-OR addition to obtain a numerical value; the remainder of that value divided by the hash array length is then taken as the key value, that is, the primary key value, so that the key value cannot exceed the hash array length.
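The computation described above might be sketched in C as follows. The original formula is given only as a figure that is not reproduced in this text, so the mixing step below (the XOR variant of the well-known djb2 string hash) is a stand-in chosen purely for illustration, and the names `primary_key` and `HASH_SIZE` are hypothetical. Only the final reduction modulo the hash array length is taken directly from the description.

```c
#include <stdint.h>

#define HASH_SIZE 255u /* hash array length used in the example above */

/* Assumed mixing step: djb2 (XOR variant). The patent's own formula is not
 * available in this text, so this is an illustrative substitute. */
static uint32_t primary_key(const char *semantics, uint32_t hash_size)
{
    uint32_t h = 5381u;
    for (const unsigned char *p = (const unsigned char *)semantics; *p != '\0'; ++p) {
        h = ((h << 5) + h) ^ (uint32_t)*p; /* h * 33, then XOR with the character */
    }
    return h % hash_size; /* the remainder keeps the key inside the hash array */
}
```

Whatever mixing step is actually used, the property that matters for the rest of the method is that the same pinyin string always yields the same key and that the key is always smaller than the hash array length.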
S103: Obtain, from the conversion table, the intention to be converted corresponding to the semantics to be converted by using the primary key value as an index.
The voice processing device is provided in advance with a speech-to-intention conversion table. As shown in fig. 2, the method for generating the speech-to-intention conversion table includes the following steps.
S201: Obtain a plurality of conversion pairs, wherein each conversion pair comprises candidate semantics and a candidate intention that are associated with each other.
The voice processing device may be provided with a form in which mutually associated candidate semantics and candidate intentions are stored; that is, the candidate semantics and their associated candidate intentions are filled into the form, as shown in Table 1.
Table 1: Form of mutually associated candidate semantics and candidate intentions
[Figure BDA0002745776110000052: Table 1; image not reproduced in this text]
The voice processing device obtains a plurality of conversion pairs from the form, each conversion pair comprising candidate semantics and a candidate intention that are associated with each other. For example, one conversion pair includes the candidate semantics "air conditioner on" and the candidate intention "Open", and another conversion pair includes the candidate semantics "air conditioner off" and the candidate intention "Close".
S202: Calculate the primary key value corresponding to the candidate semantics in each conversion pair.
The voice processing device calculates the primary key value corresponding to the candidate semantics in each conversion pair in the same way as in step S102, which is not described again here.
S203: Store each conversion pair at the corresponding position of the conversion table by using the primary key value as an index.
The voice processing device uses the primary key value corresponding to the candidate semantics as an index: it finds the position in the conversion table that corresponds to that primary key value and stores the conversion pair there. The conversion table may be a hash table, and it includes the candidate semantics, the candidate intentions associated with them, and the primary key values corresponding to the candidate semantics. The speech-to-intention conversion table is thus generated from the mutually associated candidate semantics and candidate intentions in the form.
Because the conversion table is generated from the form, the conversion table can be modified simply by modifying the candidate semantics and candidate intentions in the form, which makes it easy to maintain.
In step S103, the voice processing device queries the conversion table using the primary key value as an index to obtain the conversion pair corresponding to the primary key value, takes the candidate intention from that conversion pair, and uses it as the intention to be converted corresponding to the semantics to be converted. The voice processing device then executes the intention to be converted. For example, if the intention to be converted corresponding to the semantics "da3 kai1 kong1 tiao2" is "Open" and the voice processing device is an air conditioner, the air conditioner is started based on the intention "Open".
In this way, the method calculates the primary key value corresponding to the semantics to be converted and uses it as an index to obtain the corresponding intention to be converted from the conversion table. Speech character strings do not need to be compared one by one, so the method is easy to operate and improves efficiency.
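Steps S201 to S203 and S101 to S103 can be sketched together as a minimal C example. The type and function names (`intent_t`, `conversion_pair`, `conversion_table`, `table_store`, `table_lookup`) are hypothetical, the hash is the same assumed djb2 XOR stand-in rather than the formula from the original figure, and collisions are only reported here, not resolved.

```c
#include <stddef.h>
#include <stdint.h>

#define HASH_SIZE 255u /* hash array length from the example in the text */

/* Hypothetical intent codes; the text only names "Open" and "Close". */
typedef enum { INTENT_NONE = 0, INTENT_OPEN, INTENT_CLOSE } intent_t;

struct conversion_pair {
    const char *semantics; /* candidate semantics in pinyin; NULL marks a free slot */
    intent_t intent;       /* candidate intention */
};

struct conversion_table {
    struct conversion_pair slots[HASH_SIZE];
};

/* Assumed hash (djb2, XOR variant) reduced modulo the hash array length. */
static uint32_t primary_key(const char *semantics)
{
    uint32_t h = 5381u;
    for (const unsigned char *p = (const unsigned char *)semantics; *p != '\0'; ++p)
        h = ((h << 5) + h) ^ (uint32_t)*p;
    return h % HASH_SIZE;
}

/* S202-S203: store the pair at the slot indexed by its primary key value.
 * Returns 0 on success, -1 if that slot already holds a pair (collision). */
static int table_store(struct conversion_table *t, const char *semantics, intent_t intent)
{
    struct conversion_pair *slot = &t->slots[primary_key(semantics)];
    if (slot->semantics != NULL)
        return -1;
    slot->semantics = semantics;
    slot->intent = intent;
    return 0;
}

/* S102-S103: one hash computation and one array access, no string comparison. */
static intent_t table_lookup(const struct conversion_table *t, const char *semantics)
{
    const struct conversion_pair *slot = &t->slots[primary_key(semantics)];
    return slot->semantics != NULL ? slot->intent : INTENT_NONE;
}
```

The lookup cost does not depend on how many conversion pairs are stored, which is the efficiency gain the description claims over one-by-one string comparison. Because this basic lookup never compares strings, two different utterances with the same key would receive the same intent; the embodiments that follow deal with that case.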
Referring to fig. 3, step S103 includes the following steps.
S301: Obtain the conversion pair corresponding to the primary key value from the conversion table by using the primary key value as an index, wherein the conversion table comprises a plurality of conversion pairs, and each conversion pair comprises candidate semantics and a candidate intention that are associated with each other.
The voice processing device queries the conversion table using the primary key value as an index, matching it against the primary key values in the conversion table. When the match succeeds, the voice processing device obtains the conversion pair corresponding to the primary key value from the conversion table.
S302: Determine the intention to be converted according to the similarity between the candidate semantics in the conversion pair corresponding to the primary key value and the semantics to be recognized.
Since each conversion pair comprises candidate semantics and a candidate intention that are associated with each other, the voice processing device obtains the candidate semantics from the conversion pair corresponding to the primary key value and determines the intention to be converted according to the similarity between the candidate semantics and the semantics to be recognized.
The primary key values that the voice processing device calculates in step S202 for different candidate semantics may be the same. For example, the primary key value of the candidate semantics "air conditioner please turn on" may be the same as the primary key value of the candidate semantics "air conditioner please turn off". In step S203, the voice processing device associates the primary key value with the at least two conversion pairs corresponding to it, so that the primary key value indexes all of them at once; for example, the conversion pair for "air conditioner please turn on" and the conversion pair for "air conditioner please turn off" are both associated with the same key value. To improve the accuracy with which the voice processing device converts speech into intention, step S302 includes the following steps, as shown in fig. 4.
S401: Judge whether the primary key value corresponds to a unique conversion pair.
The voice processing device obtains all primary key values of the conversion table and matches the calculated primary key value against them to judge whether it corresponds to a unique conversion pair. If the primary key value matches exactly one primary key value in the conversion table, it corresponds to a unique conversion pair, and the process proceeds to S402; if it matches at least two primary key values in the conversion table, it corresponds to at least two conversion pairs, and the process proceeds to S403.
S402: In response to a judgment result that the primary key value corresponds to a unique conversion pair, take the candidate intention in the conversion pair corresponding to the primary key value as the intention to be converted.
When the voice processing device judges that the primary key value is the same as exactly one primary key value in the conversion table, the primary key value corresponds to a unique conversion pair, and the candidate intention of that conversion pair is taken as the intention to be converted.
S403: In response to a judgment result that the primary key value corresponds to at least two conversion pairs, take the candidate intention corresponding to the candidate semantics with the highest similarity to the semantics to be recognized as the intention to be converted.
When the voice processing device judges that the primary key value is the same as two or more primary key values in the conversion table, it compares the semantics to be converted with the candidate semantics in each of those conversion pairs and takes the candidate intention corresponding to the candidate semantics with the highest similarity to the semantics to be recognized as the intention to be converted.
For example, the voice processing device obtains the semantics to be converted "kong tiao qing guan ji" and determines that its primary key value is the same as both the primary key value of the candidate semantics "air conditioner please turn on" and the primary key value of the candidate semantics "air conditioner please turn off". Alternatively, the voice processing device uses the primary key value to index at least two conversion pairs at once; that is, it queries the conversion table with the primary key value and obtains both the conversion pair for "air conditioner please turn on" and the conversion pair for "air conditioner please turn off".
The voice processing device obtains a first similarity of 60% between "kong tiao qing guan ji" and the candidate semantics "air conditioner please turn on (kong tiao qing kai ji)", and a second similarity of 100% between "kong tiao qing guan ji" and the candidate semantics "air conditioner please turn off (kong tiao qing guan ji)". The candidate semantics "air conditioner please turn off" therefore has the highest similarity to the semantics to be recognized; its candidate intention is "Close", which is taken as the intention to be converted.
In this embodiment, when the primary key value corresponds to a unique conversion pair, the candidate intention in that conversion pair is taken directly as the intention to be converted, so speech character strings do not need to be compared one by one and efficiency is improved. When the primary key value corresponds to at least two conversion pairs, the candidate intention corresponding to the candidate semantics with the highest similarity to the semantics to be recognized is taken as the intention to be converted, which improves accuracy.
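Steps S401 to S403 might look like the following C sketch. The description does not say how similarity is measured, so the `similarity` function here (the share of space-separated pinyin syllables that match position by position) is a hypothetical metric and is not meant to reproduce the 60% and 100% figures quoted above. The names `conversion_pair` and `resolve_intent`, and the choice to store each pair's primary key value alongside it, are likewise assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct conversion_pair {
    uint32_t key;          /* primary key value of the candidate semantics */
    const char *semantics; /* candidate semantics (pinyin) */
    const char *intent;    /* candidate intention */
};

/* Hypothetical metric: fraction of space-separated syllables that are equal
 * at the same position, relative to the longer syllable count. */
static double similarity(const char *a, const char *b)
{
    size_t same = 0, na = 0, nb = 0;
    while (*a != '\0' || *b != '\0') {
        size_t la = strcspn(a, " "), lb = strcspn(b, " ");
        if (la > 0) na++;
        if (lb > 0) nb++;
        if (la > 0 && la == lb && strncmp(a, b, la) == 0) same++;
        a += la;
        b += lb;
        if (*a == ' ') a++;
        if (*b == ' ') b++;
    }
    size_t total = na > nb ? na : nb;
    return total == 0 ? 1.0 : (double)same / (double)total;
}

/* Returns the intention for `semantics`, or NULL if no pair has this key. */
static const char *resolve_intent(const struct conversion_pair *pairs, size_t n,
                                  uint32_t key, const char *semantics)
{
    const struct conversion_pair *first = NULL;
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {          /* S401: how many pairs share the key? */
        if (pairs[i].key == key) {
            if (first == NULL) first = &pairs[i];
            ++count;
        }
    }
    if (count == 0) return NULL;
    if (count == 1) return first->intent;     /* S402: unique pair, no comparison */

    const struct conversion_pair *best = first;
    double best_sim = -1.0;
    for (size_t i = 0; i < n; ++i) {          /* S403: pick the most similar candidate */
        if (pairs[i].key != key) continue;
        double sim = similarity(semantics, pairs[i].semantics);
        if (sim > best_sim) {
            best_sim = sim;
            best = &pairs[i];
        }
    }
    return best->intent;
}
```

String comparison is confined to the pairs that share a primary key value, so the common case of a unique pair still costs no comparison at all.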
The hash array length in the above embodiment is 255. In order to reduce the hash array length, step S203 further includes the following steps, as shown in fig. 5.
S501: Judge whether each primary key value corresponds to a unique conversion pair.
When the voice processing device stores conversion pairs at the positions of the conversion table indexed by their primary key values, some positions of the conversion table hold no conversion pair; that is, the conversion table has free positions. To make use of these free positions, the voice processing device judges whether each primary key value corresponds to a unique conversion pair.
S502: In response to a judgment result that the primary key value corresponds to a unique conversion pair, store the conversion pair at the position corresponding to the primary key value in the conversion table, associate a secondary key value with the conversion pair, and set the secondary key value to a preset value, wherein the preset value is different from any index value used for querying conversion pairs in the conversion table.
In response to the judgment result that the primary key value corresponds to a unique conversion pair, the voice processing device stores the conversion pair at the position corresponding to the primary key value in the conversion table. The voice processing device provides a secondary key value in the conversion table, associates it with the conversion pair, and uses it to mark the position corresponding to the primary key value. The secondary key value is set to a preset value that is different from any index value used for querying conversion pairs in the conversion table, that is, different from the primary key values corresponding to the candidate semantics. For example, the position with primary key value 0 in the conversion table is a free position, and the voice processing device sets the preset value to 0. When the voice processing device finds that the secondary key value of a conversion pair is 0, that conversion pair is the only one corresponding to the primary key value, and no further search is needed, which improves efficiency.
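The lookup side of this marking scheme, together with the similarity-threshold fallback stated in the Disclosure, could be sketched in C as follows. All names (`slot`, `SECONDARY_UNIQUE`, `slot_intent`, `exact_similarity`) are hypothetical, and `exact_similarity` is only a placeholder metric that returns 1.0 for identical strings and 0.0 otherwise, since the description does not define how similarity is computed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Preset value from the example above: position 0 is kept free, so 0 is
 * never used as a query index and can mark "unique conversion pair". */
#define SECONDARY_UNIQUE 0u

struct slot {
    const char *semantics; /* candidate semantics; NULL marks a free position */
    const char *intent;    /* candidate intention */
    uint32_t secondary;    /* secondary key value */
};

/* Placeholder similarity (assumption): 1.0 for identical strings, else 0.0. */
static double exact_similarity(const char *a, const char *b)
{
    return strcmp(a, b) == 0 ? 1.0 : 0.0;
}

/* Returns the intention stored in `s` for `semantics`, or NULL if the slot is
 * free or the similarity check fails. */
static const char *slot_intent(const struct slot *s, const char *semantics,
                               double threshold)
{
    if (s->semantics == NULL)
        return NULL;                       /* nothing stored at this position */
    if (s->secondary == SECONDARY_UNIQUE)
        return s->intent;                  /* unique pair: no comparison needed */
    /* The key is shared by several pairs: accept this pair only if it is
     * similar enough to the semantics to be converted. */
    return exact_similarity(semantics, s->semantics) >= threshold ? s->intent : NULL;
}
```

Under this reading, a single integer comparison on the secondary key value decides whether any string comparison is needed at all.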
The voice processing device of the present application can convert the form (Table 1) into the conversion table by means of a script, as follows:
const struct hash_table table = {
    255, /* array length */
    {
        /* each entry: candidate semantics, candidate intent, secondary key value */
        /*000*/ {"", "", NULL, 0},
        /*001*/ {"air conditioner please start up", open, 0},
        /*...*/ {"air conditioner please shut down", close, 0},
        /*...*/ {"turn on air conditioner", open, 0},
        /*...*/ {"turn off air conditioner", close, 0},
        /*...*/ {"please turn on air conditioner", close, 0},
    }
};
In the entry /*001*/ {"air conditioner please start up", open, 0}, /*001*/ denotes the primary key value, "air conditioner please start up" is the candidate semantics, open is the candidate intention, and 0 is the secondary key value. Converting the form into the conversion table with a script is therefore easy to implement and less error-prone.
S503: In response to a judgment result that the primary key value corresponds to at least two conversion pairs, store one of the conversion pairs at the position corresponding to the primary key value in the conversion table, store the remaining conversion pairs corresponding to the primary key value at free positions of the conversion table, and associate a secondary key value with each of the at least two conversion pairs.
Steps S502 and S503 are described with reference to Table 2. In step S202, the voice processing device calculates that the primary key value corresponding to candidate semantics 3 is 17, and candidate semantics 3 with candidate intention 3 is the only conversion pair corresponding to primary key value 17. In response to this judgment result, the voice processing device stores candidate semantics 3 and candidate intention 3 at the position corresponding to primary key value 17 in the conversion table and sets the secondary key value to 0.
Table 2 Conversion table
Candidate semantics | Candidate intention | Secondary key value
…… | …… | ……
Candidate semantic 1 | Candidate intention 3 | 20
Candidate semantic 2 | Candidate intention 3 | 15
Candidate semantic 3 | Candidate intention 3 | 0
Candidate semantic 4 | Candidate intention 4 | 18
…… | …… | ……
The speech processing apparatus finds that the positions with primary key values 15 and 18 are both idle, and that the primary key values of candidate semantic 1, candidate semantic 2 and candidate semantic 4, as calculated in step S202, are all 20, so primary key value 20 corresponds to three conversion pairs. In response to the determination that the primary key value corresponds to at least two conversion pairs, the speech processing apparatus stores one of them at the position corresponding to the primary key value: it stores candidate semantic 1 and candidate intention 3 at the position with primary key value 20. It stores the remaining conversion pairs in idle positions of the conversion table: candidate semantic 2 at the idle position with primary key value 15, and candidate semantic 4 at the idle position with primary key value 18. The speech processing apparatus further associates a secondary key value with each of these conversion pairs: it sets the secondary key value of candidate semantic 1 to 20, that of candidate semantic 2 to 15, and that of candidate semantic 4 to 18, as listed in Table 2.
In other embodiments, because candidate semantic 2 is stored at the idle position with primary key value 15 and candidate semantic 4 at the idle position with primary key value 18, candidate semantic 1, candidate intention 3 and primary key value 20 are in one-to-one correspondence (that is, the primary key value corresponds to a unique conversion pair), and the speech processing apparatus may set the secondary key value of candidate semantic 1 to 0. Furthermore, the speech processing apparatus may make the secondary key value of each conversion pair point to the storage position of the next conversion pair in storage order, for example by pointing the secondary key value of candidate semantic 2 to the storage position of candidate semantic 4.
In this way, when a primary key value corresponds to at least two conversion pairs, one of them is stored at the position in the conversion table corresponding to the primary key value and the others are stored in idle positions of the conversion table, so that conversion pairs whose primary key values collide can all be kept in the same table.
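The storage rule of steps S502 and S503 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the primary key values are taken as already computed, slot 0 is reserved so that 0 can serve as the preset secondary key value, and each colliding pair receives a secondary key value equal to its own storage position, as in Table 2. The names entry, pair, build_table and TABLE_LEN are hypothetical. The sketch takes the first idle slot it finds, so it will not reproduce the particular idle positions 15 and 18 chosen in Table 2.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_LEN 32   /* hypothetical table length */
#define PRESET_KEY 0   /* secondary key value of a unique pair; slot 0 is reserved */

struct entry {
    const char *semantic;  /* candidate semantics; NULL marks an idle slot */
    int intent;            /* candidate intention id */
    int secondary;         /* secondary key value */
};

struct pair {
    const char *semantic;
    int intent;
    int primary;           /* primary key value, assumed to lie in 1..TABLE_LEN-1 */
};

/* Fill table from pairs; returns 0 on success, -1 if no idle slot is left. */
int build_table(struct entry table[TABLE_LEN], const struct pair *pairs, size_t n)
{
    size_t i;
    int slot;

    for (slot = 0; slot < TABLE_LEN; slot++) {
        table[slot].semantic = NULL;
        table[slot].intent = 0;
        table[slot].secondary = PRESET_KEY;
    }

    /* Pass 1: the first pair seen for each primary key takes the primary slot. */
    for (i = 0; i < n; i++) {
        struct entry *e = &table[pairs[i].primary];
        if (e->semantic == NULL) {
            e->semantic = pairs[i].semantic;
            e->intent = pairs[i].intent;
        }
    }

    /* Pass 2: every other pair collides and goes to an idle slot. Both the
       pair in the primary slot and the displaced pair get a secondary key
       equal to their own position, which is never PRESET_KEY. */
    slot = 1;
    for (i = 0; i < n; i++) {
        int p = pairs[i].primary;
        if (table[p].semantic == pairs[i].semantic)
            continue;                      /* same pointer: stored in pass 1 */
        while (slot < TABLE_LEN && table[slot].semantic != NULL)
            slot++;
        if (slot == TABLE_LEN)
            return -1;
        table[slot].semantic = pairs[i].semantic;
        table[slot].intent = pairs[i].intent;
        table[slot].secondary = slot;
        table[p].secondary = p;
    }
    return 0;
}
```

Running the primary-slot pass first keeps a displaced pair from occupying a slot that a later pair needs as its primary position.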
As shown in fig. 6, step S103 further includes the following steps.
S601: Obtain the conversion pair corresponding to the primary key value from the conversion table, using the primary key value as an index.
Step S601 is the same as step S301, and is not described herein again.
S602: Determine whether the secondary key value in the conversion pair corresponding to the primary key value is the preset value.
The speech processing apparatus determines whether the secondary key value in the conversion pair corresponding to the primary key value is the preset value. If it is, the method proceeds to step S603; if it is not, the method proceeds to step S604.
S603: In response to a determination that the secondary key value is the preset value, take the candidate intention in the conversion pair corresponding to the primary key value as the intention to be converted.
As shown in Table 2, the speech processing apparatus calculates the primary key value of the semantics to be converted as 17, queries the conversion table to obtain the conversion pair of candidate semantic 3 and candidate intention 3, and obtains the secondary key value of that pair, which is 0. It then determines whether the secondary key value is the preset value, which here is 0. Since the secondary key value is the preset value, the speech processing apparatus takes candidate intention 3 as the intention to be converted.
Compared with the embodiment shown in fig. 3, in the embodiment, it is not necessary to determine whether the primary key value corresponds to a unique conversion pair, and the execution efficiency can be improved.
S604: In response to a determination that the secondary key value is not the preset value, detect the similarity between the semantics to be converted and the candidate semantics in the conversion pairs corresponding to the primary key value.
As shown in Table 2, the speech processing apparatus calculates the primary key value of the semantics to be converted as 20 and queries the conversion table to obtain the conversion pair of candidate semantic 1 and candidate intention 3, with secondary key value 20; the conversion pair of candidate semantic 2 and candidate intention 3, with secondary key value 15; and the conversion pair of candidate semantic 4 and candidate intention 4, with secondary key value 18. Because the secondary key value is not the preset value, the speech processing apparatus detects the similarity between the semantics to be converted and each of candidate semantic 1, candidate semantic 2 and candidate semantic 4.
S605: In response to a detection result that the similarity is greater than or equal to a preset threshold, take the candidate intention in the corresponding conversion pair as the intention to be converted.
The preset threshold is set in advance, for example to 80%. If the speech processing apparatus detects that the similarity between the semantics to be converted and candidate semantic 2 is 100%, it takes candidate intention 3 as the intention to be converted.
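Steps S601 to S605 can be sketched as a single lookup routine. The sketch rests on several assumptions that this passage does not settle: the preset secondary key value is 0, colliding pairs are located by scanning for entries whose secondary key value is not the preset value, and similarity is approximated by the share of matching leading characters. The names entry, similarity, lookup_intent, THRESHOLD and NO_INTENT are hypothetical, and the similarity measure is only a placeholder for whatever measure an implementation actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TABLE_LEN 32     /* hypothetical table length */
#define PRESET_KEY 0     /* secondary key value of a unique pair */
#define THRESHOLD 80     /* preset similarity threshold, in percent */
#define NO_INTENT (-1)   /* returned when nothing matches */

struct entry {
    const char *semantic;  /* candidate semantics; NULL marks an idle slot */
    int intent;            /* candidate intention id */
    int secondary;         /* secondary key value */
};

/* Placeholder similarity: matching leading characters as a percentage of
   the longer string's length. */
static int similarity(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t longest = la > lb ? la : lb, same = 0;

    if (longest == 0)
        return 100;
    while (same < la && same < lb && a[same] == b[same])
        same++;
    return (int)(same * 100 / longest);
}

/* Returns the intention for query, or NO_INTENT if none qualifies. */
int lookup_intent(const struct entry table[TABLE_LEN], int primary, const char *query)
{
    const struct entry *head = &table[primary];   /* S601: index by primary key */
    int best = NO_INTENT, best_sim = -1, i;

    if (head->semantic == NULL)
        return NO_INTENT;
    if (head->secondary == PRESET_KEY)            /* S602/S603: unique pair */
        return head->intent;

    for (i = 1; i < TABLE_LEN; i++) {             /* S604: compare colliding pairs */
        int sim;
        if (table[i].semantic == NULL || table[i].secondary == PRESET_KEY)
            continue;
        sim = similarity(query, table[i].semantic);
        if (sim >= THRESHOLD && sim > best_sim) { /* S605: threshold check */
            best_sim = sim;
            best = table[i].intent;
        }
    }
    return best;
}
```

With the entries of Table 2 loaded, a query with primary key value 17 returns candidate intention 3 without any string comparison, while a query with primary key value 20 is compared against candidate semantics 1, 2 and 4.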
The method is applied to hardware equipment to convert speech into intention. Referring to fig. 7, which is a schematic structural diagram of an embodiment of the speech processing apparatus 200 of the present application, the apparatus includes a processor 21 and a memory 22. A computer program is stored in the memory 22, and the processor 21 is configured to execute the computer program to implement the above speech-to-intention conversion method and the above method for generating a speech-to-intention conversion table.
The processor 21 is configured to obtain a plurality of conversion pairs, wherein each conversion pair respectively comprises a candidate semantic meaning and a candidate intention which are associated with each other; calculating a primary key value corresponding to the candidate semantics in each conversion pair; and storing the conversion pair to the corresponding position of the conversion table by taking the primary key value as an index.
The processor 21 is further configured to obtain a semantic to be converted; calculating a primary key value corresponding to the semantics to be converted; and acquiring the intention to be converted corresponding to the semantics to be converted from the conversion table by taking the primary key value as an index.
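This passage does not say how the primary key value is calculated from the semantics. As one possibility, a string hash reduced to the table's index range would do. The sketch below uses the well-known djb2 hash purely as an assumed stand-in; the function name primary_key and the choice to keep slot 0 out of the result range (so that 0 remains usable as the preset secondary key value) are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_LEN 255  /* array length used in the description's listing */

/* Map a semantic string to a primary key value in 1..TABLE_LEN-1.
   Slot 0 is never produced, so 0 stays free as the preset value. */
size_t primary_key(const char *semantic)
{
    unsigned long h = 5381;   /* djb2 starting value */
    const unsigned char *p = (const unsigned char *)semantic;

    while (*p != '\0')
        h = h * 33 + *p++;    /* unsigned wraparound is well defined in C */
    return 1 + (size_t)(h % (TABLE_LEN - 1));
}
```

Two different utterances can map to the same primary key value, which is the case that step S503 handles at generation time and steps S604 and S605 handle at lookup time.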
The processor 21 may be an integrated circuit chip having signal processing capability. The processor 21 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
Referring to fig. 8, fig. 8 is a schematic structural diagram of an embodiment of a computer storage medium of the present application, and the computer storage medium 300 of the present embodiment includes a computer program 31 that can be executed to implement the above-mentioned speech-to-intention conversion method and the above-mentioned speech-to-intention conversion table generation method.
The computer storage medium 300 of this embodiment may be a medium that can store program instructions, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk. It may also be a server that stores the program instructions; the server may send the stored program instructions to other devices to be run, or may run them itself.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into modules or units is only a logical division, and other divisions are possible in an actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
The above description is only for the purpose of illustrating embodiments of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application or are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (11)

1. A method of speech to intent conversion, the method comprising:
obtaining semantics to be converted;
calculating a primary key value corresponding to the semantic to be converted;
and acquiring the intention to be converted corresponding to the semantics to be converted from a conversion table by taking the primary key value as an index.
2. The conversion method according to claim 1, wherein the obtaining the intention to be converted corresponding to the semantic to be converted from a conversion table by using the primary key value as an index comprises:
obtaining a conversion pair corresponding to the primary key value from the conversion table by taking the primary key value as an index, wherein the conversion table comprises a plurality of conversion pairs, and each conversion pair comprises a secondary key value and a candidate intention which are mutually related;
judging whether the secondary key value in the conversion pair corresponding to the primary key value is a preset value, wherein the preset value is different from an index value used for querying the conversion pair in the conversion table;
and in response to a judgment result that the secondary key value is the preset numerical value, taking the candidate intention in the conversion pair corresponding to the primary key value as the intention to be converted.
3. The conversion method according to claim 2, wherein each of the conversion pairs further comprises candidate semantics;
the obtaining the intention to be converted corresponding to the semantics to be converted from a conversion table by taking the primary key value as an index further comprises:
in response to a judgment result that the secondary key value is not the preset value, detecting the similarity between the semantics to be converted and the candidate semantics in the conversion pair corresponding to the primary key value;
and in response to a detection result that the similarity is greater than or equal to a preset threshold, taking the candidate intention in the conversion pair corresponding to the primary key value as the intention to be converted.
4. The conversion method according to claim 1, wherein the obtaining the intention to be converted corresponding to the semantic to be converted from a conversion table by using the primary key value as an index comprises:
obtaining a conversion pair corresponding to the primary key value from the conversion table by taking the primary key value as an index, wherein the conversion table comprises a plurality of conversion pairs, and each conversion pair comprises a candidate semantic meaning and a candidate intention which are mutually related;
and determining the intention to be converted according to the similarity between the candidate semantics in the conversion pair corresponding to the primary key value and the semantics to be recognized.
5. The conversion method according to claim 4, wherein the determining the intention to be converted according to the similarity between the candidate semantics and the semantics to be recognized in the conversion pair corresponding to the primary key value comprises:
judging whether the primary key value corresponds to the unique conversion pair or not;
responding to a judgment result that the primary key value corresponds to the unique conversion pair, and taking the candidate intention in the conversion pair corresponding to the primary key value as the intention to be converted;
and in response to the judgment result of at least two conversion pairs corresponding to the primary key value, taking the candidate intention corresponding to the candidate semantic with the highest similarity to the semantic to be recognized as the intention to be converted.
6. A method of generating a speech to intent conversion table, the method comprising:
obtaining a plurality of conversion pairs, wherein each conversion pair respectively comprises a candidate semantic meaning and a candidate intention which are mutually related;
calculating the primary key value corresponding to the candidate semantics in each conversion pair;
and storing the conversion pair to a corresponding position of the conversion table by taking the primary key value as an index.
7. The generation method according to claim 6, wherein the storing the conversion pair to the corresponding position of the conversion table with the primary key value as an index comprises:
judging whether each primary key value corresponds to a unique conversion pair;
and in response to a judgment result that the primary key value corresponds to a unique conversion pair, storing the conversion pair corresponding to the primary key value in a position corresponding to the primary key value in the conversion table, further associating a secondary key value with the conversion pair, and setting the secondary key value to a preset value, wherein the preset value is different from an index value used for querying the conversion pair in the conversion table.
8. The generation method according to claim 7, wherein the storing the conversion pair to the corresponding position of the conversion table with the primary key value as an index comprises:
and responding to the judgment result of at least two conversion pairs corresponding to the primary key value, storing one conversion pair corresponding to the primary key value in a position corresponding to the primary key value in the conversion table, and storing the rest conversion pairs corresponding to the primary key value in idle positions of the conversion table, so as to further associate the at least two conversion pairs with secondary key values respectively.
9. The generation method according to claim 6, wherein the storing the conversion pair to the corresponding position of the conversion table with the primary key value as an index comprises:
and associating the primary key value with at least two conversion pairs corresponding to the primary key value so as to allow the primary key value to be utilized to index to the at least two conversion pairs simultaneously.
10. A speech processing apparatus, characterized in that the speech processing apparatus comprises a processor and a memory; the memory stores a computer program, and the processor is configured to execute the computer program to carry out the steps of the conversion method according to any one of claims 1 to 5 and the steps of the generation method according to any one of claims 6 to 9.
11. A computer storage medium, characterized in that the computer storage medium stores a computer program which, when executed, implements the steps of the conversion method according to any one of claims 1-5 and the steps of the generation method according to any one of claims 6-9.
CN202011165943.1A 2020-10-27 2020-10-27 Conversion method, conversion table generation device and computer storage medium Pending CN112489643A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011165943.1A CN112489643A (en) 2020-10-27 2020-10-27 Conversion method, conversion table generation device and computer storage medium

Publications (1)

Publication Number Publication Date
CN112489643A true CN112489643A (en) 2021-03-12

Family

ID=74927848

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011165943.1A Pending CN112489643A (en) 2020-10-27 2020-10-27 Conversion method, conversion table generation device and computer storage medium

Country Status (1)

Country Link
CN (1) CN112489643A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104469029A (en) * 2014-11-21 2015-03-25 科大讯飞股份有限公司 Method and device for telephone number query through voice
WO2016180186A1 (en) * 2015-07-01 2016-11-17 中兴通讯股份有限公司 Semantic data storage method and apparatus
CN106326303A (en) * 2015-06-30 2017-01-11 芋头科技(杭州)有限公司 Spoken language semantic analysis system and method
CN107515944A (en) * 2017-08-31 2017-12-26 广东美的制冷设备有限公司 Exchange method, user terminal and storage medium based on artificial intelligence
US20200160851A1 (en) * 2018-11-20 2020-05-21 Institute For Information Industry Semantic analysis method, semantic analysis system and non-transitory computer-readable medium
CN111444722A (en) * 2020-03-06 2020-07-24 中国平安人寿保险股份有限公司 Intent classification method, device, equipment and storage medium based on voting decision
CN111540353A (en) * 2020-04-16 2020-08-14 重庆农村商业银行股份有限公司 Semantic understanding method, device, equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MING-HSIANG SU, ET AL.: "Attention-Based Response Generation Using Parallel Double Q-Learning for Dialog Policy Decision in a Conversational System", 《IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING》, vol. 28 *
朱明奇: "基于垂直搜索的意图识别算法的设计与实现", 《中国优秀硕士学位论文全文数据库(信息科技辑)》, no. 1 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination