CN111540353B - Semantic understanding method, device, equipment and storage medium - Google Patents

Semantic understanding method, device, equipment and storage medium

Info

Publication number
CN111540353B
Authority
CN
China
Prior art keywords
user
letter
understood
pinyin
keywords
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010300927.2A
Other languages
Chinese (zh)
Other versions
CN111540353A (en)
Inventor
秦邱川
刘引
卢华玮
杨声春
徐欣欣
魏鑫
田成志
汪哲逸
王璇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Rural Commercial Bank Co ltd
Original Assignee
Chongqing Rural Commercial Bank Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Rural Commercial Bank Co ltd
Priority to CN202010300927.2A
Publication of CN111540353A
Application granted
Publication of CN111540353B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/18 Speech classification or search using natural language modelling
    • G10L 15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/225 Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a semantic understanding method, device, equipment and storage medium, wherein the method comprises the following steps: converting speech uttered by a user into text, extracting all keywords contained in the text, and taking each extracted keyword as a keyword to be understood; if the language of the user's speech is a local dialect, converting the keywords to be understood into pinyin, determining, based on a letter mapping relation, the replacement letter corresponding to each designated letter in the pinyin, and replacing the designated letters in the corresponding pinyin with at least one corresponding replacement letter to obtain character strings to be understood, the letter mapping relation being the correspondence between a letter used in the standard language and the letter used in its place in the local language used by the user; and matching the character strings to be understood against the pinyin of the keywords in each preset intention, the intention matched successfully being determined to be the semantics of the user's speech. The accuracy of recognizing the user's intention from the user's speech is thereby improved.

Description

Semantic understanding method, device, equipment and storage medium
Technical Field
The present invention relates to the technical field of semantic intelligent processing, and more particularly, to a semantic understanding method, apparatus, device, and storage medium.
Background
The languages spoken by people in different regions are not exactly the same; that is, different regions have different local languages (dialects). For example, people in the southwest region speak Southwestern Mandarin: Chongqing people mostly speak the Chongqing dialect and Sichuan people mostly speak the Sichuan dialect, both of which are local languages. In scenarios where banks and other institutions carry out voice interaction with customers through intelligent outbound calling and intelligent customer service, the pronunciation characteristics of speech in the southwest and similar regions are not identical to those of Putonghua (the standard language); for example, retroflex initials are pronounced as flat-tongue initials, and back nasal finals are pronounced as front nasal finals. As a result, systems implementing customer voice interaction often fail to accurately recognize the intention that the user's speech is meant to express, true intelligent semantic understanding is not achieved, the competitiveness of banking and other industries is reduced, and the user experience is also affected.
In summary, how to provide a technical solution for accurately identifying the intention of a user based on the voice uttered by the user is a problem to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide a semantic understanding method, device, equipment and storage medium, which can improve the accuracy of recognizing the user's intention from the speech uttered by the user, realize true intelligent semantic understanding, increase the competitiveness of the corresponding industries, and further improve the user experience.
In order to achieve the above purpose, the invention provides the following technical scheme:
a semantic understanding method, comprising:
converting voice sent by a user into corresponding characters, extracting all keywords contained in the characters from the converted characters, and determining that each extracted keyword is a keyword to be understood;
if the language used by the voice sent by the user is a local language, converting the keywords to be understood into corresponding pinyin, determining a replacement letter corresponding to a designated letter contained in the pinyin based on a preset letter mapping relation, and replacing the designated letter in the corresponding pinyin by using at least one replacement letter corresponding to each pinyin to obtain a character string to be understood; the letter mapping relation is the correspondence between a letter used in a standard language and the letter used in its place in the local language used by the user;
and respectively matching the character strings to be understood with the pinyin of the keywords contained in the preset intentions, and determining that the corresponding intention is the corresponding semantic meaning of the voice sent by the user when the matching is successful.
Preferably, after determining that each keyword obtained by extraction is a keyword to be understood, the method further includes:
and respectively comparing the keywords to be understood with the keywords contained in each intention, if the keywords contained in any intention are successfully matched with the keywords to be understood, determining that the any intention is the corresponding semantics of the voice sent by the user, otherwise, determining that the language used by the voice sent by the user is a local language, and executing the step of converting the keywords to be understood into the corresponding pinyin.
Preferably, extracting all keywords contained in the text from the converted text includes:
and performing word segmentation on the converted characters, and selecting words of the sentence structure component corresponding to the current scene from a plurality of words obtained by word segmentation as keywords.
Preferably, after the matching of the character string to be understood and the pinyin of the keyword included in each preset intention, the method further includes:
and if the keyword contained in any intention and successfully matched with the character string to be understood does not exist, outputting a voice prompt to prompt the user to send out an instruction in a voice form again.
Preferably, if there is no keyword contained in any intention that matches successfully with the character string to be understood, the method further includes:
and if the keyword contained in any intention and successfully matched with the character string to be understood does not exist after N times of continuous determination, sending command information to a terminal corresponding to a worker, and indicating the worker to provide corresponding service for the user.
Preferably, after determining the corresponding semantics of the voice sent by the user, the method further includes:
and displaying the corresponding semantics of the voice sent by the user in a text mode, and continuing to execute the operation corresponding to the corresponding semantics of the voice sent by the user after the user confirms that the displayed text corresponds to the voice sent by the user.
Preferably, after the operation corresponding to the semantic corresponding to the voice uttered by the user is completed, the method further includes:
and informing the user of the information of the completed operation corresponding to the corresponding semantics of the voice sent by the user in a voice form.
A semantic understanding apparatus, comprising:
an extraction module to: converting voice sent by a user into corresponding characters, extracting all key words contained in the characters from the characters obtained by conversion, and determining that each extracted key word is a key word to be understood;
a replacement module to: if the language used by the voice sent by the user is a local language, converting the keywords to be understood into corresponding pinyin, determining replacement letters corresponding to designated letters contained in the pinyin based on a preset letter mapping relation, and replacing the designated letters in the corresponding pinyin by using at least one replacement letter corresponding to each pinyin to obtain a character string to be understood; the letter mapping relation is the correspondence between a letter used in a standard language and the letter used in its place in the local language used by the user;
a matching module to: and respectively matching the character strings to be understood with the pinyin of the keywords contained in the preset intentions, and determining that the corresponding intention is the corresponding semantic meaning of the voice sent by the user when the matching is successful.
A semantic understanding apparatus comprising:
a memory for storing a computer program;
a processor for implementing the steps of the semantic understanding method according to any one of the above when executing the computer program.
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the semantic understanding method according to any one of the above.
The invention provides a semantic understanding method, device, equipment and storage medium, wherein the method comprises the following steps: converting speech uttered by a user into corresponding text, extracting all keywords contained in the text from the converted text, and taking each extracted keyword as a keyword to be understood; if the language of the user's speech is a local dialect, converting the keywords to be understood into corresponding pinyin, determining the replacement letters corresponding to the designated letters contained in the pinyin based on a preset letter mapping relation, and replacing the designated letters in the corresponding pinyin with at least one corresponding replacement letter to obtain character strings to be understood, the letter mapping relation being the correspondence between a letter used in the standard language and the letter used in its place in the local language used by the user; and matching the character strings to be understood against the pinyin of the keywords contained in each preset intention, the intention matched successfully being determined to be the semantics of the user's speech.
According to the above technical solution, all keywords contained in the text obtained by converting the user's speech are extracted; if the language used is a local dialect, the extracted keywords are converted into corresponding pinyin, the replacement letters of the designated letters in the pinyin are determined from the letter mapping relation, and at least one replacement letter is used to replace the corresponding designated letter; the character strings obtained after replacement are matched against the pinyin of the keywords of each preset intention, and the intention matched successfully is determined to be the semantics the user wants to express. The letter mapping relation is the correspondence between a letter used in the standard language and the letter used in its place in the local language used by the user. In this way, the pinyin of the keywords in the user's speech is converted into the pinyin of keywords with the same semantics in the standard language, that is, the pronunciation of the keywords in the user's speech is converted into the standard pronunciation before semantic recognition is performed. This avoids the situation in which the semantics of the user's speech cannot be accurately understood because the local pronunciation characteristics differ from the standard language, improves the accuracy of recognizing the user's intention from the user's speech, realizes true intelligent semantic understanding, increases the competitiveness of the corresponding industries, and further improves the user experience.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of a semantic understanding method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a semantic understanding apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
Referring to fig. 1, a flowchart of a semantic understanding method according to an embodiment of the present invention is shown, where the method may include:
s11: the method comprises the steps of converting voice sent by a user into corresponding characters, extracting all key words contained in the characters from the converted characters, and determining that each extracted key word is a key word to be understood.
The execution subject of the semantic understanding method provided by the embodiment of the present invention may be a corresponding semantic understanding apparatus, and the apparatus may be disposed in a voice interaction system that implements voice interaction with users; the execution subject may therefore equally be the voice interaction system, and the following description takes the voice interaction system as the execution subject.
When interacting by voice, the user usually expresses an intention by speaking; intentions may include identity confirmation, repayment willingness, repayment amount confirmation and the like. For example, the voice interaction system of a bank (such as an intelligent customer service robot) asks the user: "Are you Li Si?" If the client answers "yes" or "I am Li Si", the "yes" intention of the client is matched and the process moves to the subsequent flow corresponding to that intention; if the client answers "no", the "no" intention of the client is matched and the process moves to the subsequent flow corresponding to that intention.
The voice interaction system converts the speech uttered by the user into corresponding text; specifically, the speech can be converted into text through ASR (Automatic Speech Recognition). All keywords contained in the converted text are then extracted (the text may contain one or more keywords); specifically, keywords may be extracted by combining the TextRank algorithm with part-of-speech analysis, consistent with the corresponding prior-art implementations: the text is segmented into words, the part of speech of each word (noun, verb, adjective, adverb and the like) is analyzed, and the part of speech to be selected as keywords is determined based on the current scene. When making this determination, if the answer the user is expected to give by voice is of a certain part of speech, words of that part of speech are selected as keywords. For example, if the expected answer is a verb, words whose part of speech is verb are selected as keywords; if the user is asked to confirm whether to repay, "repay" or "not repay" and the like are selected as keywords. If the expected answer is a noun, words whose part of speech is noun are selected; if the user is asked for a name, "Li Si", "Zhang San" and the like are selected as keywords. Extracting keywords in this way increases the likelihood that the keywords are the words that express the user's intention.
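As a rough illustration of this keyword-extraction step, the following sketch segments text with a toy longest-match lexicon and filters by the part of speech the current scene expects; the lexicon, the tags, and the scene table are illustrative assumptions, and the patent's TextRank component is not reproduced here.

```python
# Minimal sketch of scene-driven keyword extraction. LEXICON, SCENE_POS,
# and the greedy segmenter are toy assumptions, not the patent's method.

# Hypothetical lexicon: word -> part of speech.
LEXICON = {
    "我": "pronoun", "要": "aux", "还款": "verb", "不还": "verb",
    "是": "verb", "李四": "noun", "张三": "noun",
}

# Hypothetical mapping: which part of speech answers each scene's question.
SCENE_POS = {
    "confirm_identity": "noun",    # expects a name such as 李四
    "confirm_repayment": "verb",   # expects 还款 / 不还
}

def segment(text):
    """Greedy longest-match word segmentation over the toy lexicon."""
    words, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in LEXICON:
                words.append(text[i:j])
                i = j
                break
        else:
            i += 1  # character not in the lexicon: skip it
    return words

def extract_keywords(text, scene):
    """Keep only the words whose part of speech fits the current scene."""
    wanted = SCENE_POS[scene]
    return [w for w in segment(text) if LEXICON[w] == wanted]
```

For instance, `extract_keywords("我要还款", "confirm_repayment")` keeps only the verb "还款", while the identity-confirmation scene keeps only name nouns.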
S12: if the language used by the voice sent by the user is a local language, converting all the keywords to be understood into corresponding pinyin, determining a replacement letter corresponding to an appointed letter contained in the pinyin based on a preset letter mapping relation, and replacing the appointed letter in the corresponding pinyin with at least one replacement letter corresponding to each pinyin to obtain a character string to be understood; the letter mapping relation is the corresponding relation between any letter used in the standard language and the letter used by the letter in the local language used by the user.
If the language used by the speech uttered by the user (i.e. the language the user speaks) is a local language (dialect) different from the standard language (Putonghua), the pronunciation differences mean that the speech may be converted into other words that do not correspond to the intention the user wants to express, which affects semantic understanding. Therefore, in this embodiment, if it is determined that the language used by the user's speech is a local language, error correction is performed using similar pronunciations based on the pronunciation characteristics of that local language, thereby implementing correction at the language level.
Specifically, each extracted keyword is first converted into its corresponding pinyin; for example, the pinyin of the keyword "李四" is "Lisi", and the pinyin of the keyword "不还" is "buhuan". After the pinyin of each keyword is obtained, the replacement letters corresponding to the designated letters are determined through the letter mapping relation. The letters in this application are all pinyin letters, and the letter mapping relation is established based on the pronunciation characteristics of the local language used by the user. Because the local language is not the standard language, the sounds uttered when the two express a word with the same meaning may differ; where the pronunciations differ, the pinyin differs, and at the differing position the letter in the local language is the designated letter while the letter in the standard language is the replacement letter. That is, the letter mapping relation records, for each letter used in the standard language, the letter that is used in its place in the user's local language. For example, "今年" (this year) is pronounced "jinnian" in the standard language but may be pronounced "jinlian" in the local dialect; "l" in the local language thus corresponds to "n" in the standard language in the letter mapping relation, and replacing the "l" in "jinlian" with "n" recovers the standard-language pinyin "jinnian". In this embodiment, after the replacement letters corresponding to the designated letters in the pinyin of an extracted keyword are determined from the letter mapping relation, at least one replacement letter is used to replace the corresponding designated letter so as to obtain a plurality of character strings. When performing the replacement, if the pinyin contains only one designated letter, it can be replaced directly by its replacement letter; if the pinyin contains more than one designated letter, a character string can be obtained by replacing any one designated letter, and/or by replacing any m designated letters simultaneously, where m is greater than or equal to 2 and less than or equal to the total number of designated letters contained in the pinyin. In this way a plurality of character strings is generalized, and all possible pinyin character strings are obtained for the subsequent matching operations, which increases the accuracy of semantic understanding to a certain extent.
In addition, when obtaining the character strings to be understood, the letters composed of several characters in the pinyin, such as "an", "en", and "in", can be replaced first, and single-character letters replaced afterwards. This avoids the situation where, after a single-character letter is replaced, its replacement letter combines with adjacent characters in the pinyin to form a new designated letter and triggers a further, erroneous replacement. For example, "an", "en", "in" and the like are replaced before l -> n, so that the "n" introduced by l -> n does not combine with a following "a" into a new "an" that would itself need replacing.
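The generation of candidate character strings described in S12, including the rule that multi-letter finals are handled before single letters, can be sketched as follows. Two caveats: only the l -> n pair comes from the patent's "jinlian" -> "jinnian" example (the "ang" -> "an" pair is an assumed back/front-nasal pair), and this sketch replaces all occurrences at once rather than enumerating every per-position combination of 1 to m designated letters as the patent describes.

```python
# Candidate "standard" pinyin strings generated from a dialect keyword's
# pinyin. MAPPING entries are illustrative; they are stored with longer
# patterns first so that a bare "l" -> "n" rule cannot corrupt a
# multi-letter final after insertion.
MAPPING = [          # (dialect spelling, standard spelling)
    ("ang", "an"),   # assumed back-nasal -> front-nasal pair
    ("l", "n"),      # the patent's "jinlian" -> "jinnian" example
]

def candidates(pinyin):
    """All strings reachable by applying any subset of the mapping rules."""
    results = {pinyin}
    for src, dst in MAPPING:        # longer patterns first, as stored
        for s in list(results):
            if src in s:
                results.add(s.replace(src, dst))
    return results
```

For example, `candidates("jinlian")` yields both the original string and the corrected "jinnian", and strings containing no designated letter pass through unchanged.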
S13: and respectively matching the character strings to be understood with the pinyin of the keywords contained in the preset intentions, and determining that the corresponding intentions are corresponding semantics of the voice sent by the user when the matching is successful.
An intention is a semantic meaning that the user's speech may express, and the keywords of each intention can be extracted with the same method used for the keywords to be understood. Each character string to be understood is matched against the pinyin of the keywords contained in each intention. If some intention contains a keyword that matches a character string to be understood successfully, that intention is determined to be the semantics of the user's speech; otherwise, the semantics of the user's speech cannot be recognized. Specifically, the keywords may be compared in a preset order of the intentions, and matching stops as soon as one match succeeds. In addition, a successful match in this embodiment may mean complete agreement.
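A minimal sketch of this matching step, with assumed intention names and keyword pinyin: each candidate string is compared against the intentions in their preset order, and the first exact match wins.

```python
# Hypothetical preset intentions, ordered as configured in advance;
# each carries the pinyin of its keywords.
INTENTS = [
    ("will_repay", {"huankuan"}),
    ("wont_repay", {"buhuan"}),
]

def match_intent(candidate_strings):
    """Return the first intention whose keyword pinyin matches exactly."""
    for name, keyword_pinyin in INTENTS:
        if keyword_pinyin & set(candidate_strings):
            return name
    return None  # no match: the caller re-prompts the user by voice
```

Because comparison stops at the first success, an ambiguous candidate set resolves to the earlier-configured intention, mirroring the "compare in sequence, stop on success" behaviour described above.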
Generally, the text obtained by converting the user's speech contains at least one keyword, and the pinyin of a keyword contains at least one designated letter. If no keyword can be extracted from the text and/or no designated letter can be located in the pinyin of the keywords, while the language used by the user's speech is a dialect, it can be considered that the user's speech cannot be semantically recognized.
According to the above technical solution, all keywords contained in the text obtained by converting the user's speech are extracted; if the language used is a local dialect, the extracted keywords are converted into corresponding pinyin, the replacement letters of the designated letters in the pinyin are determined from the letter mapping relation, and at least one replacement letter is used to replace the corresponding designated letter; the character strings obtained after replacement are matched against the pinyin of the keywords of each preset intention, and the intention matched successfully is determined to be the semantics the user wants to express. The letter mapping relation is the correspondence between a letter used in the standard language and the letter used in its place in the local language used by the user. In this way, the pinyin of the keywords in the user's speech is converted into the pinyin of keywords with the same semantics in the standard language, that is, the pronunciation of the keywords in the user's speech is converted into the standard pronunciation before semantic recognition is performed. This avoids the situation in which the semantics of the user's speech cannot be accurately understood because the local pronunciation characteristics differ from the standard language, improves the accuracy of recognizing the user's intention from the user's speech, realizes true intelligent semantic understanding, increases the competitiveness of the corresponding industries, and further improves the user experience.
In the semantic understanding method provided in the embodiment of the present invention, after determining that each extracted keyword is a keyword to be understood, the method may further include:
and respectively comparing the keywords to be understood with the keywords contained in each intention, if the keywords contained in any intention successfully matched with the keywords to be understood exist, determining that the any intention is the corresponding semantics of the voice sent by the user, otherwise, determining that the language used by the voice sent by the user is a local language, and executing the step of converting the keywords to be understood into the corresponding pinyin.
It should be noted that, after the keywords to be understood are extracted, they may first be matched directly, as text, against the keywords of each intention. If the match with the keywords contained in some intention succeeds (i.e. they are consistent), that intention can be determined to be the semantics the user wants to express; otherwise the user's semantics cannot yet be understood, it can be determined that the language used by the user's speech is a local language, and the subsequent pinyin-based operations are performed.
In addition, a successful match in this application means that all the keywords to be understood are consistent with all the keywords of the intention in question; only in this case can that intention be determined to be the semantics the user wants to express, i.e. the user's intention; otherwise this conclusion cannot be drawn with certainty.
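The two-stage flow described above can be sketched as follows; the pinyin table, the intention set, the homophone entry "不换" (an ASR mis-transcription with the same pinyin as "不还"), and the omission of letter replacement in the fallback stage are all simplifying assumptions for illustration.

```python
# Stage 1: plain-text keyword matching. Stage 2: assume a dialect was
# spoken and fall back to comparing pinyin (letter replacement omitted
# here for brevity). All tables are hypothetical.
PINYIN = {"不还": "buhuan", "还款": "huankuan", "不换": "buhuan"}
INTENTS = {"wont_repay": {"不还"}, "will_repay": {"还款"}}

def understand(keywords):
    # Stage 1: direct text comparison with each intention's keywords.
    for name, kw in INTENTS.items():
        if kw & set(keywords):
            return name
    # Stage 2: treat the speech as a local dialect; compare via pinyin.
    spoken = {PINYIN.get(k) for k in keywords}
    for name, kw in INTENTS.items():
        if {PINYIN[w] for w in kw} & spoken:
            return name
    return None
```

Here `understand(["不换"])` fails the text comparison but still resolves to the "wont_repay" intention through the pinyin fallback, which is the point of the two-stage design.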
In the semantic understanding method provided in the embodiment of the present invention, extracting all keywords included in the converted text from the text may include:
and performing word segmentation on the converted characters, and selecting words of the sentence structure component corresponding to the current scene from a plurality of words obtained by word segmentation as keywords.
In addition, when extracting keywords, word segmentation may be performed on the text and the corresponding words selected as keywords from the words obtained. When selecting the keywords, the sentence structure component required by the current scene can be determined first, and the words that fill that sentence structure component are taken as the keywords. For example, if the current scene asks whether the user will repay, the required sentence structure component is the predicate; so in the answers "I will repay" or "I will not repay for the time being", "repay" or "not repay" are the words of the sentence structure component corresponding to the current scene. In this way the selected keywords correspond to the current scene and can, to a certain extent, express the user's semantics, which improves the accuracy of understanding the user's semantics.
The semantic understanding method provided in the embodiment of the present invention may further include, after matching the character strings to be understood with the pinyin of the keywords included in each preset intention, respectively:
if no intention contains a keyword successfully matched with the character string to be understood, outputting a voice prompt to prompt the user to issue an instruction in voice form again.
If no keyword is successfully matched with the character string to be understood, then in order to let the user know this, a corresponding voice prompt, such as "unable to understand your intention", may be output, so that the user can speak to the voice interaction system again after hearing the prompt. The user is thus kept informed of how the issued instruction was handled, which improves the user experience.
The semantic understanding method provided in the embodiment of the present invention may further include, after it is determined that no intention contains a keyword successfully matched with the character string to be understood:
and if it is determined N consecutive times (where N is a value that can be set according to actual needs, such as 3 or 4) that no intention contains a keyword successfully matched with the character string to be understood, sending command information to the terminal corresponding to a worker, instructing the worker to provide the corresponding service for the user.
If it is determined N consecutive times that no keyword is successfully matched with the character string to be understood, the user would otherwise keep interacting with the voice interaction system without it ever learning his or her true intention. To avoid wasting the resources of the voice interaction system and the poor experience of the user repeatedly re-entering voice, in this embodiment corresponding command information may be sent to the corresponding terminal, so that the worker at that terminal can locate the user, provide help, and assist the user in completing the operation to be implemented with the voice interaction system. In addition, the command information may include the number or the position of the voice interaction system currently used by the user, so that the worker can locate the user quickly and provide the required help as fast as possible, further improving the user experience.
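The retry-then-escalate policy above can be sketched as follows. The threshold constant, function names, and the notification callback are illustrative assumptions, not part of the disclosed system.

```python
# Hedged sketch of the N-failures escalation policy described above.
# MAX_RETRIES (N) is configurable; notify() is a hypothetical callback
# that delivers command information to a worker's terminal.

MAX_RETRIES = 3

def handle_failed_match(failure_count, kiosk_id, notify):
    """Re-prompt the user until N consecutive failures, then hand off.

    The command message includes the kiosk number so the worker can
    locate the user quickly, as the embodiment suggests.
    """
    if failure_count >= MAX_RETRIES:
        notify(kiosk_id, f"Customer needs assistance at kiosk {kiosk_id}")
        return "escalate"
    return "reprompt"

sent = []
print(handle_failed_match(3, "K-07", lambda k, msg: sent.append(msg)))
# escalate
```

Keeping the counter per session (rather than globally) matches the "N consecutive times" wording; resetting it on any successful match is the natural complement.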
The semantic understanding method provided by the embodiment of the present invention, after determining the semantic corresponding to the voice uttered by the user, may further include:
and displaying the corresponding semantics of the voice sent by the user in a text mode, and continuously executing the operation corresponding to the corresponding semantics of the voice sent by the user after the user confirms that the displayed text corresponds to the voice sent by the user.
In addition, in order to avoid incorrect understanding of the semantics of the voice uttered by the user, in this embodiment, after the semantics corresponding to the user's voice are determined, the semantics may be displayed in text form together with buttons for confirming or for re-entering the voice. After the user confirms, the operation corresponding to the semantics expressed by the user is performed, such as repayment guidance or card transaction guidance; if the user chooses to enter the voice again, the operations of receiving the voice uttered by the user, recognizing its semantics, and so on are carried out again. In this way, the accuracy of the operation subsequently executed for the user is further ensured, and the user experience is further improved.
The semantic understanding method provided in the embodiment of the present invention, after performing an operation corresponding to a semantic corresponding to a voice uttered by a user, may further include:
and informing the user of the information that the operation corresponding to the semantic meaning corresponding to the voice sent by the user is completed in a voice mode.
It should be noted that, in order to let the user know how the corresponding operation was carried out after uttering the voice, in this embodiment the user may be notified by voice that the operation corresponding to the semantics of the uttered voice has been completed, further improving the user experience.
The semantic understanding method disclosed in the application is explained in detail below, taking the case where the language used by the user is the Chongqing dialect; the method specifically includes the following steps:
1) Extracting keywords from the text obtained by ASR translation of the voice uttered by the user, matching the extracted keywords with the keywords of each intention, and, if a certain intention is hit, entering the corresponding intention to implement the corresponding operation; if no intention is hit, performing the subsequent operations;
2) Converting all keywords extracted from the text translated from the user's voice into the corresponding pinyin;
3) Calling a letter mapping relation formulated according to the pronunciation characteristics of Chongqing dialect:
(flat-tongue / retroflex conversion) s <-> sh
c <-> ch
z <-> zh
(front / back nasal conversion) an <-> ang
en <-> eng
in <-> ing
(l, n conversion) l -> n (e.g., "jinlian", the Chongqing pronunciation of "this year", corresponds to the standard "jinnian")
(h, f conversion) f -> h
Here, a bidirectional arrow indicates that the conversion may be performed in either direction, while a unidirectional arrow indicates that the conversion may be performed in one direction only; when a letter conversion is carried out, it follows the direction indicated by the arrow (for example, f is converted into h, and l is converted into n);
4) Generalizing the pinyin of each keyword into a plurality of character strings according to the letter mapping relation. Generalization here means replacing a designated letter, as pronounced in the Chongqing dialect, with its replacement letter in the standard language. Determining and applying the replacement letters (the two actions may be collectively referred to as conversion) follows a fixed order: letters written with multiple characters are converted first, and letters written with a single character are converted afterwards. For example, an, en and in are converted before l -> n; this avoids the case where an n produced by l -> n combines with an adjacent a to form a new an, which would then be converted again;
5) Comparing the generalized character strings with the pinyin of the keywords of each intention; if an identical character string exists, the intention is considered hit; if no identical character string exists, no intention is hit.
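The generalization of step 4 can be sketched as follows. This is an illustrative implementation, not the patent's own code: syllables are assumed to be pre-split, the variant tables transcribe the mapping listed above, and factoring each syllable into initial + final sidesteps the conversion-ordering concern (an n introduced by l -> n can never merge with a neighboring a, because initials and finals are expanded independently).

```python
from itertools import product

# Hedged sketch of pinyin "generalization": expand a keyword's pinyin into
# every candidate string under the Chongqing-dialect letter mapping.

INITIAL_VARIANTS = {"s": {"s", "sh"}, "sh": {"sh", "s"},
                    "c": {"c", "ch"}, "ch": {"ch", "c"},
                    "z": {"z", "zh"}, "zh": {"zh", "z"},
                    "l": {"l", "n"},   # one-way l -> n (original kept too)
                    "f": {"f", "h"}}   # one-way f -> h
FINAL_VARIANTS = {"an": {"an", "ang"}, "ang": {"ang", "an"},
                  "en": {"en", "eng"}, "eng": {"eng", "en"},
                  "in": {"in", "ing"}, "ing": {"ing", "in"}}

def split_syllable(syl):
    """Split one pinyin syllable into initial + final (longest initial first)."""
    for init in ("zh", "ch", "sh"):
        if syl.startswith(init):
            return init, syl[len(init):]
    if syl and syl[0] not in "aeiou":
        return syl[0], syl[1:]
    return "", syl

def generalize(syllables):
    """Return the set of candidate strings for a pre-split pinyin keyword."""
    options = []
    for syl in syllables:
        init, final = split_syllable(syl)
        inits = INITIAL_VARIANTS.get(init, {init})
        finals = FINAL_VARIANTS.get(final, {final})
        options.append({i + f for i, f in product(inits, finals)})
    return {"".join(combo) for combo in product(*options)}

# "lishi" ("history") expands to the four candidates from the example below:
print(sorted(generalize(["li", "shi"])))
# ['lishi', 'lisi', 'nishi', 'nisi']
```

The Cartesian product over per-syllable variant sets is what makes one mistranscribed keyword cover all plausible dialect pronunciations at once.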
For a better understanding of the present solution, the above flow is illustrated below with the usage scenario of credit card collection by a bank, in which the bank's voice interaction system (here, an intelligent customer service robot) conducts the following voice interaction with the customer:
Customer service robot: "Hello, I am the credit card center robot of Chongqing Rural Commercial Bank. May I ask whether you are Mr. Li Si?"
Customer (speaking Chongqing dialect): "I am Li Si."
After the customer service robot receives the voice, the pronunciation of the Chongqing dialect causes it to be mistranslated into the text "I am history". Therefore, the keyword "history" is first extracted from the translated text and converted into the pinyin "lishi"; based on the letter mapping relation, this pinyin is generalized into the character strings "lishi", "lisi", "nishi" and "nisi". Comparing these generalized character strings with the pinyin of the keywords in intent understanding, the identical string "lisi" is found, so the intention is considered hit: the user meant "I am Li Si".
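The comparison of step 5 on this worked example can be sketched as follows. The intent table and function name are hypothetical; the candidate set is the one produced by generalizing "lishi" as described above.

```python
# Hedged sketch of step 5: compare generalized candidates against the
# pinyin of each intention's keyword. The intent table is illustrative.

def match_intent(candidates, intents):
    """Return the first intent whose keyword pinyin appears among the
    generalized candidate strings, or None if no intention is hit."""
    for name, keyword_pinyin in intents.items():
        if keyword_pinyin in candidates:
            return name
    return None

INTENTS = {"confirm_identity": "lisi"}  # hypothetical: "Li Si" -> lisi

candidates = {"lishi", "lisi", "nishi", "nisi"}  # generalized from "lishi"
print(match_intent(candidates, INTENTS))  # confirm_identity
```

Because the candidates form a set, the comparison is an O(1) membership test per intention keyword rather than a scan over all generalized strings.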
Under the prior-art scheme, when the voice interaction system performs customer identity verification and processes "I am history", the verification fails, the identity-checking step halts, and the flow cannot proceed to the next step; the customer has to re-enter "I am Li Si" by voice, and the user experience is poor. With the scheme of this embodiment, the bank's voice system can quickly recognize the customer's real intention, the flow proceeds smoothly, the user experience is improved, and the system is more intelligent.
To sum up, the technical scheme disclosed in the application can to a great extent identify the real intention of a user speaking a local dialect, improves the accuracy of intention understanding in scenarios where intelligent outbound calling and intelligent customer service interact with customers by voice, is more intelligent, enhances the bank's competitiveness, and at the same time improves the accuracy of speech recognition for local dialects and improves the user experience. Moreover, the technical scheme disclosed in the application can be applied to any scenario requiring voice interaction, such as collection of overdue bank credit card payments, bank credit card activation, and collection of overdue personal loans, and therefore has wide applicability.
An embodiment of the present invention further provides a semantic understanding apparatus, as shown in fig. 2, which may include:
an extraction module 11 configured to: converting voice sent by a user into corresponding characters, extracting all keywords contained in the characters from the converted characters, and determining that each extracted keyword is a keyword to be understood;
a replacement module 12 for: if the language used by the voice uttered by the user is a local dialect, converting all the keywords to be understood into the corresponding pinyin, determining, based on a preset letter mapping relation, the replacement letters corresponding to the designated letters contained in the pinyin, and replacing the designated letters in the corresponding pinyin with the at least one replacement letter corresponding to each pinyin to obtain the character strings to be understood; the letter mapping relation is the correspondence between any letter used in the standard language and the letter it is pronounced as in the local dialect used by the user;
a matching module 13 for: and respectively matching the character strings to be understood with the pinyin of the keywords contained in the preset intentions, and determining that the corresponding intentions are corresponding semantics of the voice sent by the user when the matching is successful.
The semantic understanding apparatus provided in the embodiment of the present invention may further include:
a determination module to: after determining that each extracted keyword is a keyword to be understood, compare the keywords to be understood with the keywords contained in each intention; if any intention contains keywords successfully matched with the keywords to be understood, determine that the intention is the semantics corresponding to the voice uttered by the user; otherwise, determine that the language of the voice uttered by the user is a local dialect, and execute the step of converting the keywords to be understood into the corresponding pinyin.
In the semantic understanding apparatus provided in the embodiment of the present invention, the extraction module may include:
an extraction unit for: and performing word segmentation on the converted characters, and selecting words of sentence structural components corresponding to the current scene from a plurality of words obtained by word segmentation as keywords.
The semantic understanding apparatus provided in the embodiment of the present invention may further include:
a prompt module to: after the character string to be understood is respectively matched with the pinyin of the keyword contained in each preset intention, if the keyword contained in any intention successfully matched with the character string to be understood does not exist, a voice prompt is output to prompt the user to send out an instruction again in a voice mode.
The semantic understanding apparatus provided in the embodiment of the present invention may further include:
an indication module to: if it is determined N consecutive times that no intention contains a keyword successfully matched with the character string to be understood, send command information to the terminal corresponding to a worker, instructing the worker to provide the corresponding service for the user.
The semantic understanding apparatus provided in the embodiment of the present invention may further include:
a display module to: after the semantics corresponding to the voice uttered by the user are determined, display those semantics in text form, and, after the user confirms that the displayed text corresponds to the uttered voice, continue executing the operation corresponding to those semantics.
The semantic understanding apparatus provided in the embodiment of the present invention may further include:
a notification module to: and after the operation corresponding to the semantic meaning corresponding to the voice sent by the user is finished, informing the user of the information that the operation corresponding to the semantic meaning corresponding to the voice sent by the user is finished in a voice mode.
An embodiment of the present invention further provides a semantic understanding device, which may include:
a memory for storing a computer program;
a processor for implementing the steps of the semantic understanding method according to any one of the above when executing the computer program.
The embodiment of the invention also provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of any one of the semantic understanding methods can be implemented.
It should be noted that, for the description of the relevant parts in the semantic understanding apparatus, the device and the storage medium provided in the embodiment of the present invention, reference is made to the detailed description of the corresponding parts in the semantic understanding method provided in the embodiment of the present invention, and details are not described here again. In addition, parts of the technical solutions provided in the embodiments of the present invention that are consistent with the implementation principles of the corresponding technical solutions in the prior art are not described in detail, so as to avoid redundant description.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A method of semantic understanding, comprising:
converting voice sent by a user into corresponding characters, extracting all keywords contained in the characters from the converted characters, and determining that each extracted keyword is a keyword to be understood;
if the language used by the voice uttered by the user is a local dialect, converting the keywords to be understood into corresponding pinyin, determining replacement letters corresponding to designated letters contained in the pinyin based on a preset letter mapping relation, and replacing the designated letters in the corresponding pinyin with the at least one replacement letter corresponding to each pinyin to obtain a character string to be understood; the letter mapping relation is the correspondence between any letter used in a standard language and the letter it is pronounced as in the local dialect used by the user;
matching the character strings to be understood with the pinyin of the keywords contained in each preset intention, and determining that the corresponding intention is the corresponding semantic meaning of the voice sent by the user when the matching is successful;
wherein, extracting all keywords contained in the characters obtained by conversion comprises: and performing word segmentation on the converted characters, and selecting words of the sentence structure component corresponding to the current scene from a plurality of words obtained by word segmentation as keywords.
2. The method according to claim 1, wherein after determining that each extracted keyword is a keyword to be understood, the method further comprises:
and respectively comparing the keywords to be understood with the keywords contained in each intention, if the keywords contained in any intention are successfully matched with the keywords to be understood, determining that the any intention is the corresponding semantics of the voice sent by the user, otherwise, determining that the language used by the voice sent by the user is a local language, and executing the step of converting the keywords to be understood into the corresponding pinyin.
3. The method according to claim 1, wherein after matching the character string to be understood with the pinyin of the keyword included in each preset intention, the method further comprises:
and if the keyword contained in any intention successfully matched with the character string to be understood does not exist, outputting a voice prompt to prompt the user to send out an instruction in a voice form again.
4. The method according to claim 3, wherein if there is no keyword contained in any intention that is successfully matched with the character string to be understood, the method further comprises:
and if the keyword contained in any intention and successfully matched with the character string to be understood does not exist after N times of continuous determination, sending command information to a terminal corresponding to a worker, and indicating the worker to provide corresponding service for the user.
5. The method of claim 4, after determining corresponding semantics of the speech uttered by the user, further comprising:
and displaying the corresponding semantics of the voice sent by the user in a text mode, and continuing to execute the operation corresponding to the corresponding semantics of the voice sent by the user after the user confirms that the displayed text corresponds to the voice sent by the user.
6. The method of claim 5, wherein after performing the operation corresponding to the semantic corresponding to the speech uttered by the user, further comprising:
and informing the user of the information that the operation corresponding to the semantic meaning corresponding to the voice sent by the user is completed in a voice mode.
7. A semantic understanding apparatus, comprising:
an extraction module to: converting voice sent by a user into corresponding characters, extracting all keywords contained in the characters from the converted characters, and determining that each extracted keyword is a keyword to be understood;
a replacement module to: if the language used by the voice uttered by the user is a local dialect, converting the keywords to be understood into corresponding pinyin, determining replacement letters corresponding to designated letters contained in the pinyin based on a preset letter mapping relation, and replacing the designated letters in the corresponding pinyin with the at least one replacement letter corresponding to each pinyin to obtain a character string to be understood; the letter mapping relation is the correspondence between any letter used in a standard language and the letter it is pronounced as in the local dialect used by the user;
a matching module to: matching the character strings to be understood with the pinyin of the keywords contained in each preset intention, and determining that the corresponding intention is the corresponding semantic meaning of the voice sent by the user when the matching is successful;
wherein the extracting module is further configured to: and performing word segmentation on the converted characters, and selecting words of the sentence structure component corresponding to the current scene from a plurality of words obtained by word segmentation as keywords.
8. A semantic understanding apparatus, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the semantic understanding method according to any one of claims 1 to 6 when executing the computer program.
9. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the semantic understanding method according to any one of claims 1 to 6.
CN202010300927.2A 2020-04-16 2020-04-16 Semantic understanding method, device, equipment and storage medium Active CN111540353B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010300927.2A CN111540353B (en) 2020-04-16 2020-04-16 Semantic understanding method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111540353A CN111540353A (en) 2020-08-14
CN111540353B true CN111540353B (en) 2022-11-15

Family

ID=71974973

Country Status (1)

Country Link
CN (1) CN111540353B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12033615B2 (en) * 2020-11-04 2024-07-09 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for recognizing speech, electronic device and storage medium

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112102833B (en) * 2020-09-22 2023-12-12 阿波罗智联(北京)科技有限公司 Speech recognition method, device, equipment and storage medium
CN112114926A (en) * 2020-09-25 2020-12-22 北京百度网讯科技有限公司 Page operation method, device, equipment and medium based on voice recognition
CN112489643B (en) * 2020-10-27 2024-07-12 广东美的白色家电技术创新中心有限公司 Conversion method, conversion table generation device and computer storage medium
CN112364212A (en) * 2020-11-04 2021-02-12 北京致远互联软件股份有限公司 Voice name recognition method based on approximate voice recognition
CN112382275B (en) * 2020-11-04 2023-08-15 北京百度网讯科技有限公司 Speech recognition method, device, electronic equipment and storage medium
CN114783437A (en) * 2022-06-15 2022-07-22 湖南正宇软件技术开发有限公司 Man-machine voice interaction realization method and system and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103136352A (en) * 2013-02-27 2013-06-05 华中师范大学 Full-text retrieval system based on two-level semantic analysis
CN103678674A (en) * 2013-12-25 2014-03-26 乐视网信息技术(北京)股份有限公司 Method, device and system for achieving error correction searching through Pinyin
CN105117487A (en) * 2015-09-19 2015-12-02 杭州电子科技大学 Book semantic retrieval method based on content structures
CN105319978A (en) * 2015-12-09 2016-02-10 上海电机学院 Speech recognition based intelligent home control system
CN106782533A (en) * 2016-12-23 2017-05-31 陈勇 Incorrect pinyin acknowledgement key of the sound to correction Software Create in word correspondence
CN109446376A (en) * 2018-10-31 2019-03-08 广东小天才科技有限公司 Method and system for classifying voice through word segmentation

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070214118A1 (en) * 2005-09-27 2007-09-13 Schoen Michael A Delivery of internet ads
CN104235891B (en) * 2013-06-14 2019-01-11 上海能感物联网有限公司 A method of smart electronics gas furnace is manipulated with phonetic order
CN103593340B (en) * 2013-10-28 2017-08-29 余自立 Natural expressing information processing method, processing and response method, equipment and system
CN105912725A (en) * 2016-05-12 2016-08-31 上海劲牛信息技术有限公司 System for calling vast intelligence applications through natural language interaction
CN105913841B (en) * 2016-06-30 2020-04-03 北京小米移动软件有限公司 Voice recognition method, device and terminal
CN106409283B (en) * 2016-08-31 2020-01-10 上海交通大学 Man-machine mixed interaction system and method based on audio
CN107845381A (en) * 2017-10-27 2018-03-27 安徽硕威智能科技有限公司 A kind of method and system of robot semantic processes
CN109360563B (en) * 2018-12-10 2021-03-02 珠海格力电器股份有限公司 Voice control method and device, storage medium and air conditioner
CN109493848A (en) * 2018-12-17 2019-03-19 深圳市沃特沃德股份有限公司 Audio recognition method, system and electronic device
CN110377908B (en) * 2019-07-19 2023-05-30 科大讯飞股份有限公司 Semantic understanding method, semantic understanding device, semantic understanding equipment and readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Speech Keyword Recognition Technology; Sun Chengli; China Doctoral Dissertations Full-text Database (Information Science and Technology); 2008-10-15; full text *

Also Published As

Publication number Publication date
CN111540353A (en) 2020-08-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant