CN110232920B - Voice processing method and device - Google Patents


Info

Publication number
CN110232920B
CN110232920B CN201910542572.5A
Authority
CN
China
Prior art keywords
question
answer
synonym
sentence
pinyin
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910542572.5A
Other languages
Chinese (zh)
Other versions
CN110232920A (en)
Inventor
季文驰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Apollo Zhilian Beijing Technology Co Ltd
Original Assignee
Apollo Zhilian Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Apollo Zhilian Beijing Technology Co Ltd filed Critical Apollo Zhilian Beijing Technology Co Ltd
Priority to CN201910542572.5A priority Critical patent/CN110232920B/en
Publication of CN110232920A publication Critical patent/CN110232920A/en
Application granted granted Critical
Publication of CN110232920B publication Critical patent/CN110232920B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Abstract

Embodiments of the present application disclose a voice processing method and device. One embodiment of the method comprises: performing voice recognition on acquired user voice to obtain a text of the user voice; in response to determining that the text is a reference answer indication sentence comprising a question-answer pair, performing generalization processing on the text to obtain a generalization processing result; and correspondingly storing the question and the answer in the question-answer pair, and storing the generalization processing result. With this embodiment, a user can conveniently and quickly set a reference answer by voice.

Description

Voice processing method and device
Technical Field
Embodiments of the present application relate to the field of computer technology, in particular to the field of internet technology, and more particularly to a voice processing method and device.
Background
With the development of voice processing technology, voice interaction between users and devices has become more and more widely used. A user can find the answer to a question by voice, but if the user wants to set a customized answer for a certain question, the user has to input the question and the answer manually.
In addition, since a customized question-answer pair is fixed content, even a slight change in how the user phrases the question to the device can cause the device to feed back an answer that is far off the mark.
Disclosure of Invention
The embodiment of the application provides a voice processing method and device.
In a first aspect, an embodiment of the present application provides a voice processing method, including: performing voice recognition on acquired user voice to obtain a text of the user voice; in response to determining that the text is a reference answer indication sentence comprising a question-answer pair, performing generalization processing on the text to obtain a generalization processing result; and correspondingly storing the question and the answer in the question-answer pair, and storing the generalization processing result.
In some embodiments, performing generalization processing on the text to obtain a generalization processing result includes: selecting at least one word from the words contained in the question-answer pair as a target word, and determining synonyms of the target word; and replacing, with at least one of the synonyms, the target word corresponding to that synonym in the question-answer pair to generate a synonym sentence.
In some embodiments, the at least one synonym is selected by: selecting, from the synonyms of the target word, at least one synonym corresponding to a user portrait of the user associated with the user voice.
In some embodiments, the at least one synonym is selected by: selecting, from the synonyms of the target word, at least one synonym corresponding to the intent of the user voice.
In some embodiments, correspondingly storing the question and the answer in the question-answer pair includes: in response to the question-answer pair being a Chinese sentence, converting the question-answer pair into pinyin, and correspondingly storing the converted pinyin of the question and pinyin of the answer in the question-answer pair.
In some embodiments, storing the generalization processing result includes: in response to the synonym sentence being a Chinese sentence, converting the synonym sentence into pinyin, and storing the converted pinyin of the synonym sentence.
In some embodiments, storing the generalization processing result includes: if the synonym sentence comprises the question and does not comprise the answer, correspondingly storing the synonym sentence and the answer in the question-answer pair; if the synonym sentence does not comprise the question and comprises the answer, correspondingly storing the synonym sentence and the question in the question-answer pair; and if the synonym sentence comprises both the question and the answer, correspondingly storing the question and the answer in the synonym sentence.
In some embodiments, the method further includes: in response to receiving a first user voice, determining a text of the first user voice; in response to the text of the first user voice being a question and being a Chinese sentence, taking the pinyin of the text of the first user voice as a first pinyin; searching a database for the pinyin of a question matching the first pinyin, and determining the pinyin of the answer corresponding to the pinyin of the matching question as a target answer pinyin; and generating a reply sentence for the first user voice according to the answer corresponding to the target answer pinyin.
In a second aspect, an embodiment of the present application provides a voice processing apparatus, including: a recognition unit configured to perform voice recognition on acquired user voice to obtain a text of the user voice; a generalization unit configured to, in response to determining that the text is a reference answer indication sentence comprising a question-answer pair, perform generalization processing on the text to obtain a generalization processing result; and a storage unit configured to correspondingly store the question and the answer in the question-answer pair and store the generalization processing result.
In some embodiments, the generalization unit is further configured to: select at least one word from the words contained in the question-answer pair as a target word, and determine synonyms of the target word; and replace, with at least one of the synonyms, the target word corresponding to that synonym in the question-answer pair to generate a synonym sentence.
In some embodiments, the at least one synonym is selected by: selecting, from the synonyms of the target word, at least one synonym corresponding to a user portrait of the user associated with the user voice.
In some embodiments, the at least one synonym is selected by: selecting, from the synonyms of the target word, at least one synonym corresponding to the intent of the user voice.
In some embodiments, the storage unit is further configured to: in response to the question-answer pair being a Chinese sentence, convert the question-answer pair into pinyin, and correspondingly store the converted pinyin of the question and pinyin of the answer in the question-answer pair.
In some embodiments, the storage unit is further configured to: in response to the synonym sentence being a Chinese sentence, convert the synonym sentence into pinyin, and store the converted pinyin of the synonym sentence.
In some embodiments, the storage unit is further configured to: if the synonym sentence comprises the question and does not comprise the answer, correspondingly store the synonym sentence and the answer in the question-answer pair; if the synonym sentence does not comprise the question and comprises the answer, correspondingly store the synonym sentence and the question in the question-answer pair; and if the synonym sentence comprises both the question and the answer, correspondingly store the question and the answer in the synonym sentence.
In some embodiments, the apparatus further includes: a receiving unit configured to determine a text of a first user voice in response to receiving the first user voice; a determining unit configured to, in response to the text corresponding to the first user voice being a question and being a Chinese sentence, take the pinyin of the text of the first user voice as a first pinyin; a search unit configured to search a database for the pinyin of a question matching the first pinyin, and determine the pinyin of the answer corresponding to the pinyin of the matching question as a target answer pinyin; and a generating unit configured to generate a reply sentence for the first user voice according to the answer corresponding to the target answer pinyin.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; a storage device for storing one or more programs which, when executed by one or more processors, cause the one or more processors to implement a method as in any embodiment of the speech processing method.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements a method as in any one of the embodiments of the speech processing method.
According to the voice processing scheme provided by the embodiments of the present application, voice recognition is first performed on the acquired user voice to obtain a text of the user voice. Then, in response to determining that the text is a reference answer indication sentence comprising a question-answer pair, generalization processing is performed on the text to obtain a generalization processing result. Finally, the question and the answer in the question-answer pair are stored correspondingly, and the generalization processing result is stored. In this way, a user can conveniently and quickly set a reference answer by voice, and the device stores the question-answer pair set by the user, so that the answers given during interaction better match the user's wishes. In addition, the generalization processing enhances the intelligence and learning capability of the electronic device, so that the electronic device can extend one example to many variants instead of requiring an interaction to match the user-set question-answer pair word for word.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow diagram for one embodiment of a speech processing method according to the present application;
FIG. 3 is a schematic diagram of an application scenario of a speech processing method according to the present application;
FIG. 4 is a flow diagram of yet another embodiment of a speech processing method according to the present application;
FIG. 5 is a schematic block diagram of one embodiment of a speech processing apparatus according to the present application;
FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing an electronic device according to embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the speech processing method or speech processing apparatus of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as a voice processing application, a video application, a live application, an instant messaging tool, a mailbox client, social platform software, and the like, may be installed on the terminal devices 101, 102, 103.
Here, the terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices having a display screen, including but not limited to smart phones, tablet computers, e-book readers, laptop portable computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules for providing distributed services) or as a single piece of software or software module. No specific limitation is made here.
The server 105 may be a server providing various services, such as a background server providing support for the terminal devices 101, 102, 103. The background server may analyze and perform other processing on the received data such as the user voice, and feed back a processing result (e.g., a generalization processing result) to the terminal device.
It should be noted that the voice processing method provided in the embodiment of the present application may be executed by the server 105 or the terminal devices 101, 102, and 103, and accordingly, the voice processing apparatus may be disposed in the server 105 or the terminal devices 101, 102, and 103.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a speech processing method according to the present application is shown. The voice processing method comprises the following steps:
step 201, performing voice recognition on the acquired user voice to obtain a text of the user voice.
In this embodiment, the execution body of the voice processing method (for example, the server or a terminal device shown in fig. 1) may perform voice recognition on the acquired user voice, and the obtained voice recognition result is the text of the user voice. Here, the user voice refers to the speech uttered by the user; performing voice recognition on it converts the voice into text.
Step 202, in response to determining that the text is a reference answer indication sentence including a question-answer pair, performing generalization processing on the text to obtain a generalization processing result.
In this embodiment, the execution body may, in response to determining that the text is a reference answer indication sentence including a question-answer pair, perform generalization processing on the text to obtain a generalization processing result. Generalization processing here refers to processing that gives the electronic device better generalization capability. The object of the generalization may be the question, the answer, or both in the text, and the resulting generalization result is the generalized question and/or answer.
Specifically, the reference answer indication sentence includes a question-answer pair consisting of a question and an answer, and the user indicates that the answer is to serve as the reference answer to the question; that is, when the user later asks the question, the electronic device can output the answer given in the reference answer indication sentence.
For example, the user voice may express "next time I say who is the best-looking person, you say of course it is you"; the question and the answer here are "who is the best-looking person" and "of course it is you", respectively. The question-answer pair can be generalized, for example by replacing "good-looking" in the question with "beautiful".
In practice, the execution body may perform word segmentation on the text, extract the sentence backbone of the text, and match the backbone against preset backbone templates. If the matched backbone template is a template of a reference answer indication sentence, the text can be determined to be a reference answer indication sentence.
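As an illustration only (the patent does not specify concrete backbone templates), the following Python sketch shows how a reference answer indication sentence could be matched against a single hypothetical template and the question-answer pair extracted from it; the English template, the function name, and the example sentence are assumptions made for readability.

```python
import re

# Hypothetical backbone template for a reference answer indication sentence;
# the actual templates matched after word segmentation are not given in the patent.
REFERENCE_ANSWER_TEMPLATE = re.compile(
    r"next time i say (?P<question>.+?),? you say (?P<answer>.+)",
    re.IGNORECASE,
)

def parse_reference_answer_sentence(text):
    """Return the (question, answer) pair if the text matches the template, else None."""
    match = REFERENCE_ANSWER_TEMPLATE.search(text.strip())
    if match is None:
        return None
    return match.group("question").strip(), match.group("answer").strip()

pair = parse_reference_answer_sentence(
    "Next time I say who is the best-looking person, you say of course it is you"
)
print(pair)  # ('who is the best-looking person', 'of course it is you')
```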
Step 203, storing the questions and answers in the question-answer pairs correspondingly, and storing the generalization processing results.
In this embodiment, the execution body may store the question and the answer in the question-answer pair correspondingly. In practice, the stored correspondence between questions and answers may be not only one-to-one but also one-to-many or many-to-one. For example, the questions "who is the best-looking person" and "who is the most beautiful person" may both be stored in correspondence with the answer "of course it is you".
In some optional implementations of this embodiment, in step 203, correspondingly storing the question and the answer in the question-answer pair includes: in response to the question-answer pair being a Chinese sentence, converting the question-answer pair into pinyin, and correspondingly storing the converted pinyin of the question and pinyin of the answer in the question-answer pair.
In these optional implementations, the execution body stores pinyin. Compared with storing the question-answer pair only in text form, storing pinyin avoids the problem that a question presented by the user cannot be matched with a stored answer because of homophones (for example, homophones produced by speech recognition or entered manually). If the pair were stored as text and the recognized question contained a homophone of a stored character, the recognized question would not match the stored question character by character, and the answer found for the user's voice would be inaccurate or missing.
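The following minimal sketch illustrates the idea of keying stored answers by the pinyin of the question. It assumes the third-party pypinyin package as one possible Chinese-to-pinyin converter (the patent does not name any library), and an in-memory dictionary stands in for the database; keeping the answer text alongside the answer pinyin is an added convenience, not something the patent requires.

```python
from pypinyin import lazy_pinyin  # one possible Chinese-to-pinyin converter

# In-memory stand-in for the database: question pinyin -> stored answer record.
pinyin_answer_store = {}

def to_pinyin_key(sentence):
    """Convert a Chinese sentence into a space-separated, toneless pinyin string."""
    return " ".join(lazy_pinyin(sentence))

def store_question_answer(question, answer):
    """Store the pair keyed by the pinyin of the question, as described above."""
    pinyin_answer_store[to_pinyin_key(question)] = {
        "answer_pinyin": to_pinyin_key(answer),
        "answer_text": answer,  # kept for convenience when building the reply
    }

store_question_answer("谁是最好看的人", "当然是你啦")
# A recognized question that differs only by homophones maps to the same pinyin key.
print(to_pinyin_key("谁是最好看的人") in pinyin_answer_store)  # True
```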
In some optional application scenarios of these implementations, the method further includes: in response to receiving a first user voice, determining a text of the first user voice; in response to the text corresponding to the first user voice being a question and being a Chinese sentence, taking the pinyin of the text of the first user voice as a first pinyin; searching a database for the pinyin of a question matching the first pinyin, and determining the pinyin of the answer corresponding to the pinyin of the matching question as a target answer pinyin; and generating a reply sentence for the first user voice according to the answer corresponding to the target answer pinyin.
In these optional application scenarios, after storing the pinyin, the execution body may, during voice interaction, convert the question voice into text and further convert the text into pinyin. The execution body may then use the pinyin of the question to look up the stored answer pinyin in the database and convert it back into text.
In practice, the execution body may generate the reply sentence in various ways. For example, it may directly use the found answer corresponding to the answer pinyin as the reply sentence, or add an exclamation, a modal particle, or the like to the answer to generate the reply sentence.
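Continuing the sketch above (and reusing its hypothetical pinyin_answer_store and to_pinyin_key helpers), the lookup side could look roughly as follows; appending a modal particle is just one assumed way of generating the reply sentence.

```python
from typing import Optional

def answer_user_question(recognized_text) -> Optional[str]:
    """Look up an answer by the pinyin of the recognized question and build a reply."""
    first_pinyin = to_pinyin_key(recognized_text)    # pinyin of the question text
    record = pinyin_answer_store.get(first_pinyin)   # match against stored question pinyin
    if record is None:
        return None                                  # no customized answer was set
    # Build the reply sentence; simply reusing the stored answer and appending a
    # modal particle is only one of many possible ways to generate it.
    return record["answer_text"] + "！"

print(answer_user_question("谁是最好看的人"))  # 当然是你啦！
```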
These application scenarios make use of the stored pinyin, so that the stored content can match more of the user's questions and the answer corresponding to the user's question can be found, improving the accuracy of the device's replies during interaction.
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the voice processing method according to this embodiment. In the application scenario of fig. 3, the execution body 301 may perform voice recognition on the acquired user voice 302 to obtain the text "next time I say who is the most beautiful person, you say of course it is you" 303. In response to determining that the text 303 is a reference answer indication sentence including a question-answer pair, the text 303 is subjected to generalization processing, resulting in a generalization processing result 304. The question "who is the most beautiful person" and the answer "of course it is you" in the question-answer pair are stored correspondingly, and the generalization processing result is stored.
With this embodiment, the user can conveniently and quickly set a reference answer by voice, and the device stores the question-answer pair set by the user, so that the answers given during interaction better match the user's wishes. In addition, the generalization processing enhances the intelligence and learning capability of the electronic device, so that the electronic device can extend one example to many variants instead of requiring an interaction to match the user-set question-answer pair word for word.
With further reference to FIG. 4, a flow 400 of yet another embodiment of a speech processing method is shown. The flow 400 of the speech processing method includes the following steps:
step 401, performing voice recognition on the acquired user voice to obtain a text of the user voice.
In this embodiment, the execution body of the voice processing method (for example, the server or a terminal device shown in fig. 1) may perform voice recognition on the acquired user voice, and the obtained voice recognition result is the text of the user voice. Here, the user voice refers to the speech uttered by the user; performing voice recognition on it converts the voice into text.
Step 402, in response to determining that the text is a reference answer indication sentence including a question-answer pair, selecting at least one word from the words included in the question-answer pair as a target word, and determining synonyms of the target word.
In this embodiment, in response to determining that the text is a reference answer indication sentence including a question-answer pair, the execution body may select a target word from the words included in the question-answer pair and determine synonyms of the target word. At least one target word is selected.
In practice, the execution body may select the target word in various ways. For example, it may take words of a preset part of speech as target words, such as at least one of the following: adjectives, adverbs, nouns. For instance, an adjective may be "good-looking", and its synonyms may be "beautiful" and "pretty". The execution body may also select target words according to the sentence component to which a word belongs; for example, if a word serves as the predicate of the sentence, it may be taken as a target word.
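As a rough illustration of part-of-speech-based target word selection, the sketch below uses the jieba segmenter's POS tags; the choice of jieba and of the tag set {adjective, adverb, noun} is an assumption for demonstration, not something mandated by the patent.

```python
import jieba.posseg as pseg  # jieba is one possible Chinese segmenter with POS tagging

# Parts of speech treated as candidates: 'a' adjective, 'd' adverb, 'n' noun in
# jieba's tag set; the exact preset is an assumption for this illustration.
TARGET_POS_TAGS = {"a", "d", "n"}

def select_target_words(sentence):
    """Return the words whose part of speech falls in the preset target tag set."""
    return [pair.word for pair in pseg.lcut(sentence) if pair.flag in TARGET_POS_TAGS]

# Candidate target words for the example question, e.g. the adjective "好看".
print(select_target_words("谁是最好看的人"))
```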
Step 403, replacing, with at least one of the synonyms, the target word corresponding to that synonym in the question-answer pair to generate a synonym sentence.
In this embodiment, after obtaining the at least one synonym, the execution body may use each of the at least one synonym in turn to replace the target word in the question-answer pair, thereby generating at least one synonym sentence. For example, in a question-answer pair, the question and the answer are "who is the best-looking person" and "of course it is you", respectively. The execution body can perform synonym replacement on the question-answer pair, that is, replace "best-looking" in the question with "most beautiful", obtain the synonym sentence "who is the most beautiful person", and store the synonym sentence in correspondence with the answer in the question-answer pair.
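A minimal sketch of the replacement step, assuming a toy synonym lexicon (a real system would rely on a curated synonym dictionary):

```python
# Toy synonym lexicon; a real system would use a curated synonym dictionary.
SYNONYM_LEXICON = {
    "好看": ["漂亮", "美丽"],  # good-looking -> pretty, beautiful
}

def generate_synonym_sentences(sentence, target_word):
    """Replace the target word with each of its synonyms to form synonym sentences."""
    return [
        sentence.replace(target_word, synonym)
        for synonym in SYNONYM_LEXICON.get(target_word, [])
    ]

print(generate_synonym_sentences("谁是最好看的人", "好看"))
# ['谁是最漂亮的人', '谁是最美丽的人']
```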
In some optional implementations of this embodiment, the at least one synonym is selected by: selecting, from the synonyms of the target word, at least one synonym corresponding to a user portrait of the user associated with the user voice.
In these optional implementations, the execution body may select synonyms based on the user portrait. Specifically, the user information (such as the user portrait) of the user corresponding to the voice may be information pre-stored on the terminal device and/or the server.
In practice, the user portrait may indicate age, gender, and so on. Users of different ages (or age groups) and genders may have their own white lists and/or black lists of synonyms. For example, synonyms for "good-looking" may include "beautiful", "pretty", and "handsome"; when the user is a woman, the white list of synonyms for the user may include "beautiful" and "pretty", and the black list may include "handsome". As another example, synonyms for "handsome" may include "cool", "dazzling", and "awesome"; when the user is an elderly person, the white list of synonyms corresponding to the user may include "awesome", and the black list may include "cool" and "dazzling".
These implementations can specifically and accurately determine synonym sentences matching the user based on the user's portrait, so that the device can answer the user more accurately and appropriately during interaction.
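The white list/black list idea could be sketched as follows; the portrait keys and the list contents are invented for illustration and mirror the examples above.

```python
# Invented white/black lists keyed by a coarse user-portrait attribute.
PORTRAIT_SYNONYM_LISTS = {
    "female": {"white": {"漂亮", "美丽"}, "black": {"帅"}},
    "elderly": {"white": {"牛"}, "black": {"酷", "炫"}},
}

def filter_synonyms_by_portrait(synonyms, portrait_key):
    """Drop black-listed synonyms and prefer white-listed ones for this portrait."""
    lists = PORTRAIT_SYNONYM_LISTS.get(portrait_key, {"white": set(), "black": set()})
    allowed = [s for s in synonyms if s not in lists["black"]]
    preferred = [s for s in allowed if s in lists["white"]]
    return preferred or allowed  # fall back to all non-black-listed synonyms

print(filter_synonyms_by_portrait(["漂亮", "美丽", "帅"], "female"))  # ['漂亮', '美丽']
```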
In some optional implementations of this embodiment, the at least one synonym is selected by: selecting, from the synonyms of the target word, at least one synonym corresponding to the intent of the user voice.
In these optional implementations, the execution body may determine the intent of the text and select at least one synonym based on the intent. Specifically, for questions with different intents, the vocabulary suitable for the answers may differ, so the question of each intent may have a corresponding white list and/or black list of synonyms. For example, question-answer pair No. 1 is "who is the best-looking person" and "of course it is you", and question-answer pair No. 2 is "which is the best-looking picture" and "of course it is the one on your wall". Synonyms for "good-looking" may include "beautiful", "pretty", and "handsome". Since the intent of question-answer pair No. 2 is to ask about a picture, "handsome" is not suitable and can be placed in the black list of synonyms corresponding to that intent.
These implementations can select synonyms that fit the context according to the intent, so that the device can answer the user more accurately and appropriately during interaction.
It should be noted that the above two implementations may be combined with each other; that is, the execution body may select, based on both the user portrait of the user corresponding to the user voice and the intent of the user voice, at least one synonym matching both the intent and the user portrait from the synonyms of the target word, so as to obtain more appropriate synonyms. A combined sketch is given below.
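The combined sketch reuses filter_synonyms_by_portrait from the previous sketch and adds a hypothetical intent-keyed black list; the intent labels are assumptions for illustration.

```python
# Hypothetical black lists keyed by the intent of the question.
INTENT_SYNONYM_BLACKLIST = {
    "ask_about_person": set(),
    "ask_about_picture": {"帅"},  # "handsome" does not fit a question about a picture
}

def select_synonyms(synonyms, portrait_key, intent):
    """Keep only synonyms that suit both the user portrait and the intent."""
    by_portrait = filter_synonyms_by_portrait(synonyms, portrait_key)
    blacklist = INTENT_SYNONYM_BLACKLIST.get(intent, set())
    return [s for s in by_portrait if s not in blacklist]

print(select_synonyms(["漂亮", "美丽", "帅"], "female", "ask_about_picture"))
# ['漂亮', '美丽']
```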
Step 404, correspondingly storing the question and the answer in the question-answer pair, and storing the generalization processing result.
In this embodiment, the execution body may store the question and the answer in the question-answer pair correspondingly. Specifically, if only the question was generalized in the generalization processing, the generalized question and the answer in the question-answer pair may be stored correspondingly. If only the answer was generalized, the question in the question-answer pair and the generalized answer may be stored correspondingly. If both the question and the answer were generalized, the generalized question and the generalized answer may be stored correspondingly.
In some optional implementations of this embodiment, in step 404, storing the generalization processing result may include: in response to the synonym sentence being a Chinese sentence, converting the synonym sentence into pinyin, and storing the converted pinyin of the synonym sentence.
In these optional implementations, the execution body may store the pinyin of the synonym sentence instead of the synonym sentence itself. This avoids the case where, because of homophones, the user's sentence does not match a stored question character by character and the answer found is inaccurate, thereby improving the accuracy of the device's replies during interaction. The specific storage correspondence may refer to the following implementations.
In some optional implementations of this embodiment, in step 404, storing the generalization processing result may include: if the synonym sentence comprises the question and does not comprise the answer, correspondingly storing the synonym sentence and the answer in the question-answer pair; if the synonym sentence does not comprise the question and comprises the answer, correspondingly storing the synonym sentence and the question in the question-answer pair; and if the synonym sentence comprises both the question and the answer, correspondingly storing the question and the answer in the synonym sentence.
In these optional implementations, if the selected target word appears only in the question of the question-answer pair, only the question is replaced and a synonym sentence corresponding to the question is generated, while the answer in the question-answer pair is not replaced. In this case, the synonym sentence and the answer in the question-answer pair may be stored in correspondence with each other. If the selected target word appears only in the answer of the question-answer pair, the synonym sentence corresponding to the answer and the question in the question-answer pair may be stored correspondingly. If the selected target word appears in both the question and the answer, the execution body obtains a synonym sentence including both a question and an answer, and stores that question and answer correspondingly.
For example, if the question-answer pair is "who is the best-looking person" and "of course it is you", "best-looking" in the question can be replaced with "most beautiful", and the resulting synonym sentence "who is the most beautiful person" is stored in correspondence with the answer "of course it is you" in the question-answer pair.
For another example, if the question-answer pair is "what do I look like" and "of course very good-looking", "good-looking" in the answer may be replaced with "beautiful", and the resulting synonym sentence "of course very beautiful" is stored in correspondence with the question "what do I look like" in the question-answer pair.
Further, if the question-answer pair is "who is the best-looking person" and "of course you are the best-looking", "best-looking" in both the question and the answer may be replaced; the generated synonym sentence includes the question "who is the most beautiful person" and the answer "of course you are the most beautiful". The execution body may store the question and the answer in the synonym sentence correspondingly. A sketch of this three-case storage logic follows.
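The three storage cases could be expressed roughly as follows; the function signature and the dictionary standing in for the database are assumptions for illustration.

```python
def store_generalization_result(synonym_question, synonym_answer,
                                original_question, original_answer, store):
    """Store the synonym sentences according to which parts were generalized.

    Pass None for a part that was not generalized; `store` maps a question
    to its answer and stands in for the database.
    """
    if synonym_question and not synonym_answer:
        # Only the question was generalized: pair it with the original answer.
        store[synonym_question] = original_answer
    elif synonym_answer and not synonym_question:
        # Only the answer was generalized: pair it with the original question.
        store[original_question] = synonym_answer
    elif synonym_question and synonym_answer:
        # Both were generalized: store the generalized pair.
        store[synonym_question] = synonym_answer

db = {}
store_generalization_result("谁是最漂亮的人", None, "谁是最好看的人", "当然是你啦", db)
print(db)  # {'谁是最漂亮的人': '当然是你啦'}
```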
These implementations can adopt different storage modes for the generalization processing result according to the different cases of generalization processing, thereby improving the accuracy of the answers given to the user during voice interaction.
This embodiment can use the generated synonym sentences to generalize the question-answer pair, so as to further improve the intelligence of the electronic device.
With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of a speech processing apparatus, which corresponds to the embodiment of the method shown in fig. 2, and which can be applied in various electronic devices.
As shown in fig. 5, the voice processing apparatus 500 of this embodiment includes: a recognition unit 501, a generalization unit 502, and a storage unit 503. The recognition unit 501 is configured to perform voice recognition on acquired user voice to obtain a text of the user voice; the generalization unit 502 is configured to, in response to determining that the text is a reference answer indication sentence including a question-answer pair, perform generalization processing on the text to obtain a generalization processing result; and the storage unit 503 is configured to correspondingly store the question and the answer in the question-answer pair and store the generalization processing result.
In some embodiments, the recognition unit 501 of the voice processing apparatus 500 performs voice recognition on the acquired user voice, and the obtained voice recognition result is the text of the user voice. Here, the user voice refers to the speech uttered by the user; performing voice recognition on it converts the voice into text.
In some embodiments, the generalization unit 502 may, in response to determining that the text is a reference answer indication sentence comprising a question-answer pair, perform generalization processing on the text to obtain a generalization processing result. Generalization processing here refers to processing that gives the electronic device better generalization capability. The object of the generalization may be the question, the answer, or both in the text, and the resulting generalization result is the generalized question and/or answer.
In some embodiments, the storage unit 503 may store the question and the answer in the question-answer pair correspondingly. In practice, the stored correspondence between questions and answers may be not only one-to-one but also one-to-many or many-to-one. For example, the questions "who is the best-looking person" and "who is the most beautiful person" may both be stored in correspondence with the answer "of course it is you".
In some optional implementations of this embodiment, the generalization unit is further configured to: select at least one word from the words contained in the question-answer pair as a target word, and determine synonyms of the target word; and replace, with at least one of the synonyms, the target word corresponding to that synonym in the question-answer pair to generate a synonym sentence.
In some optional implementations of this embodiment, the at least one synonym is selected by: selecting, from the synonyms of the target word, at least one synonym corresponding to a user portrait of the user associated with the user voice.
In some optional implementations of this embodiment, the at least one synonym is selected by: selecting, from the synonyms of the target word, at least one synonym corresponding to the intent of the user voice.
In some optional implementations of this embodiment, the storage unit is further configured to: in response to the question-answer pair being a Chinese sentence, convert the question-answer pair into pinyin, and correspondingly store the converted pinyin of the question and pinyin of the answer in the question-answer pair.
In some optional implementations of this embodiment, the storage unit is further configured to: in response to the synonym sentence being a Chinese sentence, convert the synonym sentence into pinyin, and store the converted pinyin of the synonym sentence.
In some optional implementations of this embodiment, the storage unit is further configured to: if the synonym sentence comprises the question and does not comprise the answer, correspondingly store the synonym sentence and the answer in the question-answer pair; if the synonym sentence does not comprise the question and comprises the answer, correspondingly store the synonym sentence and the question in the question-answer pair; and if the synonym sentence comprises both the question and the answer, correspondingly store the question and the answer in the synonym sentence.
In some optional implementations of this embodiment, the apparatus further includes: a receiving unit configured to determine a text of a first user voice in response to receiving the first user voice; a determining unit configured to, in response to the text corresponding to the first user voice being a question and being a Chinese sentence, take the pinyin of the text of the first user voice as a first pinyin; a search unit configured to search a database for the pinyin of a question matching the first pinyin and determine the pinyin of the answer corresponding to the pinyin of the matching question as a target answer pinyin; and a generating unit configured to generate a reply sentence for the first user voice according to the answer corresponding to the target answer pinyin.
As shown in fig. 6, the electronic device 600 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 602 or a program loaded from a storage means 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the electronic device 600. The processing means 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 6 may represent one device or may represent multiple devices as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of embodiments of the present disclosure. It should be noted that the computer readable medium of the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an identification unit, a generalization unit, and a storage unit. The names of these units do not form a limitation on the unit itself in some cases, and for example, the recognition unit may also be described as a "unit that performs voice recognition on the acquired user voice to obtain a text of the user voice".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: perform voice recognition on acquired user voice to obtain a text of the user voice; in response to determining that the text is a reference answer indication sentence comprising a question-answer pair, perform generalization processing on the text to obtain a generalization processing result; and correspondingly store the question and the answer in the question-answer pair, and store the generalization processing result.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (9)

1. A method of speech processing, the method comprising:
performing voice recognition on acquired user voice to obtain a text of the user voice;
in response to determining that the text is a reference answer indication sentence comprising a question-answer pair, performing generalization processing on the text to obtain a generalization processing result;
correspondingly storing the question and the answer in the question-answer pair, and storing the generalization processing result;
wherein performing generalization processing on the text to obtain a generalization processing result includes:
selecting at least one word from the words contained in the question-answer pair as a target word, and determining synonyms of the target word; and replacing, with at least one of the synonyms, the target word corresponding to that synonym in the question-answer pair to generate a synonym sentence;
wherein the at least one synonym is selected by:
selecting, from the synonyms of the target word, at least one synonym corresponding to the intent of the user voice, wherein the question of each intent may have a corresponding white list and/or black list of synonyms.
2. The method of claim 1, wherein the at least one synonym is selected by:
selecting, from the synonyms of the target word, at least one synonym corresponding to a user portrait of the user associated with the user voice.
3. The method of claim 1, wherein the correspondingly storing the question and the answer in the question-answer pair comprises:
in response to the question-answer pair being a Chinese sentence, converting the question-answer pair into pinyin, and correspondingly storing the converted pinyin of the question and pinyin of the answer in the question-answer pair.
4. The method of claim 1, wherein said storing the generalization processing result comprises:
in response to the synonym sentence being a Chinese sentence, converting the synonym sentence into pinyin, and storing the converted pinyin of the synonym sentence.
5. The method of claim 1, wherein said storing the generalization processing result comprises:
if the synonym sentence comprises the question and does not comprise the answer, correspondingly storing the synonym sentence and the answer in the question-answer pair;
if the synonym sentence does not comprise the question and comprises the answer, correspondingly storing the synonym sentence and the question in the question-answer pair;
and if the synonym sentence comprises both the question and the answer, correspondingly storing the question and the answer in the synonym sentence.
6. The method of claim 4, wherein the method further comprises:
in response to receiving a first user voice, determining text of the first user voice;
in response to the text corresponding to the first user voice being a question and being a Chinese sentence, taking the pinyin of the text of the first user voice as a first pinyin;
searching a database for the pinyin of a question matching the first pinyin, and determining the pinyin of the answer corresponding to the pinyin of the matching question as a target answer pinyin;
and generating a reply sentence for the first user voice according to the answer corresponding to the target answer pinyin.
7. A speech processing apparatus, the apparatus comprising:
the recognition unit is configured to perform voice recognition on the acquired user voice to obtain a text of the user voice;
a generalization unit configured to, in response to determining that the text is a reference answer indication sentence including a question-answer pair, perform generalization processing on the text to obtain a generalization processing result;
a storage unit configured to correspondingly store the question and the answer in the question-answer pair and store the generalization processing result;
the generalization unit is further configured to:
select at least one word from the words contained in the question-answer pair as a target word, and determine synonyms of the target word; and replace, with at least one of the synonyms, the target word corresponding to that synonym in the question-answer pair to generate a synonym sentence;
the at least one synonym is selected by:
selecting, from the synonyms of the target word, at least one synonym corresponding to the intent of the user voice, wherein the question of each intent may have a corresponding white list and/or black list of synonyms.
8. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
which, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.
9. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method according to any one of claims 1-6.
CN201910542572.5A 2019-06-21 2019-06-21 Voice processing method and device Active CN110232920B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910542572.5A CN110232920B (en) 2019-06-21 2019-06-21 Voice processing method and device


Publications (2)

Publication Number Publication Date
CN110232920A CN110232920A (en) 2019-09-13
CN110232920B true CN110232920B (en) 2021-11-19

Family

ID=67856429

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910542572.5A Active CN110232920B (en) 2019-06-21 2019-06-21 Voice processing method and device

Country Status (1)

Country Link
CN (1) CN110232920B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113539247A (en) * 2020-04-14 2021-10-22 京东数字科技控股有限公司 Voice data processing method, device, equipment and computer readable storage medium
CN111782779B (en) * 2020-05-28 2022-08-23 厦门快商通科技股份有限公司 Voice question-answering method, system, mobile terminal and storage medium
CN112017663B (en) * 2020-08-14 2024-04-30 博泰车联网(南京)有限公司 Voice generalization method and device and computer storage medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101097573A (en) * 2006-06-28 2008-01-02 腾讯科技(深圳)有限公司 Automatically request-answering system and method
JP2014106523A (en) * 2012-11-30 2014-06-09 Aisin Aw Co Ltd Voice input corresponding device and voice input corresponding program
CN103928024A (en) * 2013-01-14 2014-07-16 联想(北京)有限公司 Voice query method and electronic equipment
CN107562911A (en) * 2017-09-12 2018-01-09 北京首科长昊医疗科技有限公司 More wheel interaction probabilistic model training methods and auto-answer method
CN107688667A (en) * 2017-09-30 2018-02-13 平安科技(深圳)有限公司 Intelligent robot client service method, electronic installation and computer-readable recording medium
CN108153876A (en) * 2017-12-26 2018-06-12 爱因互动科技发展(北京)有限公司 Intelligent answer method and system
CN108595619A (en) * 2018-04-23 2018-09-28 海信集团有限公司 A kind of answering method and equipment
CN108897867A (en) * 2018-06-29 2018-11-27 北京百度网讯科技有限公司 For the data processing method of knowledge question, device, server and medium
CN109213994A (en) * 2018-07-26 2019-01-15 深圳市元征科技股份有限公司 Information matching method and device
RU2683190C1 (en) * 2018-02-12 2019-03-26 Павел Александрович Борзов Method and system for calculating index of risk of unlawful actions from candidates for employment and acting staff
CN109739968A (en) * 2018-12-29 2019-05-10 北京猎户星空科技有限公司 A kind of data processing method and device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8375042B1 (en) * 2010-11-09 2013-02-12 Google Inc. Index-side synonym generation
US9665826B2 (en) * 2013-06-28 2017-05-30 Microsoft Technology Licensing, Llc Automated problem inference from bug repositories
US20150363473A1 (en) * 2014-06-17 2015-12-17 Microsoft Corporation Direct answer triggering in search
US20190034464A1 (en) * 2017-07-31 2019-01-31 Vmware, Inc. Methods and systems that collect data from computing facilities and export a specified portion of the collected data for remote processing and analysis
CN108959559B (en) * 2018-06-29 2021-02-26 北京百度网讯科技有限公司 Question and answer pair generation method and device
CN109522465A (en) * 2018-10-22 2019-03-26 国家电网公司 The semantic searching method and device of knowledge based map


Also Published As

Publication number Publication date
CN110232920A (en) 2019-09-13

Similar Documents

Publication Publication Date Title
US20210232761A1 (en) Methods and systems for improving machine learning performance
US20190377797A1 (en) Mathematical processing method, apparatus and device for text problem, and storage medium
CN111428010B (en) Man-machine intelligent question-answering method and device
CN110969012B (en) Text error correction method and device, storage medium and electronic equipment
CN108121800B (en) Information generation method and device based on artificial intelligence
CN107241260B (en) News pushing method and device based on artificial intelligence
CN108877782B (en) Speech recognition method and device
CN109635094B (en) Method and device for generating answer
CN109858045B (en) Machine translation method and device
CN110232920B (en) Voice processing method and device
CN111104482A (en) Data processing method and device
CN111159220B (en) Method and apparatus for outputting structured query statement
CN108268450B (en) Method and apparatus for generating information
CN114787814A (en) Reference resolution
CN111898643A (en) Semantic matching method and device
CN110750624A (en) Information output method and device
CN109190123B (en) Method and apparatus for outputting information
US11645545B2 (en) Train a digital assistant with expert knowledge
CN111639162A (en) Information interaction method and device, electronic equipment and storage medium
CN112182255A (en) Method and apparatus for storing media files and for retrieving media files
CN111444321B (en) Question answering method, device, electronic equipment and storage medium
CN110738056B (en) Method and device for generating information
CN113785297A (en) Automatic evaluation of natural language text generated based on structured data
CN107885872B (en) Method and device for generating information
CN110223694B (en) Voice processing method, system and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20211011
Address after: 100176 Room 101, 1st floor, building 1, yard 7, Ruihe West 2nd Road, economic and Technological Development Zone, Daxing District, Beijing
Applicant after: Apollo Zhilian (Beijing) Technology Co.,Ltd.
Address before: 100085 Baidu Building, 10 Shangdi Tenth Street, Haidian District, Beijing
Applicant before: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) Co.,Ltd.
GR01 Patent grant