CN114927134A - Information processing method, device, equipment and computer readable storage medium - Google Patents

Information processing method, device, equipment and computer readable storage medium

Info

Publication number
CN114927134A
Authority
CN
China
Prior art keywords
information
user
voice
reply
frequency
Legal status
Pending
Application number
CN202210639832.2A
Other languages
Chinese (zh)
Inventor
单权强
余贤雷
Current Assignee
China Mobile Communications Group Co Ltd
China Mobile Hangzhou Information Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Hangzhou Information Technology Co Ltd
Application filed by China Mobile Communications Group Co Ltd and China Mobile Hangzhou Information Technology Co Ltd
Priority to CN202210639832.2A
Publication of CN114927134A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225 Feedback of the input speech

Abstract

The invention discloses an information processing method, apparatus, device, and computer-readable storage medium. The method comprises: acquiring all first user voice information received within a preset time period; preprocessing the first user voice information to determine high-frequency information; performing natural language generation processing on the high-frequency information to determine first reply text information; and performing voice synthesis processing on the first reply text information to determine first reply voice information, which is then cached.

Description

Information processing method, device, equipment and computer readable storage medium
Technical Field
The present invention relates to the field of information processing technologies, and in particular, to an information processing method, an information processing apparatus, an information processing device, and a computer-readable storage medium.
Background
In existing voice interaction between a user terminal and a platform server, the platform server receives a query request sent by the user terminal, processes it, and feeds back reply voice information; the user terminal then receives and plays the reply voice information.
In practice, a considerable proportion of all query requests initiated by user terminals are identical. However many times the same query request is initiated, the platform server performs voice synthesis for it anew. This repeated voice synthesis greatly increases the load on the platform server and lengthens the time it takes to process query requests, which in turn delays playback of the reply voice information on the user terminal and degrades the user experience.
Disclosure of Invention
The main object of the invention is to provide an information processing method, an information processing device, and a computer-readable storage medium, so as to solve the technical problem in existing voice interaction that the platform server performs voice synthesis for every query request, which increases the server's load and lengthens the time needed to feed back reply voice.
In order to achieve the above object, the present invention provides an information processing method applied to a platform server, the information processing method including the steps of:
acquiring all first user voice information within a preset time period, wherein the first user voice information is sent to the platform server by a user terminal;
preprocessing the first user voice information to determine high-frequency information;
performing natural language generation processing on the high-frequency information to determine first reply text information;
and performing voice synthesis processing on the first reply text information to determine first reply voice information, and caching the first reply voice information.
Further, the step of preprocessing the first user voice information to determine high-frequency information includes:
performing voice recognition processing on the first user voice information to determine first user text information;
performing natural language understanding processing on the first user text information to determine the first domain information, first intention information and first key slot information corresponding to each piece of first user text information;
and determining the high-frequency information based on the first domain information, first intention information and first key slot information corresponding to each piece of first user text information.
Further, the step of determining the high-frequency information based on the first domain information, the first intention information and the first key slot information corresponding to each piece of first user text information includes:
determining that pieces of first user text information having identical first domain information, first intention information and first key slot information are same-type text information, and recording the number of pieces of each type of same-type text information;
if the number reaches a first preset number, determining that the same-type text information is high-frequency information; or sorting the same-type text information in descending order of number to obtain a sorting result, and taking the top second preset number of same-type text information in the sorting result as the high-frequency information.
Further, the information processing method further includes:
acquiring second user voice information sent by the user terminal;
performing voice recognition processing on the second user voice information to determine second user text information;
performing natural language understanding processing on the second user text information to determine second domain information, second intention information and second key slot information corresponding to the second user text information;
determining, based on the second domain information, the second intention information and the second key slot information, whether target high-frequency information corresponding to the second user text information exists in the high-frequency information;
and if the target high-frequency information exists, acquiring second reply voice information corresponding to the target high-frequency information from the cached first reply voice information, and sending the second reply voice information to the user terminal.
Further, the step of determining, based on the second domain information, the second intention information and the second key slot information, whether target high-frequency information corresponding to the second user text information exists in the high-frequency information includes:
if target domain information matching the second domain information exists in the first domain information of the high-frequency information, determining whether target intention information matching the second intention information exists in the first intention information of the high-frequency information corresponding to the target domain information;
if such target intention information exists, determining whether target key slot information matching the second key slot information exists in the first key slot information of the high-frequency information corresponding to the target intention information; if it exists, determining that the target high-frequency information exists.
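By way of illustration only, the three-level matching order above (domain, then intention, then key slot) can be sketched in Python. The nested-dictionary index, the function name `find_target_high_freq` and entry ids such as `hf-001` are hypothetical; the patent describes only the order in which the comparisons are made, and exact string equality is assumed here as the matching criterion.

```python
from typing import Dict, Optional

# Hypothetical index layout: domain -> intent -> key slot -> id of the cached
# high-frequency entry. The patent specifies the matching order, not a data structure.
HighFreqIndex = Dict[str, Dict[str, Dict[str, str]]]


def find_target_high_freq(
    index: HighFreqIndex, domain: str, intent: str, key_slot: str
) -> Optional[str]:
    """Return the id of the target high-frequency entry, or None if there is no match."""
    intents = index.get(domain)        # 1. look for target domain information
    if intents is None:
        return None
    key_slots = intents.get(intent)    # 2. look for target intention information in that domain
    if key_slots is None:
        return None
    return key_slots.get(key_slot)     # 3. look for target key slot information in that intent


index: HighFreqIndex = {
    "weather": {"query_weather": {"today": "hf-001", "tomorrow": "hf-002"}},
}
print(find_target_high_freq(index, "weather", "query_weather", "today"))      # hf-001
print(find_target_high_freq(index, "weather", "query_weather", "next week"))  # None
```

Checking the domain first means most non-matching queries are rejected after a single lookup.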
Further, after the step of determining, based on the second domain information, the second intention information and the second key slot information, whether target high-frequency information corresponding to the second user text information exists in the high-frequency information, the information processing method further includes:
if the target high-frequency information does not exist, performing natural language generation processing on the second user text information to obtain user reply information, and sending the user reply information to the user terminal, which receives it;
and if a voice synthesis request corresponding to the user reply information is received from the user terminal, performing voice synthesis processing on the user reply information to obtain third reply voice information.
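A minimal sketch of the cache-hit and cache-miss branches follows, assuming the NLU output has already been reduced to a (domain, intent, key slot) tuple and that the NLG and TTS services are passed in as callables. The names `handle_query` and `on_synthesis_request` are hypothetical and stand in for whatever service interfaces a real platform server exposes.

```python
from typing import Callable, Dict, Tuple, Union

Key = Tuple[str, str, str]  # (domain, intent, key slot), as returned by NLU


def handle_query(
    key: Key,
    user_text: str,
    voice_cache: Dict[Key, bytes],
    nlg: Callable[[str], str],
) -> Tuple[str, Union[bytes, str]]:
    """Return ("voice", audio) on a cache hit, otherwise ("text", reply text)."""
    cached = voice_cache.get(key)
    if cached is not None:
        return "voice", cached        # cached reply voice: no new synthesis
    return "text", nlg(user_text)     # user reply information; synthesis is deferred


def on_synthesis_request(reply_text: str, tts: Callable[[str], bytes]) -> bytes:
    """Synthesize only if the terminal explicitly requests it (third reply voice)."""
    return tts(reply_text)
```

The design point is that a miss returns text only; audio is produced by `on_synthesis_request` just when the terminal asks for it, so cache misses do not automatically add synthesis load.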
Further, the information processing method further includes:
acquiring user operation information sent by the user terminal;
if the user operation information is activation operation information, sending a first fixed reply voice to the user terminal, which stores the first fixed reply voice;
and if the user operation information is timbre modification information, sending a second fixed reply voice to the user terminal, which stores the second fixed reply voice.
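The operation-dependent push of fixed reply voices reduces to a lookup table. In the sketch below, the operation names `activation` and `timbre_modification` and the voice asset ids are hypothetical placeholders, since the patent does not specify how operations or voices are identified.

```python
from typing import Dict, Optional

# Hypothetical operation names and voice asset ids; the patent does not define them.
FIXED_REPLY_VOICES: Dict[str, str] = {
    "activation": "fixed_reply_voice_1",           # pushed when the terminal is activated
    "timbre_modification": "fixed_reply_voice_2",  # pushed when the user changes the voice timbre
}


def fixed_reply_for(operation: str) -> Optional[str]:
    """Return the fixed reply voice to send (for the terminal to store), or None."""
    return FIXED_REPLY_VOICES.get(operation)
```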
Further, to achieve the above object, the present invention also provides an information processing apparatus, comprising:
an acquisition module, configured to acquire all first user voice information within a preset time period, wherein the first user voice information is sent to a platform server by a user terminal;
a first processing module, configured to preprocess the first user voice information and determine high-frequency information;
a second processing module, configured to perform natural language generation processing on the high-frequency information and determine first reply text information;
and a cache module, configured to perform voice synthesis processing on the first reply text information, determine first reply voice information, and cache the first reply voice information.
Further, to achieve the above object, the present invention also provides an information processing device, comprising a memory, a processor, and an information processing program stored on the memory and executable on the processor, wherein the information processing program, when executed by the processor, implements the steps of the information processing method described above.
Further, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon an information processing program which, when executed by a processor, realizes the steps of the aforementioned information processing method.
According to the invention, all first user voice information sent by user terminals to the platform server within a preset time period is acquired and preprocessed to determine high-frequency information; natural language generation processing is performed on the high-frequency information to determine first reply text information; and voice synthesis processing is performed on the first reply text information to determine first reply voice information, which is cached. Reply voice for the high-frequency information in the first user voice information is thus cached in advance, so that when user voice information matching the high-frequency information is received later, the corresponding reply voice information can be obtained from the cache. This reduces the amount of voice synthesis performed, lowers the load on the platform server, lets the user terminal obtain the reply voice information quickly, and improves the user experience.
Drawings
Fig. 1 is a schematic structural diagram of an information processing apparatus in a hardware operating environment according to an embodiment of the present invention;
Fig. 2 is a flowchart of a first embodiment of the information processing method according to the present invention;
Fig. 3 is a schematic diagram of the functional modules of an information processing apparatus according to an embodiment of the present invention.
The implementation, functional features and advantages of the present invention will be further described with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, fig. 1 is a schematic structural diagram of an information processing apparatus in a hardware operating environment according to an embodiment of the present invention.
The information processing device in this embodiment of the invention may be a PC. As shown in Fig. 1, the information processing device may include a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 implements connection and communication among these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory (e.g., a magnetic disk memory), and may optionally be a storage device separate from the processor 1001.
Optionally, the information processing device may further include a camera, an RF (Radio Frequency) circuit, sensors, an audio circuit, a Wi-Fi module, and the like. The sensors may include, for example, light sensors and motion sensors; the device may of course also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which are not described further here.
Those skilled in the art will appreciate that the terminal structure shown in fig. 1 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and an information processing program.
In the information processing apparatus shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with the backend server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be used to call up an information processing program stored in the memory 1005.
In this embodiment, the information processing device includes a memory 1005, a processor 1001, and an information processing program stored on the memory 1005 and executable on the processor 1001; when the processor 1001 calls the information processing program stored in the memory 1005, it executes the steps of the information processing method in each of the embodiments.
The invention also provides an information processing method. Referring to Fig. 2, Fig. 2 is a schematic flowchart of a first embodiment of the information processing method of the invention.
In this embodiment, the information processing method is applied to a platform server, and includes the following steps:
step S101: acquiring all first user voice information within a preset time period, wherein the first user voice information is sent to the platform server by a user terminal;
step S102: preprocessing the first user voice information to determine high-frequency information;
step S103: performing natural language generation processing on the high-frequency information to determine first reply text information;
step S104: performing voice synthesis processing on the first reply text information to determine first reply voice information, and caching the first reply voice information.
In this embodiment, each user terminal monitors the user's speech in real time. Specifically, upon detecting an acquisition start instruction, the user terminal captures the first user voice information in real time and sends it to the platform server. The platform server then collects, in real time, the first user voice information sent by all user terminals within the preset time period, which can be set manually, for example to one day or one week.
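Collecting the first user voice information for one preset time period can be sketched as a filter over timestamped messages. The `VoiceMessage` record, its field names and the half-open window boundary are assumptions made for illustration; the patent only states that the period is set manually, for example to one day or one week.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class VoiceMessage:  # hypothetical record of one upload from a user terminal
    terminal_id: str
    received_at: datetime
    audio: bytes


def collect_window(
    messages: List[VoiceMessage], end: datetime, window: timedelta
) -> List[VoiceMessage]:
    """Return the messages received in the half-open interval [end - window, end)."""
    start = end - window
    return [m for m in messages if start <= m.received_at < end]
```

For a one-day period, the call would be `collect_window(messages, end, timedelta(days=1))`; using a half-open interval keeps consecutive periods from counting the same message twice.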
Next, the first user voice information is preprocessed to determine high-frequency information. In one implementation, the platform server performs natural language understanding processing on the first user text information (the text obtained by voice recognition of the first user voice information). This can be done by the natural language processing service NLP (Natural Language Processing) in the platform server, one core function of which is the natural language understanding service NLU (Natural Language Understanding). When the NLU processes the first user text information, it returns text parameters for each piece of user text information, such as first domain information, first intention information, and first key slot information. Pieces of first user text information with identical first domain information, first intention information, and first key slot information are treated as same-type text information, and the number of pieces of each type is recorded. In another implementation, the platform server calculates the similarity between pieces of user text information in the first user text information; if the similarity between one piece and another reaches a manually preset similarity value, they are determined to be same-type text information, and the number of pieces of each type is recorded. Specifically, cosine similarity may be used: the cosine of the angle between the vector of user A's text information and the vector of user B's text information in a vector space serves as the measure of how much the two differ. The closer the cosine value is to 1, the closer the angle is to 0 degrees, and the more similar the two vectors are.
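The cosine-similarity variant can be sketched as follows. This is a minimal illustration, assuming each text has already been tokenized and is represented as a term-frequency vector; the patent does not say how texts are vectorized (Chinese text would first need word segmentation or character tokens), and the 0.8 threshold is a hypothetical stand-in for the manually preset similarity value.

```python
import math
from collections import Counter
from typing import List


def cosine_similarity(tokens_a: List[str], tokens_b: List[str]) -> float:
    """Cosine of the angle between the term-frequency vectors of two token lists."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def is_same_type(tokens_a: List[str], tokens_b: List[str], threshold: float = 0.8) -> bool:
    """Hypothetical threshold standing in for the manually preset similarity value."""
    return cosine_similarity(tokens_a, tokens_b) >= threshold


a = "what is the weather today".split()
b = "what is the weather like today".split()
print(round(cosine_similarity(a, b), 3))  # 0.913
```

Here the two questions share five of six terms, giving 5/√30 ≈ 0.913, so they would be counted as the same type under the assumed threshold.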
Finally, if the number reaches a first preset number, the same-type text information is determined to be high-frequency information; alternatively, the same-type text information is sorted in descending order of number to obtain a sorting result, and the top second preset number of same-type text information in the sorting result is taken as the high-frequency information.
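Both selection rules can be sketched with Python's `collections.Counter`, assuming each piece of text has already been mapped to a hashable type key (for example by the domain/intention/key-slot grouping or the cosine-similarity test described above). The keys `weather`, `music` and `news` are made-up illustration data.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def high_freq_by_threshold(keys: Iterable[Hashable], first_preset_number: int) -> List[Hashable]:
    """Types whose count reaches (is at least) the first preset number."""
    counts = Counter(keys)
    return [k for k, n in counts.items() if n >= first_preset_number]


def high_freq_top_n(keys: Iterable[Hashable], second_preset_number: int) -> List[Hashable]:
    """The second-preset-number most frequent types, in descending order of count."""
    return [k for k, _ in Counter(keys).most_common(second_preset_number)]


# Each key stands for one piece of user text, already grouped into its type.
keys = ["weather"] * 6 + ["music"] * 4 + ["news"] * 1
print(high_freq_by_threshold(keys, 5))  # ['weather']
print(high_freq_top_n(keys, 2))         # ['weather', 'music']
```

`Counter.most_common` returns equal counts in first-encountered order; the patent does not specify how ties should be broken.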
Natural language generation processing is then performed on the high-frequency information to determine first reply text information. Specifically, the high-frequency information is processed by the natural language generation service NLG (Natural Language Generation), another core function of the NLP, to obtain concrete first reply text information that answers the high-frequency information. For example, if the high-frequency information is "what is the weather today", natural language generation produces the first reply text information "Today: cloudy during the day with some rain, cloudy at night, high of 28 ℃, low of 20 ℃, humidity 55-95%, north wind of level 3 to 4 weakening to level 2 to 3".
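The patent does not say how the NLG service builds the reply. One simple possibility, shown here purely as an illustration, is template-based generation: structured weather data (assumed to come from some weather source and represented by the hypothetical `WeatherReport` record) is inserted into a fixed sentence. The values mirror the example reply; the wind clause is omitted for brevity.

```python
from dataclasses import asdict, dataclass


@dataclass
class WeatherReport:  # hypothetical structured data behind the reply
    day: str
    night: str
    high_c: int
    low_c: int
    humidity: str


TEMPLATE = (
    "Today: {day} during the day, {night} at night, "
    "high of {high_c} °C, low of {low_c} °C, humidity {humidity}."
)


def generate_weather_reply(report: WeatherReport) -> str:
    """Fill a fixed sentence template with the report fields (template-based NLG)."""
    return TEMPLATE.format(**asdict(report))


print(generate_weather_reply(WeatherReport("cloudy with some rain", "cloudy", 28, 20, "55-95%")))
```

Because the sentence for a given high-frequency question is fully determined by its data, it can be generated once per period and then synthesized and cached.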
Finally, voice synthesis processing is performed on the first reply text information to determine the first reply voice information, which is then cached. Specifically, the first reply text information may be processed by the speech synthesis service TTS (Text to Speech) in the platform server to obtain the first reply voice information. TTS methods mainly fall into two types: the "concatenation (splicing) method" and the "parametric method". The concatenation method selects the required basic units from a large amount of pre-recorded speech and splices them into the reply speech. The parametric method generates the speech parameters at each moment from a statistical model and then converts these parameters into a speech waveform. The obtained first reply voice information is then cached in the TTS service or in the platform server.
It should be noted that after the first reply voice information is determined and cached, in one implementation the high-frequency information is associated with its first reply voice information; in another implementation, the high-frequency information and its corresponding first reply voice information are given matching tags. Either way, when user voice information matching the high-frequency information is received later, the corresponding reply voice information can be located among the cached first reply voice information.
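The association variant can be sketched as a dictionary keyed by the high-frequency type, so that each type is passed through NLG and TTS exactly once per period. The `(domain, intent, key slot)` key and the `nlg`/`tts` callables are assumptions for illustration; the patent does not define the service interfaces or the cache structure.

```python
from typing import Callable, Dict, Iterable, Tuple

Key = Tuple[str, str, str]  # (domain, intent, key slot) of one high-frequency type


def prebuild_reply_cache(
    high_freq: Iterable[Key],
    nlg: Callable[[Key], str],
    tts: Callable[[str], bytes],
) -> Dict[Key, bytes]:
    """Run NLG and TTS once per high-frequency type and store the audio under its key."""
    cache: Dict[Key, bytes] = {}
    for key in high_freq:
        reply_text = nlg(key)          # first reply text information
        cache[key] = tts(reply_text)   # first reply voice information, associated with its key
    return cache
```

A later matching query then needs only a dictionary lookup, such as `cache.get(key)`, instead of a new synthesis.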
In the information processing method provided in this embodiment, all first user voice information sent by user terminals to the platform server within a preset time period is acquired and preprocessed to determine high-frequency information; natural language generation processing is performed on the high-frequency information to determine first reply text information; and voice synthesis processing is performed on the first reply text information to determine and cache first reply voice information. Because reply voice corresponding to the high-frequency information is cached in advance, when user voice information matching the high-frequency information is received later, the corresponding reply voice information can be taken from the cache. This reduces the amount of voice synthesis performed and the load on the platform server, so that the user terminal obtains the reply voice information quickly and the user experience improves.
A second embodiment of the information processing method of the present invention is proposed based on the first embodiment, and in this embodiment, step S102 includes:
step S201: performing voice recognition processing on the first user voice information to determine first user text information;
step S202: performing natural language understanding processing on the first user text information to determine first domain information, first intention information and first key slot information corresponding to each piece of first user text information;
step S203: determining high-frequency information based on the first domain information, first intention information and first key slot information corresponding to each piece of first user text information.
Specifically, the platform server performs voice recognition processing on the first user voice information through the speech recognition service ASR (Automatic Speech Recognition) in the platform server to obtain the first user text information, which includes the text corresponding to each piece of voice information in the first user voice information.
Next, natural language understanding processing is performed on the first user text information, that is, semantic understanding, intention recognition, key slot determination, and so on. This processing may be performed by the natural language processing service NLP in the platform server, one core function of which is natural language understanding (NLU); when the NLU processes the first user text information, it returns parameters such as first domain information, first intention information, and first key slot information. For example, the user terminal detects that the user's first user voice information is "what's the weather like today" and sends it to the platform server in real time. The platform server performs voice recognition processing on it to obtain the first user text information, then performs NLU processing to obtain the first domain information "weather", the first intention information "query weather conditions", and the first key slot information "today". A key slot can be understood as a well-defined piece of content in the utterance, such as a time, a place, or a person.
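To count utterances by type, the NLU output has to become a hashable grouping key. The sketch below assumes, hypothetically, that key slots arrive as a name-to-value dictionary (for example `{"time": "today"}`); sorting the slot items makes the key independent of slot order. The function name `make_group_key` and the example place slot are illustrative, not part of the patent.

```python
from typing import Dict, Tuple

GroupKey = Tuple[str, str, Tuple[Tuple[str, str], ...]]


def make_group_key(domain: str, intent: str, key_slots: Dict[str, str]) -> GroupKey:
    """Build a hashable key from NLU output; slots are sorted so their order is irrelevant."""
    return (domain, intent, tuple(sorted(key_slots.items())))


k1 = make_group_key("weather", "query_weather", {"time": "today", "place": "Hangzhou"})
k2 = make_group_key("weather", "query_weather", {"place": "Hangzhou", "time": "today"})
print(k1 == k2)  # True: both utterances fall into the same type
```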
The platform server treats pieces of first user text information with identical first domain information, first intention information, and first key slot information as same-type text information and records the number of each type. If the number reaches a first preset number, that same-type text information is determined to be high-frequency information; alternatively, the same-type text information is sorted in descending order of number to obtain a sorting result, and the top second preset number of same-type text information in the sorting result is taken as the high-frequency information.
In the information processing method provided by this embodiment, voice recognition processing is performed on the first user voice information to determine first user text information; natural language understanding processing is then performed on the first user text information to determine the first domain information, first intention information, and first key slot information corresponding to each piece; and high-frequency information is determined based on this information. These three pieces of information allow the high-frequency information to be identified accurately, so that the first reply voice information corresponding to it can be cached. When user voice information matching the high-frequency information is later received, the corresponding reply voice information can be obtained from the cache, which reduces voice synthesis, lowers the load on the platform server, lets the user terminal obtain the reply voice information quickly, and improves the user experience.
A third embodiment of the information processing method of the present invention is proposed based on the second embodiment. In this embodiment, step S203 includes:
step S301: determining that pieces of first user text information with identical first domain information, first intention information and first key slot information are same-type text information, and recording the number of each type of same-type text information;
step S302: if the number reaches a first preset number, determining that the same-type text information is high-frequency information; or sorting the same-type text information in descending order of number to obtain a sorting result, and taking the top second preset number of same-type text information in the sorting result as the high-frequency information.
The platform server examines the first domain information, first intention information and first key slot information corresponding to each piece of first user text information, groups the pieces of first user text information for which all three items are identical as same-type text information, and counts the number of pieces of each type.
When the count of a type of same-type text information reaches the first preset number, that same-type text information is determined to be high-frequency information. Note that if the same user terminal repeatedly sends requests whose first user text information has identical first domain information, first intention information and first key slot information within the same preset time period, the repeated sends do not increase the count of that same-type text information, or the count is adjusted according to other manually configured rules. For example, suppose that within a manually preset time period of two hours the same user terminal sends several requests, and after voice recognition processing and natural language understanding by the platform server the resulting pieces of first user text information all have the same first domain information, first intention information and first key slot information; the recorded count for that type of text information is then still 1. The first preset number may be, for example, 5 or 10.
Alternatively, the same-type text information is sorted in descending order of count to obtain a sorting result, and the top second preset number of same-type text information in the sorting result are taken as the high-frequency information. For example, if there are 10 types of same-type text information and the second preset number is 3, the types are ranked by count from largest to smallest and the first three are taken as the high-frequency information. Finally, voice synthesis processing is performed on the first reply text information, and the resulting first reply voice information is determined and cached.
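The counting rule described above, including the suppression of repeats from a single terminal, could be sketched as follows. It assumes each recognized request arrives with a terminal ID and a timestamp in seconds and uses the two-hour window from the example; `SameTypeCounter` and `record` are hypothetical names, and choosing "do not increment" over "some other manually configured rule" is one of the two options the text allows.

```python
from collections import defaultdict
from typing import Dict, Tuple

Key = Tuple[str, str, str]  # assumed (domain, intention, key slot)


class SameTypeCounter:
    """Counts same-type text information, ignoring repeats from one terminal
    that arrive inside the deduplication window (sketch)."""

    def __init__(self, window_seconds: float = 2 * 3600) -> None:
        self.window = window_seconds
        self.counts: Dict[Key, int] = defaultdict(int)
        # (terminal_id, key) -> timestamp of the last request that was counted
        self.last_counted: Dict[Tuple[str, Key], float] = {}

    def record(self, terminal_id: str, key: Key, timestamp: float) -> bool:
        """Return True if this request incremented the count for `key`."""
        last = self.last_counted.get((terminal_id, key))
        if last is not None and timestamp - last < self.window:
            return False  # same terminal, same triple, still inside the window
        self.last_counted[(terminal_id, key)] = timestamp
        self.counts[key] += 1
        return True
```

Here the window is anchored at the last counted request from that terminal; a fixed calendar window would be an equally plausible reading of "the same preset time period".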
In the information processing method provided by this embodiment, pieces of first user text information whose first domain information, first intention information and first key slot information are all identical are treated as same-type text information, and their count is recorded. If the count reaches a first preset number, the same-type text information is determined to be high-frequency information; alternatively, the same-type text information is sorted in descending order of count and the top second preset number of entries in the sorting result are taken as the high-frequency information. The high-frequency information can thus be obtained accurately from the first domain information, the first intention information and the first key slot information. The first reply voice information corresponding to the high-frequency information is subsequently cached, so that when user voice information matching the high-frequency information is received, the corresponding reply voice information can be determined from the cached first reply voice information. This reduces voice synthesis, lowers the load on the platform server, lets the user terminal obtain the reply voice information quickly, and improves the user experience.
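The two alternative selection rules (count threshold or top-N by descending count) can be illustrated with Python's `collections.Counter`, whose `most_common()` returns entries sorted by count from largest to smallest. The function name and parameter names below are hypothetical; `first_preset` and `second_preset` play the roles of the first and second preset numbers.

```python
from collections import Counter
from typing import List, Optional, Tuple

Key = Tuple[str, str, str]  # assumed (domain, intention, key slot)


def select_high_frequency(
    counts: Counter,
    first_preset: Optional[int] = None,
    second_preset: Optional[int] = None,
) -> List[Key]:
    """Pick high-frequency keys by a count threshold or by top-N rank (sketch)."""
    if first_preset is not None:
        # Rule 1: every same-type entry whose count reaches the threshold
        return [key for key, n in counts.most_common() if n >= first_preset]
    if second_preset is not None:
        # Rule 2: the first N entries after sorting by count, descending
        return [key for key, _ in counts.most_common(second_preset)]
    raise ValueError("set first_preset or second_preset")
```

With 10 types of same-type text information and `second_preset=3`, rule 2 returns the three most frequent types, matching the example in the description. Ties in count keep first-seen order in `most_common()`, which the text does not address.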
A fourth embodiment of the information processing method of the present invention is proposed based on the first embodiment, and in this embodiment, the information processing method includes:
step 401, acquiring second user voice information sent by the user terminal;
step 402, performing voice recognition processing on the second user voice information to determine second user text information;
step 403, performing natural language understanding processing on the second user text information, and determining second domain information, second intention information and second key slot information corresponding to the second user text information;
step 404, determining, based on the second domain information, the second intention information and the second key slot information, whether target high-frequency information corresponding to the second user text information exists in the high-frequency information;
step 405, if the target high-frequency information exists, acquiring second reply voice information corresponding to the target high-frequency information from the cached first reply voice information, and sending the second reply voice information to the user terminal.
In this embodiment, the user terminal monitors the user's voice in real time to acquire the second user voice information; specifically, when it detects an acquisition start instruction, the user terminal may collect the second user voice information in real time and send it to the platform server, which thus receives the second user voice information in real time. The platform server performs voice recognition processing on the second user voice information to obtain second user text information, and then performs natural language understanding processing on the second user text information to determine the corresponding second domain information, second intention information and second key slot information. Based on these three items, it determines whether target high-frequency information corresponding to the second user text information exists in the high-frequency information.
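Steps 401 to 405 can be sketched as a single request handler. The sketch assumes that "matching" means exact equality of the (domain, intention, key slot) triple, so one dictionary lookup on the whole triple stands in for the step-by-step check; the callables and the `Reply` type are hypothetical, and the miss branch, which returns NLG reply text instead of voice, is taken from the fifth embodiment rather than from steps 401 to 405 themselves.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple, Union

Key = Tuple[str, str, str]  # assumed (domain, intention, key slot)


@dataclass
class Reply:
    kind: str                   # "voice" (cache hit) or "text" (needs TTS later)
    payload: Union[bytes, str]


def handle_request(
    voice: bytes,
    cache: Dict[Key, bytes],
    recognize: Callable[[bytes], str],     # hypothetical ASR
    understand: Callable[[str], Key],      # hypothetical NLU
    generate_reply: Callable[[str], str],  # hypothetical NLG
) -> Reply:
    text = recognize(voice)                # step 402: second user text information
    key = understand(text)                 # step 403: domain, intention, key slot
    cached = cache.get(key)                # step 404: look for target high-frequency info
    if cached is not None:
        return Reply("voice", cached)      # step 405: cached second reply voice
    # Cache miss: generate reply text instead (fifth embodiment, step 501)
    return Reply("text", generate_reply(text))
```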
Further, in an embodiment, step S404 includes:
step a, if target domain information matching the second domain information exists in the first domain information of the high-frequency information, determining whether target intention information matching the second intention information exists in the first intention information of the high-frequency information corresponding to the target domain information;
step b, if target intention information matching the second intention information exists in the first intention information of the high-frequency information corresponding to the target domain information, determining whether target key slot information matching the second key slot information exists in the first key slot information of the high-frequency information corresponding to the target intention information; if it does, determining that the target high-frequency information exists.
In this embodiment, the platform server checks whether target domain information matching the second domain information exists in the first domain information of the cached high-frequency information. If it does, the server checks whether target intention information matching the second intention information exists in the first intention information of the cached high-frequency information corresponding to the target domain information. If it does, the server checks whether target key slot information matching the second key slot information exists in the first key slot information of the cached high-frequency information corresponding to the target intention information. If it does, the high-frequency information corresponding to the target key slot information matches the second user text information, that is, it is the target high-frequency information. In this way the server can accurately determine whether the second user text information corresponds to high-frequency information; if it does, the corresponding reply voice information is determined from the cached first reply voice information, which reduces voice synthesis, lowers the load on the platform server, lets the user terminal obtain the reply voice information quickly, and improves the user experience.
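The level-by-level check (domain, then intention, then key slot) maps naturally onto a nested dictionary. In the sketch below the nested cache layout, the function name and the stage labels are assumptions, and "matching" is taken to mean exact string equality; the text does not say whether fuzzy or synonym matching is used.

```python
from typing import Dict, Optional, Tuple

# Assumed cache layout: domain -> intention -> key slot -> reply voice
NestedCache = Dict[str, Dict[str, Dict[str, bytes]]]


def match_high_frequency(
    cache: NestedCache, domain: str, intent: str, slot: str
) -> Tuple[Optional[bytes], str]:
    """Match domain, then intention, then key slot (sketch).

    Returns (reply_voice, stage); stage names the level that failed,
    or "hit" when all three levels match.
    """
    intents = cache.get(domain)            # step a: target domain information
    if intents is None:
        return None, "domain_miss"
    slots = intents.get(intent)            # step a/b: target intention information
    if slots is None:
        return None, "intent_miss"
    voice = slots.get(slot)                # step b: target key slot information
    if voice is None:
        return None, "slot_miss"
    return voice, "hit"
```

Using the weather example, a query with domain "weather", intention "query weather conditions" and key slot "today" returns the cached voice, while "tomorrow" stops at the key slot level. The three miss stages line up with the three fallback cases described for the fifth embodiment.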
For example, if the second user voice information is "What's the weather like today?", the platform server performs voice recognition processing on it to obtain the second user text information, then performs NLU processing on the second user text information and obtains "weather" as the second domain information, "query weather conditions" as the second intention information, and "today" as the second key slot information. The platform server then checks whether target domain information matching "weather" exists in the first domain information of the cached high-frequency information. If it does, the server checks whether target intention information matching "query weather conditions" exists in the first intention information of the cached high-frequency information corresponding to the target domain information. If it does, the server checks whether target key slot information matching "today" exists in the first key slot information of the cached high-frequency information corresponding to the target intention information. If it does, the high-frequency information corresponding to the target key slot information matches the second user text information; that is, the target high-frequency information has first domain information "weather", first intention information "query weather conditions" and first key slot information "today".
Finally, the platform server acquires the second reply voice information corresponding to the target high-frequency information from the cache and sends it to the user terminal, which receives and plays the second reply voice information.
In the information processing method provided by this embodiment, second user voice information sent by the user terminal is acquired; voice recognition processing is performed on it to determine second user text information; natural language understanding processing is performed on the second user text information to determine the corresponding second domain information, second intention information and second key slot information; based on these, it is determined whether target high-frequency information corresponding to the second user text information exists in the high-frequency information; and, if it exists, second reply voice information corresponding to the target high-frequency information is obtained from the cached first reply voice information and sent to the user terminal. Because the target high-frequency information can be located accurately in the cache using the second domain information, the second intention information and the second key slot information, the reply voice information corresponding to it can be determined directly, which reduces voice synthesis, lowers the load on the platform server, lets the user terminal obtain the reply voice information quickly, and improves the user experience.
A fifth embodiment of the information processing method of the present invention is proposed based on the fourth embodiment, and in this embodiment, step 404 is followed by:
step 501, if the target high-frequency information does not exist, performing natural language generation processing on the second user text information to obtain user reply information, and sending the user reply information to the user terminal, which receives it;
step 502, if a voice synthesis request corresponding to the user reply information is received from the user terminal, performing voice synthesis processing on the user reply information to obtain third reply voice information.
In this embodiment, the platform server checks whether target domain information matching the second domain information exists in the first domain information of the cached high-frequency information. If no such target domain information exists, natural language generation processing is performed on the second user text information to obtain user reply information.
Alternatively, the platform server finds target domain information matching the second domain information in the first domain information of the cached high-frequency information, and then checks whether target intention information matching the second intention information exists in the first intention information of the cached high-frequency information corresponding to the target domain information. If no such target intention information exists, natural language generation processing is performed on the second user text information to obtain user reply information.
Alternatively, the platform server finds both the target domain information and the target intention information, and then checks whether target key slot information matching the second key slot information exists in the first key slot information of the cached high-frequency information corresponding to the target intention information. If no such target key slot information exists, natural language generation processing is performed on the second user text information to obtain user reply information.
The platform server then sends the user reply information to the user terminal, which checks its local cache for a fixed voice reply corresponding to the user reply information. If such a fixed voice reply exists, the user terminal plays it. If not, the user terminal sends a voice synthesis request corresponding to the user reply information to the platform server, and the TTS server in the platform server performs voice synthesis processing on the user reply information to obtain third reply voice information.
Finally, the platform server sends the third reply voice information to the user terminal, which receives and plays it.
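The terminal side of this cache-miss path can be sketched as a small class. The `Terminal` class, its attribute names and the `request_tts` callable (standing in for the voice synthesis request to the platform's TTS server) are hypothetical, and the local store is assumed to be keyed by the exact reply text; playback is simulated by recording the audio bytes.

```python
from typing import Callable, Dict, List


class Terminal:
    """User-terminal side of the cache-miss path (sketch)."""

    def __init__(self, request_tts: Callable[[str], bytes]) -> None:
        # Local store of fixed voice replies, keyed by reply text (assumed layout)
        self.fixed_replies: Dict[str, bytes] = {}
        self.request_tts = request_tts  # hypothetical call to the platform TTS server
        self.played: List[bytes] = []

    def play(self, voice: bytes) -> None:
        self.played.append(voice)       # stand-in for audio playback

    def on_reply_text(self, reply_text: str) -> None:
        voice = self.fixed_replies.get(reply_text)
        if voice is None:
            # No fixed voice reply stored locally: send a voice synthesis
            # request and play the third reply voice information it returns
            voice = self.request_tts(reply_text)
        self.play(voice)
```

Only replies with no locally stored fixed voice reach the TTS server, so frequent short acknowledgements never cost a synthesis round trip.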
In the information processing method provided by this embodiment, if the target high-frequency information does not exist, natural language generation processing is performed on the second user text information to obtain user reply information, which is sent to the user terminal; then, if a voice synthesis request corresponding to the user reply information is received from the user terminal, voice synthesis processing is performed on the user reply information to obtain third reply voice information. Thus, when neither target high-frequency information nor a fixed voice reply exists, the user terminal can still obtain reply voice information through voice synthesis, which improves the user experience.
Based on the above respective embodiments, a sixth embodiment of the information processing method of the present invention is proposed, in this embodiment, the information processing method further includes:
step 601, obtaining user operation information sent by the user terminal;
step 602, if the user operation information is activation operation information, sending a first fixed reply voice to the user terminal, wherein the user terminal stores the first fixed reply voice;
step 603, if the user operation information is timbre modification information, sending a second fixed reply voice to the user terminal, wherein the user terminal stores the second fixed reply voice.
In this embodiment, user operation information sent by the user terminal is acquired; the user operation information may be activation operation information or timbre modification information. When the user operation information is activation operation information, a first fixed reply voice, for example "OK" or "Master", is sent to the user terminal, which stores it. When the user operation information is timbre modification information, a second fixed reply voice is sent to the user terminal, which stores it. The second fixed reply voice may be the first fixed reply voice rendered in a different timbre; for example, if the first fixed reply voice uses a male timbre, the same reply rendered in a female timbre can serve as the second fixed reply voice.
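A server-side handler for the two operation types might look like the sketch below. The `Operation` enum, `fixed_replies_for`, the `synthesize(text, timbre)` callable and the default timbre of "male" (taken from the example) are all assumptions; the text only states that the second fixed reply voice may be the first one re-rendered in another timbre.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional


class Operation(Enum):
    ACTIVATE = "activate"            # activation operation information
    CHANGE_TIMBRE = "change_timbre"  # timbre modification information


def fixed_replies_for(
    op: Operation,
    fixed_texts: List[str],
    synthesize: Callable[[str, str], bytes],  # hypothetical TTS: (text, timbre) -> voice
    timbre: Optional[str] = None,
    default_timbre: str = "male",             # assumed default, per the example
) -> Dict[str, bytes]:
    """Build the fixed reply voices sent to the terminal (sketch).

    Activation yields the first fixed reply voice in the default timbre;
    a timbre change re-voices the same texts as the second fixed reply voice.
    """
    if op is Operation.ACTIVATE:
        voice_timbre = default_timbre
    elif timbre is None:
        raise ValueError("timbre modification needs a target timbre")
    else:
        voice_timbre = timbre
    return {text: synthesize(text, voice_timbre) for text in fixed_texts}
```

The terminal would merge the returned mapping into its local store, so a timbre change simply overwrites the previously stored fixed replies with the same keys.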
In the information processing method provided by this embodiment, user operation information sent by the user terminal is acquired. If the user operation information is activation operation information, a first fixed reply voice is sent to the user terminal, which stores it; if the user operation information is timbre modification information, a second fixed reply voice is sent to the user terminal, which stores it. Through the activation operation information or the timbre modification information, the fixed reply voice is stored on the user terminal and its timbre can be switched, which improves the user experience.
The present invention also provides an information processing apparatus applied to a platform server, and referring to fig. 3, the information processing apparatus includes:
an acquisition module 10, configured to acquire all first user voice information within a preset time, wherein the first user voice information is sent to the platform server by a user terminal;
a first processing module 20, configured to preprocess the first user voice information and determine high-frequency information;
a second processing module 30, configured to perform natural language generation processing on the high-frequency information and determine first reply text information;
and a cache module 40, configured to perform voice synthesis processing on the first reply text information, determine first reply voice information, and cache the first reply voice information.
Further, the first processing module 20 is further configured to:
performing voice recognition processing on the first user voice information to determine first user text information;
performing natural language understanding processing on the first user text information, and determining first domain information, first intention information and first key slot information corresponding to each piece of first user text information;
determining high-frequency information based on the first domain information, the first intention information and the first key slot information corresponding to each piece of first user text information.
Further, the first processing module 20 is further configured to:
determining that pieces of first user text information whose first domain information, first intention information and first key slot information are all identical are text information of the same type, and recording the count of each type of same-type text information;
if the count reaches a first preset number, determining that the same-type text information is high-frequency information; or sorting the same-type text information in descending order of count to obtain a sorting result, and taking the top second preset number of same-type text information in the sorting result as the high-frequency information.
Further, the information processing apparatus is further configured to:
acquiring second user voice information sent by the user terminal;
performing voice recognition processing on the second user voice information to determine second user text information;
performing natural language understanding processing on the second user text information, and determining second domain information, second intention information and second key slot information corresponding to the second user text information;
determining, based on the second domain information, the second intention information and the second key slot information, whether target high-frequency information corresponding to the second user text information exists in the high-frequency information;
and if the target high-frequency information exists, acquiring second reply voice information corresponding to the target high-frequency information from the cached first reply voice information, and sending the second reply voice information to the user terminal.
Further, the information processing apparatus is further configured to:
if target domain information matching the second domain information exists in the first domain information of the high-frequency information, determining whether target intention information matching the second intention information exists in the first intention information of the high-frequency information corresponding to the target domain information;
and if target intention information matching the second intention information exists in the first intention information of the high-frequency information corresponding to the target domain information, determining whether target key slot information matching the second key slot information exists in the first key slot information of the high-frequency information corresponding to the target intention information; if it does, determining that the target high-frequency information exists.
Further, the information processing apparatus is further configured to:
if the target high-frequency information does not exist, performing natural language generation processing on the second user text information to obtain user reply information, and sending the user reply information to the user terminal, which receives it;
and if a voice synthesis request corresponding to the user reply information is received from the user terminal, performing voice synthesis processing on the user reply information to obtain third reply voice information.
Further, the information processing apparatus is further configured to:
acquiring user operation information sent by the user terminal;
if the user operation information is activation operation information, sending a first fixed reply voice to the user terminal, wherein the user terminal stores the first fixed reply voice;
and if the user operation information is timbre modification information, sending a second fixed reply voice to the user terminal, wherein the user terminal stores the second fixed reply voice.
For the methods executed by the above program modules, refer to the embodiments of the information processing method of the present invention; they are not repeated here.
Furthermore, an embodiment of the present invention also provides an information processing apparatus, including: a memory, a processor and an information processing program stored on the memory and executable on the processor, the information processing program, when executed by the processor, implementing the steps of the information processing method as described above.
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, on which an information processing program is stored, and the information processing program, when executed by a processor, implements the steps of the information processing method described above.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or system in which the element is included.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method of the foregoing embodiments may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but in many cases, the former is a better implementation. Based on such understanding, the technical solutions of the present invention or portions thereof contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling an information processing apparatus (which may be a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention and is not intended to limit its scope. All equivalent structures or equivalent processes derived from the present invention, whether applied directly or indirectly in other related technical fields, are likewise included within the scope of protection of the present invention.

Claims (10)

1. An information processing method, applied to a platform server, the information processing method comprising:
acquiring all first user voice information within a preset time, wherein the first user voice information is sent to the platform server by a user terminal;
preprocessing the first user voice information to determine high-frequency information;
performing natural language generation processing on the high-frequency information, and determining first reply text information;
and performing voice synthesis processing on the first reply text information, determining first reply voice information, and caching the first reply voice information.
2. The information processing method of claim 1, wherein the step of preprocessing the first user voice information to determine high frequency information comprises:
performing voice recognition processing on the first user voice information to determine first user text information;
performing natural language understanding processing on the first user text information, and determining first domain information, first intention information and first key slot information corresponding to each piece of first user text information;
and determining high-frequency information based on the first domain information, the first intention information and the first key slot information corresponding to each piece of first user text information.
3. The information processing method according to claim 2, wherein the step of determining high-frequency information based on the first domain information, the first intention information, and the first key slot information corresponding to each first user text information includes:
determining that pieces of first user text information whose first domain information, first intention information and first key slot information are all identical are text information of the same type, and recording the count of each type of same-type text information;
if the count reaches a first preset number, determining that the same-type text information is high-frequency information; or sorting the same-type text information in descending order of count to obtain a sorting result, and taking the top second preset number of same-type text information in the sorting result as the high-frequency information.
4. The information processing method according to claim 1, wherein the information processing method further comprises:
acquiring second user voice information sent by the user terminal;
performing voice recognition processing on the second user voice information to determine second user text information;
performing natural language understanding processing on the second user text information, and determining second domain information, second intention information and second key slot information corresponding to the second user text information;
determining, based on the second domain information, the second intention information and the second key slot information, whether target high-frequency information corresponding to the second user text information exists in the high-frequency information;
and if the target high-frequency information exists, acquiring second reply voice information corresponding to the target high-frequency information from the cached first reply voice information, and sending the second reply voice information to the user terminal.
5. The information processing method according to claim 4, wherein the step of determining whether target high-frequency information corresponding to second user text information exists in the high-frequency information based on the second domain information, the second intention information, and the second key slot information includes:
if target domain information matching the second domain information exists in the first domain information of the high-frequency information, determining whether target intention information matching the second intention information exists in the first intention information of the high-frequency information corresponding to the target domain information;
and if target intention information matching the second intention information exists in the first intention information of the high-frequency information corresponding to the target domain information, determining whether target key slot information matching the second key slot information exists in the first key slot information of the high-frequency information corresponding to the target intention information, wherein, if such target key slot information exists, it is determined that the target high-frequency information exists.
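The stepwise match of claim 5 (domain first, then intention, then key slot) maps naturally onto a three-level nested index. This is a minimal sketch that assumes "matching" means exact string equality at each level; the index layout and the names `build_index` and `find_target` are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical index: domain -> intent -> key slot -> cached reply audio.
Index = Dict[str, Dict[str, Dict[str, bytes]]]


def build_index(entries: Iterable[Tuple[Tuple[str, str, str], bytes]]) -> Index:
    """Group cached replies by domain, then intent, then key slot."""
    index: Index = {}
    for (domain, intent, key_slot), audio in entries:
        index.setdefault(domain, {}).setdefault(intent, {})[key_slot] = audio
    return index


def find_target(index: Index, domain: str, intent: str,
                key_slot: str) -> Optional[bytes]:
    """Match domain, then intent, then key slot; None if any level misses."""
    intents = index.get(domain)
    if intents is None:           # no target domain information
        return None
    slots = intents.get(intent)
    if slots is None:             # domain matched, but no target intention
        return None
    return slots.get(key_slot)    # None if no target key slot information
```

Checking the coarser levels first lets a miss be reported as soon as the domain or intention fails to match, without scanning every cached key slot.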
6. The information processing method according to claim 4, wherein after the step of determining whether target high-frequency information corresponding to second user text information exists in the high-frequency information based on the second domain information, the second intention information, and the second key slot information, the information processing method further comprises:
if the target high-frequency information does not exist, performing natural language generation processing on the second user text information to obtain user reply information, and sending the user reply information to the user terminal, wherein the user terminal receives the user reply information;
and if a voice synthesis request corresponding to the user reply information is received from the user terminal, performing voice synthesis processing on the user reply information to obtain third reply voice information.
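The cache-miss behaviour of claim 6 defers voice synthesis until the terminal asks for it. The sketch below assumes the server's reply is modelled as a hypothetical `ServerReply` holding either cached audio or NLG text, and that NLG and synthesis are passed in as callables; none of these names come from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

TypeKey = Tuple[str, str, str]  # (domain, intent, key slot)


@dataclass
class ServerReply:
    audio: Optional[bytes] = None  # cached reply voice (cache hit)
    text: Optional[str] = None     # NLG reply text (cache miss)


def answer(key: TypeKey, user_text: str,
           reply_cache: Dict[TypeKey, bytes],
           generate: Callable[[str], str]) -> ServerReply:
    audio = reply_cache.get(key)
    if audio is not None:
        return ServerReply(audio=audio)
    # Miss: run natural language generation only; no audio is produced yet.
    return ServerReply(text=generate(user_text))


def on_synthesis_request(reply_text: str,
                         synthesize: Callable[[str], bytes]) -> bytes:
    # Voice synthesis runs only when the terminal explicitly requests it.
    return synthesize(reply_text)
```

Sending text first means a terminal that only displays the reply never triggers synthesis.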
7. The information processing method according to any one of claims 1 to 6, characterized by further comprising:
acquiring user operation information sent by the user terminal;
if the user operation information is activation operation information, sending a first fixed reply voice to the user terminal, wherein the user terminal stores the first fixed reply voice;
and if the user operation information is timbre modification information, sending a second fixed reply voice to the user terminal, wherein the user terminal stores the second fixed reply voice.
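Claim 7 pushes fixed reply voices to the terminal, which keeps them locally. The following sketch is a simplified illustration that assumes only the two operation types named in the claim and models the terminal's storage as a dictionary. `Operation`, `Terminal`, and `handle_operation` are hypothetical names.

```python
from enum import Enum
from typing import Dict


class Operation(Enum):
    ACTIVATE = "activate"            # activation operation information
    CHANGE_TIMBRE = "change_timbre"  # timbre modification information


class Terminal:
    """Hypothetical user terminal that keeps fixed reply voices locally."""

    def __init__(self) -> None:
        self.stored: Dict[Operation, bytes] = {}

    def receive(self, op: Operation, voice: bytes) -> None:
        self.stored[op] = voice


def handle_operation(op: Operation, terminal: Terminal,
                     first_fixed: bytes, second_fixed: bytes) -> bytes:
    # Activation gets the first fixed reply; a timbre change gets the second.
    voice = first_fixed if op is Operation.ACTIVATE else second_fixed
    terminal.receive(op, voice)  # the terminal stores what it receives
    return voice
```

Once stored, these fixed replies can be played on the terminal without another round trip to the server.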
8. An information processing apparatus characterized by comprising:
an acquisition module, used for acquiring all first user voice information within a preset time, wherein the first user voice information is sent to a platform server by a user terminal;
the first processing module is used for preprocessing the first user voice information and determining high-frequency information;
the second processing module is used for performing natural language generation processing on the high-frequency information and determining first reply text information;
and the cache module is used for performing voice synthesis processing on the first reply text information, determining first reply voice information, and caching the first reply voice information.
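The four modules of claim 8 form an offline pipeline. This sketch chains them under several assumptions: voice records arrive as `(timestamp, audio)` pairs, the "preset time" is a half-open window `[start, end)`, the threshold branch of claim 3 is used with a hypothetical `min_count`, and the first text seen for each type is what gets passed to NLG. Every function and parameter name is illustrative.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Tuple

TypeKey = Tuple[str, str, str]  # (domain, intent, key slot)


def build_reply_cache(records: Iterable[Tuple[float, bytes]],
                      start: float, end: float,
                      recognize: Callable[[bytes], str],
                      understand: Callable[[str], TypeKey],
                      generate: Callable[[str], str],
                      synthesize: Callable[[str], bytes],
                      min_count: int) -> Dict[TypeKey, bytes]:
    # Acquisition module: keep utterances received within the time window.
    voices = [voice for ts, voice in records if start <= ts < end]
    # First processing module: recognize, understand, count same-type texts.
    counts = Counter()
    sample_text: Dict[TypeKey, str] = {}
    for voice in voices:
        text = recognize(voice)
        key = understand(text)
        counts[key] += 1
        sample_text.setdefault(key, text)  # first text seen represents its type
    cache: Dict[TypeKey, bytes] = {}
    for key, n in counts.items():
        if n >= min_count:                           # high-frequency information
            reply_text = generate(sample_text[key])  # second processing module
            cache[key] = synthesize(reply_text)      # cache module (synthesis)
    return cache
```

Running this periodically, for example once per window, keeps the cache aligned with what users asked most often in that period.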
9. An information processing apparatus, characterized by comprising: a memory, a processor, and an information processing program stored on the memory and executable on the processor, wherein the information processing program, when executed by the processor, implements the steps of the information processing method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that an information processing program is stored thereon, which when executed by a processor implements the steps of the information processing method according to any one of claims 1 to 7.
CN202210639832.2A; priority date 2022-06-06; filing date 2022-06-06; Information processing method, device, equipment and computer readable storage medium; Pending; published as CN114927134A

Priority Applications (1)

Application Number; Priority Date; Filing Date; Title
CN202210639832.2A (published as CN114927134A); 2022-06-06; 2022-06-06; Information processing method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number; Priority Date; Filing Date; Title
CN202210639832.2A (published as CN114927134A); 2022-06-06; 2022-06-06; Information processing method, device, equipment and computer readable storage medium

Publications (1)

Publication Number; Publication Date
CN114927134A; 2022-08-19

Family

ID=82813342

Family Applications (1)

Application Number; Status; Publication Number; Priority Date; Filing Date; Title
CN202210639832.2A; Pending; CN114927134A; 2022-06-06; 2022-06-06; Information processing method, device, equipment and computer readable storage medium

Country Status (1)

Country; Link
CN (1); CN114927134A


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination