CN112363861A - Voice interaction method and device for subway ticket purchasing - Google Patents

Info

Publication number
CN112363861A
CN112363861A (application CN202011306399.8A)
Authority
CN
China
Prior art keywords
voice
result
information
software process
voice software
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011306399.8A
Other languages
Chinese (zh)
Inventor
宋泽
甘津瑞
邓建凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AI Speech Ltd
Original Assignee
AI Speech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AI Speech Ltd filed Critical AI Speech Ltd
Priority to CN202011306399.8A priority Critical patent/CN112363861A/en
Publication of CN112363861A publication Critical patent/CN112363861A/en
Withdrawn legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703: Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706: Error or fault processing not based on redundancy, the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0715: Error or fault processing not based on redundancy, the processing taking place in a system implementing multitasking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703: Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751: Error or fault detection not based on redundancy
    • G06F11/0754: Error or fault detection not based on redundancy by exceeding limits
    • G06F11/0757: Error or fault detection not based on redundancy by exceeding a time limit, i.e. time-out, e.g. watchdogs
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14: Error detection or correction of the data by redundancy in operation
    • G06F11/1402: Saving, restoring, recovering or retrying
    • G06F11/1415: Saving, restoring, recovering or retrying at system level
    • G06F11/1438: Restarting or rejuvenating
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/28: Constructional details of speech recognition systems
    • G10L15/30: Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223: Execution procedure of a spoken command

Abstract

The invention discloses a voice interaction method and device for subway ticket purchasing, wherein the method comprises the following steps: in response to an acquired instruction to start the daemon process, detecting the state of the voice software process; if the voice software process does not exist, starting it; if it exists, judging whether it has timed out; and if it has timed out, detecting its state again. Noise interference is effectively suppressed by detecting mouth-shape changes via face detection and by processing the microphone signal. In addition, the voice software service not only provides a language-switching function, meeting the needs of special scenes such as subway ticket purchasing, but also communicates with the device terminal over a serial port, so that plug-and-play can be achieved and aged voice hardware can be conveniently replaced. Finally, the voice software service also supports an offline mode, so the ticket-purchasing function still works when the network is down.

Description

Voice interaction method and device for subway ticket purchasing
Technical Field
The invention belongs to the technical field of voice interaction, and particularly relates to a voice interaction method and device for subway ticket purchasing.
Background
With the development of artificial intelligence, information technology (IT) practitioners pay more and more attention to artificial intelligence research. For example, face detection (FD), voice endpoint detection (VAD), speech recognition (ASR) and natural language processing (NLP) can be organized with multithreading into a voice interaction pipeline, which is often used in scenes such as subway ticket purchasing, consulting systems and bank ATMs.
Because speech recognition, voice endpoint detection and natural language understanding each provide only a single voice capability, their functions are limited and developers must organize them logically to realize voice interaction. In addition, most voice interaction software currently on the market adopts multithreading, creating several threads inside one process for modular management; once that process crashes, the entire voice capability becomes unusable. Moreover, merely detecting whether a person is within a specified distance cannot effectively suppress noise, which interferes with the user's ticket purchase. Finally, the offline voice interaction provided by voice services on the market is weak and does not support multi-language interaction.
Disclosure of Invention
The embodiments of the invention provide a voice interaction method and device for subway ticket purchasing, which are used to solve at least one of the above technical problems.
In a first aspect, an embodiment of the present invention provides a voice interaction method for subway ticket purchasing, including: in response to an acquired instruction to start the daemon process, detecting the state of the voice software process; if the voice software process does not exist, starting it; if it exists, judging whether it has timed out; and if it has timed out, detecting its state again.
In a second aspect, an embodiment of the present invention provides a voice interaction device for subway ticket purchasing, including: a first detection module configured to detect the state of the voice software process in response to the acquired instruction to start the daemon process; a starting module configured to start the voice software process if it does not exist; a judging module configured to judge whether the voice software process has timed out if it exists; and a second detection module configured to detect the state of the voice software process again if it has timed out.
In a third aspect, an electronic device is provided, comprising: at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the voice interaction method of any embodiment of the present invention.
In a fourth aspect, the present invention also provides a computer program product comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program including program instructions which, when executed by a computer, cause the computer to perform the steps of the voice interaction method of any embodiment of the present invention.
The voice software service of the method and device supports the integrated operation of face detection, audio acquisition, speech recognition, semantic processing and dialogue processing, effectively reducing developers' workload. Second, noise interference is effectively suppressed by detecting mouth-shape changes via face detection and by processing the microphone signal. In addition, the voice software service not only provides a language-switching function, meeting the needs of special scenes such as subway ticket purchasing, but also communicates with the device terminal over a serial port, so that plug-and-play can be achieved and aged voice hardware can be conveniently replaced. Finally, the voice software service also supports an offline mode, so the ticket-purchasing function still works when the network is down.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a voice interaction method for subway ticket purchasing according to an embodiment of the present invention;
fig. 2 is a flowchart of another voice interaction method for subway ticket purchasing according to an embodiment of the present invention;
fig. 3 is a flowchart of a voice interaction method for subway ticket purchasing according to another embodiment of the present invention;
fig. 4 is a flowchart of another voice interaction method for subway ticket purchasing according to an embodiment of the present invention;
fig. 5 is a flowchart of a voice interaction method for subway ticket purchasing according to an embodiment of the present invention;
fig. 6 is a logic flow diagram of voice interaction for subway ticket purchasing according to an embodiment of the present invention;
fig. 7 is a block diagram of a voice interaction apparatus for purchasing tickets for a subway according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, which shows a flowchart of an embodiment of a voice interaction method for ticket booking for a subway according to the present application, the voice interaction method for ticket booking for a subway according to the present embodiment may be applied to a terminal having a real-time voice interaction function.
As shown in fig. 1, the voice interaction method for subway ticket purchasing of the embodiment includes the following steps:
step 101, responding to the acquired command of starting the daemon process, and detecting the state of the voice software process.
In this embodiment, for step 101, when the user needs to interact with the device by voice to purchase a ticket, the voice interaction apparatus detects the state of the voice software process in response to the acquired instruction to start the daemon process, so as to monitor the device's voice software in real time.
And step 102, if the voice software process does not exist, starting the voice software process.
In this embodiment, for step 102, when the voice interaction apparatus detects that the device's voice software process does not exist, it starts the voice software process.
And step 103, if the voice software process exists, judging whether it has timed out.
In this embodiment, for step 103, if the voice software process exists, the voice interaction apparatus judges whether it has timed out, so as to avoid the situation where the voice software process is open but its connection has timed out and it cannot be used.
And step 104, if the voice software process has timed out, detecting its state again.
In this embodiment, for step 104, if the voice software process has timed out, the voice interaction apparatus detects the state of the voice software process again. Repeated detection thus guards against both an unstarted voice software process and a timed-out connection.
This method uses a multi-process technique to monitor whether the voice software process exists and restarts the voice module if it does not, effectively solving the problems that the voice software process is prone to crashing and that robustness is poor because the user must otherwise manually restart the voice program after a crash.
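The monitoring logic just described reduces to a three-way decision per check. The sketch below is an illustrative reduction, not the patented implementation; the names `process_exists` and `heartbeat_age_s` and the 5-second default timeout are assumptions:

```python
def watchdog_step(process_exists: bool, heartbeat_age_s: float,
                  timeout_s: float = 5.0) -> str:
    """One iteration of the daemon's check on the voice software process.

    Returns the action the daemon should take next:
      'start'   - the voice software process is absent, so start it (step 102)
      'recheck' - the process exists but has timed out, detect it again (step 104)
      'wait'    - the process exists and is responsive, keep waiting (step 103)
    """
    if not process_exists:
        return "start"
    if heartbeat_age_s > timeout_s:
        return "recheck"
    return "wait"
```

In a real deployment the daemon would call this in a loop, launching the voice software (e.g. via `subprocess.Popen`) whenever it returns `'start'`.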
Referring to fig. 2, a flowchart of another voice interaction method for subway ticket purchasing according to an embodiment of the present application is shown. This flowchart mainly refines the flow of fig. 1 with additional steps.
As shown in fig. 2, in step 201, in response to an acquired instruction for starting a voice software process, performing face detection and determining whether a face is detected;
in step 202, if a face is detected, a microphone is turned on and whether lip movement exists in the current user is continuously judged based on the face detection;
in step 203, if there is lip movement, collecting audio information based on a microphone;
in step 204, in response to the acquired audio information, determining whether a human voice exists;
in step 205, if there is a human voice, audio recognition is performed on the audio information, and the obtained recognition result is output.
In this embodiment, for step 201, the voice interaction apparatus performs face detection and determines whether a face is detected in response to the acquired instruction to start the voice software process. Then, in step 202, if a face is detected, the voice interaction device turns on the microphone and continues to determine whether there is lip movement of the current user based on the face detection. Thereafter, for step 203, if there is lip movement, the voice interaction device collects audio information based on the microphone. Thereafter, for step 204, the voice interaction apparatus determines whether human voice exists in response to the acquired audio information. Then, in step 205, if there is a voice, the voice interaction apparatus performs audio recognition on the audio information and outputs the obtained recognition result.
The scheme provided by this embodiment uses the face detection module not only to detect whether someone is present but also to judge the mouth shape, and thereby whether the person is speaking: if the mouth opens, voice interaction is started; otherwise it is not. This captures interactions more effectively and suppresses noise.
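The face-then-lips gating of steps 201-203 can be sketched as a small state machine. This is a minimal illustration; the state names and the boolean inputs are assumptions, and real face and lip-movement detectors would supply them:

```python
from enum import Enum, auto

class Gate(Enum):
    WAIT_FACE = auto()   # no face detected yet; microphone stays idle
    WAIT_LIPS = auto()   # face present, microphone on, waiting for lip movement
    CAPTURE = auto()     # lips moving: collect audio from the microphone

def advance(state: Gate, face_detected: bool, lips_moving: bool) -> Gate:
    """Advance the capture gate: a detected face turns the microphone on
    (step 202), and only lip movement starts audio collection (step 203)."""
    if not face_detected:
        return Gate.WAIT_FACE
    if not lips_moving:
        return Gate.WAIT_LIPS
    return Gate.CAPTURE
```

Gating capture on lip movement, rather than on face presence alone, is what lets the scheme ignore bystanders who face the machine without speaking.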
Referring to fig. 3, a flowchart of another voice interaction method for subway ticket purchasing according to an embodiment of the present application is shown. The flowchart mainly refines step 205, "if there is a human voice, audio recognition is performed on the audio information and the obtained recognition result is output".
As shown in fig. 3, in step 301, if there is a human voice, detecting whether the network is connected;
in step 302, if the network is connected, performing online audio recognition on the audio information, and outputting the obtained online recognition result.
In this embodiment, for step 301, if there is a human voice, the voice interaction apparatus detects whether the network is connected. Then, for step 302, if the network is connected, the voice interaction apparatus performs online audio recognition on the audio information and outputs the obtained online recognition result. In this way, online recognition of the audio information can be achieved.
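The networking check of steps 301-302 amounts to choosing a recognizer. Below is a minimal sketch with the recognizers passed in as callables, since the patent does not name concrete recognition engines:

```python
def route_recognition(audio, networked, recognize_online, recognize_offline):
    """Step 301-302 routing: use online recognition when the network is up,
    otherwise fall back to the local (offline) recognizer.

    `recognize_online` and `recognize_offline` are caller-supplied functions
    taking raw audio and returning a recognition result.
    """
    recognizer = recognize_online if networked else recognize_offline
    return recognizer(audio)
```

Keeping the branch in one place means the rest of the pipeline (semantics, dialogue, serial output) is identical for the online and offline paths.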
Referring to fig. 4, a flowchart of another voice interaction method for subway ticket purchasing according to an embodiment of the present application is shown. The flowchart mainly refines the processing of the online recognition result output in step 302.
As shown in fig. 4, in step 401, in response to the obtained online recognition result, outputting an online semantic result based on an online semantic processing model;
in step 402, responding to the obtained online semantic result, and outputting an online dialogue result based on an online dialogue processing model;
in step 403, determining whether ticket purchasing information and/or language information exists in the online conversation result;
in step 404, if the ticket purchasing information and/or language information exists in the online dialog result, the ticket purchasing information and/or language information is analyzed to output a ticket purchasing information result and/or language information result.
In this embodiment, for step 401, the voice interaction apparatus outputs an online semantic result based on the online semantic processing model in response to the acquired online recognition result. Thereafter, for step 402, the voice interaction device outputs an online dialog result based on the online dialog processing model in response to the obtained online semantic result. Thereafter, in step 403, the voice interactive apparatus determines whether ticket purchasing information and/or language information exists in the online dialog result. Then, in step 404, if the online dialog result includes the ticket purchasing information and/or language information, the voice interaction apparatus parses the ticket purchasing information and/or language information to output a result of the ticket purchasing information and/or language information.
The scheme provided by the embodiment adopts online voice interaction, and can analyze the ticket purchasing information and/or language information, so that the ticket purchasing operation and/or the multi-language switching operation are/is completed.
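The extraction in steps 403-404 can be illustrated as follows. The dictionary schema (the `ticket` and `language` keys) is invented for this sketch, since the patent does not fix a result format:

```python
def parse_dialog_result(result: dict):
    """Pull ticket-purchasing and/or language information out of a dialogue
    result (steps 403-404). Returns None when neither is present, which
    signals the caller to collect audio again."""
    ticket = result.get("ticket")      # e.g. {"station": "...", "count": 2}
    language = result.get("language")  # e.g. "en"
    if ticket is None and language is None:
        return None
    return {"ticket": ticket, "language": language}
```

A `None` return corresponds to looping back to audio collection until ticket-purchasing or language information is finally obtained.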
In some optional embodiments, if the ticket purchasing information and/or language information does not exist in the online conversation result, the audio information is collected again until the ticket purchasing information and/or language information is collected, so that continuous voice interaction is realized.
In some optional embodiments, if it is detected that the network is not connected, performing offline audio recognition on the audio information to obtain an offline recognition result. In this way, voice interaction can be made to be suitable for multiple scenes.
Please refer to fig. 5, which shows a flowchart of a voice interaction method for subway ticket purchasing according to an embodiment of the present application. The flowchart mainly refines the offline branch, in which the audio information is recognized offline and the obtained offline recognition result is processed.
As shown in fig. 5, in step 501, in response to the obtained offline recognition result, an offline semantic result is output based on the offline semantic processing model;
in step 502, in response to the obtained offline semantic result, outputting an offline dialogue result based on the offline dialogue processing model;
in step 503, determining whether there is ticket purchasing information and/or language information in the offline conversation result;
in step 504, if the offline dialogue result includes the ticket purchasing information and/or language information, the ticket purchasing information and/or language information is analyzed to output a ticket purchasing information result and/or language information result.
In this embodiment, for step 501, the voice interaction apparatus outputs an offline semantic result based on the offline semantic processing model in response to the acquired offline recognition result. Thereafter, for step 502, the voice interaction device outputs an offline dialogue result based on the offline dialogue processing model in response to the acquired offline semantic result. Then, in step 503, the voice interaction apparatus determines whether the offline dialogue result includes ticket purchasing information and/or language information. Then, in step 504, if the offline dialog result includes the ticket-buying information and/or language information, the voice interaction apparatus parses the ticket-buying information and/or language information to output the result of the ticket-buying information and/or language information.
The scheme provided by this embodiment improves the offline voice interaction capability: ticket-purchasing information and/or language information can still be parsed, so the ticket-purchasing operation and/or multi-language switching operation can be completed.
It should be noted that the above method steps are not intended to limit the execution order of the steps, and in fact, some steps may be executed simultaneously or in the reverse order of the steps, which is not limited herein.
The following description is provided to enable those skilled in the art to better understand the present disclosure by describing some of the problems encountered by the inventors in implementing the present disclosure and by describing one particular embodiment of the finally identified solution.
The inventor finds that the defects in the prior art are mainly caused by the following reasons in the process of implementing the application:
1) only a single voice capability is provided, so that more development logic and heavier tasks are caused;
2) the multithreading technology is adopted, and once the process to which the threads belong crashes, the whole voice software service stops working;
3) the face detection only detects whether a face exists, if the face exists, voice interaction is started, and noise interference cannot be avoided;
4) due to the limitation of voice technology, the multi-language off-line conversation of other schemes can not meet the performance index of the customer.
The inventor also found that: since other vendors do not have a full-link dialog platform (DUI), it is difficult to implement full-link voice interaction.
Because multithreading is used, high-quality code is required and crashes must be avoided as much as possible; if a crash does occur, the customer must manually restart the voice program, so robustness is poor.
Due to technical limitations, face detection elsewhere does not detect mouth-shape changes, and only an irregular linear microphone array is used to suppress noise by angle and range, with poor results.
At present, most schemes use a hybrid model to support multi-language voice dialogue; this scheme instead configures multi-language skills on the DUI platform, which easily meets customers' requirements.
As shown in fig. 6, the scheme of the present application provides a voice ticket-purchasing software service, mainly used in subway ticket-purchasing scenes. It not only provides an online voice ticket-purchasing function but also supports offline ticket purchasing, provides a serial port for reading ticket-purchasing information, and allows the device end to connect external voice service software without affecting the original performance of the ticket machine. The main functions are as follows:
function one: the daemon monitors the voice software service.
The method comprises the following steps:
Step one: start the daemon process.
Step two: check the voice software service.
Step three: judge whether the voice service software process exists; if so, jump to step four; otherwise, start the voice software service.
Step four: set the voice-service check timeout timer.
Step five: judge whether the timer has timed out; if so, jump back to step two; otherwise, stay at step five.
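The five steps above can be sketched as a bounded loop. Here `process_alive` and `start_service` are caller-supplied callables standing in for the real process check and launcher, and the tiny timeout is only for illustration:

```python
import time

def daemon_loop(process_alive, start_service, timeout_s=0.05, checks=3):
    """Run the five-step watchdog a bounded number of times.

    Returns how many times the voice software service had to be started.
    """
    starts = 0
    for _ in range(checks):                      # step two: check the service
        if not process_alive():                  # step three: process absent?
            start_service()                      # ...then start it
            starts += 1
        deadline = time.monotonic() + timeout_s  # step four: set the timer
        while time.monotonic() < deadline:       # step five: wait for timeout
            time.sleep(0.005)
        # timer expired: loop back to step two and check again
    return starts
```

The real daemon loops forever; bounding the iterations here just makes the sketch terminate.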
Function two: through the voice software service, the device end can obtain the recognition result in an offline or online scene.
The method comprises the following steps:
Step one: start the voice software process and run the voice service.
Step two: the face detection module detects a face and acquires a face state.
Step three: judging whether a face exists according to the face state, and entering the next step if the face exists; otherwise, the human face is continuously detected.
Step four: the linear microphone array module detects the microphones and acquires the microphone states.
Step five: judging whether the microphone is opened or not according to the state of the microphone, and if so, entering the next step; otherwise, the microphone is turned on.
Step six: the mouth type module detects the mouth type and obtains the state of the mouth type.
Step seven: judging whether to open the mouth according to the mouth shape state, and if so, carrying out the next step; otherwise, the mouth shape is continuously monitored.
Step eight: the audio acquisition module acquires audio.
Step nine: and sending the collected audio to an information processing module for echo cancellation and noise reduction.
Step ten: and the processed audio is sent to a voice endpoint detection module to acquire an endpoint state.
Step eleven: judging whether sound exists or not according to the end point state, and entering the next step if the sound exists; otherwise, voice endpoint detection continues.
Step twelve: the network monitoring module detects the network and acquires the network state.
Step thirteen: judging whether a network exists according to the network state, and jumping to the step fifteen if the network exists; otherwise, the next step is carried out.
Step fourteen: send the audio emitted by the voice endpoint kernel into offline recognition, then jump to step sixteen.
Step fifteen: send the audio emitted by the voice endpoint kernel into online recognition.
Step sixteen: send the recognition result to the serial input module.
Step seventeen: the device end obtains the recognition result from the serial output module.
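Steps sixteen and seventeen exchange the recognition result over a serial port. The patent does not specify a wire format, so the length-prefixed UTF-8 framing below is purely an assumption; a real deployment would push these bytes through a serial library such as pyserial:

```python
import struct

def frame_result(text: str) -> bytes:
    """Pack a recognition result for the serial link: a 2-byte big-endian
    length prefix followed by the UTF-8 payload (assumed framing)."""
    payload = text.encode("utf-8")
    return struct.pack(">H", len(payload)) + payload

def unframe_result(frame: bytes) -> str:
    """Device-side counterpart: strip the length prefix and decode."""
    (length,) = struct.unpack(">H", frame[:2])
    return frame[2:2 + length].decode("utf-8")
```

An explicit length prefix lets the device end read exactly one result per message, which matters on a raw serial byte stream with no built-in message boundaries.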
Function three: through the voice software service, in an offline or online scene, the device end can obtain the language switching information result.
The method comprises the following steps:
Step one: start the voice software process and run the voice service.
Step two: the face detection module detects a face and acquires a face state.
Step three: judging whether a face exists according to the face state, and entering the next step if the face exists; otherwise, the human face is continuously detected.
Step four: the linear microphone array module detects the microphones and acquires the microphone states.
Step five: judging whether the microphone is opened or not according to the state of the microphone, and if so, entering the next step; otherwise, the microphone is turned on.
Step six: the mouth type module detects the mouth type and obtains the state of the mouth type.
Step seven: judging whether to open the mouth according to the mouth shape state, and if so, carrying out the next step; otherwise, the mouth shape is continuously monitored.
Step eight: the audio acquisition module acquires audio.
Step nine: and sending the collected audio to an information processing module for echo cancellation and noise reduction.
Step ten: and the processed audio is sent to a voice endpoint detection module to acquire an endpoint state.
Step eleven: judging whether sound exists or not according to the end point state, and entering the next step if the sound exists; otherwise, voice endpoint detection continues.
Step twelve: the network monitoring module detects the network and acquires the network state.
Step thirteen: judging whether a network exists according to the network state, and jumping to a seventeenth step if the network exists; otherwise, the next step is carried out.
Fourteen steps: and sending the audio thrown out by the voice endpoint kernel into offline recognition, and outputting an offline recognition result.
Step fifteen: and sending the result of the offline recognition into offline semantic processing, and outputting a semantic result.
Sixthly, the steps are as follows: and sending the result of the offline semantic meaning to offline dialogue processing, outputting a dialogue result, and then jumping to the step twenty.
Seventeen steps: and sending the audio thrown by the voice endpoint kernel into online recognition, and outputting an online recognition result.
Eighteen steps: and sending the result of online identification into online semantic processing, and outputting a semantic result.
Nineteen steps: and sending the result of the online semantics into online conversation processing, and outputting a conversation result.
Twenty steps: judging whether the language information result is the language information result, if so, sending the language information result to a serial port input module; otherwise, jumping to step eight.
Twenty one: and the equipment side obtains language information results from the serial port output module.
Function IV: through the voice software service, the device side can obtain the ticket purchasing information result in either an offline or online scenario.
Step one: starting the voice software process and running the voice service.
Step two: the face detection module detects a face and acquires a face state.
Step three: judging whether a face exists according to the face state, and entering the next step if the face exists; otherwise, the human face is continuously detected.
Step four: the linear microphone array module detects the microphones and acquires the microphone states.
Step five: judging whether the microphone is opened or not according to the state of the microphone, and if so, entering the next step; otherwise, the microphone is turned on.
Step six: the mouth type module detects the mouth type and obtains the state of the mouth type.
Step seven: judging whether to open the mouth according to the mouth shape state, and if so, carrying out the next step; otherwise, the mouth shape is continuously monitored.
Step eight: the audio acquisition module acquires audio.
Step nine: and sending the collected audio to an information processing module for echo cancellation and noise reduction.
Step ten: and the processed audio is sent to a voice endpoint detection module to acquire an endpoint state.
Step eleven: judging whether sound exists or not according to the end point state, and entering the next step if the sound exists; otherwise, voice endpoint detection continues.
Step twelve: the network monitoring module detects the network and acquires the network state.
Step thirteen: judging whether a network exists according to the network state, and jumping to a seventeenth step if the network exists; otherwise, the next step is carried out.
Fourteen steps: and sending the audio thrown out by the voice endpoint kernel into offline recognition, and outputting an offline recognition result.
Step fifteen: and sending the result of the offline recognition into offline semantic processing, and outputting a semantic result.
Sixthly, the steps are as follows: and sending the result of the offline semantic meaning to offline dialogue processing, outputting a dialogue result, and then jumping to the step twenty.
Seventeen steps: and sending the audio thrown by the voice endpoint kernel into online recognition, and outputting an online recognition result.
Eighteen steps: and sending the result of online identification into online semantic processing, and outputting a semantic result.
Nineteen steps: and sending the result of the online semantics into online conversation processing, and outputting a conversation result.
Twenty steps: judging whether the ticket purchasing information result is the ticket purchasing information result, and if so, sending the ticket purchasing information result to a serial port input module; otherwise, jumping to step eight.
Twenty one: and the equipment side acquires the ticket purchasing information result from the serial port output module.
In conclusion, the voice software service supports the integrated operation of face detection, audio acquisition, speech recognition, semantic processing and dialogue processing, effectively reducing developers' workload. Second, noise interference is effectively suppressed through face-detected mouth-shape changes and microphone signal processing. In addition, the voice software service not only provides a language-switching function, meeting the requirements of special scenarios such as subway ticket purchasing, but also communicates with the device side through a serial port, achieving plug-and-play and making it convenient to replace aging hardware running the voice software. Finally, the voice software service also supports an offline mode, so the ticket-purchasing function remains available even when the network is down.
In the course of carrying out the present application, the inventors also tried the following alternatives and summarized their advantages and disadvantages.
Alternative scheme I: the device side and the voice software service communicate via the WebSocket protocol, with the voice program as the server and the device side as the client; the device side obtains related information through the WebSocket connection.
Disadvantages: because the voice software service and the client program run on two different machines and communicate through WebSocket, the client device side cannot automatically acquire the IP address of the voice software service.
Beta version: the daemon process is removed, the communication among the modules is organized by adopting multiple processes, the processes monitor each other, and if any process runs, other processes restart the service.
The disadvantages are as follows: because the processes have independent memory units in the execution process, the voice software services adopt multi-process processes, and the memory can not be shared, thereby greatly reducing the running efficiency of the program.
Referring to fig. 7, a block diagram of a voice interaction apparatus for subway ticket purchasing according to an embodiment of the present invention is shown.
As shown in fig. 7, the voice interaction apparatus 600 includes a first detection module 610, an activation module 620, a determination module 630, and a second detection module 640.
The first detection module 610 is configured to detect the state of the voice software process in response to the acquired daemon starting instruction; the starting module 620 is configured to start the voice software process if the voice software process is in an absent state; the determining module 630 is configured to determine whether the voice software process has timed out if the voice software process is in an existing state; and the second detecting module 640 is configured to detect the state of the voice software process again if the voice software process has timed out.
It should be understood that the modules recited in fig. 7 correspond to various steps in the methods described with reference to fig. 1, 2, 3, 4, and 5. Thus, the operations and features described above for the method and the corresponding technical effects are also applicable to the modules in fig. 7, and are not described again here.
In other embodiments, an embodiment of the present invention further provides a non-volatile computer storage medium, where the computer storage medium stores computer-executable instructions that can execute the voice interaction method for subway ticket purchasing in any of the above method embodiments;
as one embodiment, a non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
responding to the acquired command for starting the daemon process, and detecting the process state of the voice software;
if the voice software process is in the non-existing state, starting the voice software process;
if the voice software process is in the existing state, judging whether the voice software process is overtime;
if the voice software process is overtime, the state of the voice software process is detected again.
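The four stored instructions above can be sketched as a single watchdog pass; this is a minimal hedged sketch with hypothetical callables standing in for the real process checks (none of these names appear in the patent).

```python
def watchdog_step(process_exists, start_process, has_timed_out):
    """One pass of the daemon: start the voice software process when absent;
    when it is present but has timed out, signal that its state must be
    detected again; otherwise report it healthy."""
    if not process_exists():
        start_process()          # absent state: start the voice software process
        return "started"
    if has_timed_out():
        return "recheck"         # overtime: detect the process state again
    return "healthy"


# Usage with stubs: the first pass finds no process and starts it.
started = []
state = watchdog_step(lambda: False, lambda: started.append(1), lambda: False)
```

In a real daemon this step would run in a loop on a timer, so a crashed or hung voice software process is restarted or re-examined automatically.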
The non-volatile computer-readable storage medium may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the voice interactive apparatus for subway ticket purchase, and the like. Further, the non-volatile computer-readable storage medium may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the non-transitory computer readable storage medium optionally includes memory remotely located from the processor, which may be connected over a network to a voice interaction device for subway ticketing. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Embodiments of the present invention further provide a computer program product, where the computer program product includes a computer program stored on a non-volatile computer-readable storage medium, and the computer program includes program instructions, and when the program instructions are executed by a computer, the computer executes any one of the above voice interaction methods for subway ticket purchasing.
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 8, the electronic device includes one or more processors 710 and a memory 720; one processor 710 is illustrated in fig. 8. The device for the voice interaction method for subway ticket purchasing may further include an input device 730 and an output device 740. The processor 710, the memory 720, the input device 730 and the output device 740 may be connected by a bus or other means; connection by a bus is illustrated in fig. 8. The memory 720 is a non-volatile computer-readable storage medium as described above. By running the non-volatile software programs, instructions and modules stored in the memory 720, the processor 710 executes the various functional applications and data processing of the server, that is, implements the voice interaction method for subway ticket purchasing of the above method embodiments. The input device 730 may receive input numeric or character information and generate key-signal inputs related to user settings and function control of the voice interaction device for subway ticket purchasing. The output device 740 may include a display device such as a display screen.
The above product can execute the method provided by the embodiments of the present invention, and has the functional modules and beneficial effects corresponding to the executed method. For technical details not described in detail in this embodiment, reference may be made to the method provided by the embodiments of the present invention.
As an implementation manner, the electronic device is applied to a voice interaction device for subway ticket purchasing, and is used for a client, and the voice interaction device comprises: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to:
responding to the acquired command for starting the daemon process, and detecting the process state of the voice software;
if the voice software process is in the non-existing state, starting the voice software process;
if the voice software process is in the existing state, judging whether the voice software process is overtime;
if the voice software process is overtime, the state of the voice software process is detected again.
The electronic device of the embodiments of the present application exists in various forms, including but not limited to:
(1) Mobile communication devices: such devices are characterized by mobile communication capability and are primarily aimed at providing voice and data communication. Such terminals include smart phones, multimedia phones, feature phones, low-end phones, and the like.
(2) Ultra-mobile personal computer devices: such devices belong to the category of personal computers, have computing and processing functions, and generally also support mobile internet access. Such terminals include PDA, MID and UMPC devices, among others.
(3) Portable entertainment devices: such devices can display and play multimedia content. They include audio and video players, handheld game consoles, e-book readers, intelligent toys, and portable vehicle-mounted navigation devices.
(4) Servers: similar in architecture to general-purpose computers, but with higher requirements on processing capability, stability, reliability, security, scalability, manageability and the like, because they must provide highly reliable services.
(5) Other electronic devices with data interaction functions.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A voice interaction method for subway ticket purchasing comprises the following steps:
responding to the acquired command for starting the daemon process, and detecting the process state of the voice software;
if the voice software process is in the non-existing state, starting the voice software process;
if the voice software process is in the existing state, judging whether the voice software process is overtime;
and if the voice software process is overtime, detecting the state of the voice software process again.
2. The method of claim 1, wherein prior to detecting a state of a speech software process in response to the retrieved launch daemon instruction, the method comprises:
responding to the acquired instruction for starting the voice software process, carrying out face detection and judging whether a face is detected or not;
if the face is detected, starting a microphone and continuously judging whether the current user has lip movement based on the face detection;
if lip motion exists, audio information is collected based on the microphone;
responding to the acquired audio information, and judging whether human voice exists or not;
and if the voice exists, carrying out audio recognition on the audio information, and outputting the obtained recognition result.
3. The method of claim 2, wherein the audio recognition further comprises online audio recognition, and the audio recognition of the audio information and outputting of the obtained recognition result if the human voice exists comprises:
if the voice exists, detecting whether networking is performed or not;
and if the networking is detected, performing online audio recognition on the audio information, and outputting an obtained online recognition result.
4. The method of claim 3, wherein the performing audio recognition on the audio information if the human voice exists and outputting the obtained recognition result further comprises:
responding to the obtained online identification result, and outputting an online semantic result based on an online semantic processing model;
responding to the acquired online semantic result, and outputting an online dialogue result based on an online dialogue processing model;
judging whether ticket purchasing information and/or language information exists in the online conversation result;
and if the online dialogue result contains ticket purchasing information and/or language information, analyzing the ticket purchasing information and/or language information to output a ticket purchasing information result and/or language information result.
5. The method of claim 4, wherein after determining whether ticketing information and/or language information is present in the online conversation result, the method further comprises:
and if the ticket purchasing information and/or language information does not exist in the online conversation result, acquiring the audio information again.
6. The method of claim 2, wherein the audio recognition comprises offline audio recognition, and the detecting whether networking is enabled if a human voice is present comprises:
and if the networking is not detected, performing the offline audio recognition on the audio information to obtain an offline recognition result.
7. The method of claim 6, wherein the performing audio recognition on the audio information if the human voice exists and outputting the obtained recognition result further comprises:
responding to the acquired offline recognition result, and outputting an offline semantic result based on an offline semantic processing model;
responding to the acquired offline semantic result, and outputting an offline dialogue result based on an offline dialogue processing model;
judging whether ticket purchasing information and/or language information exists in the offline conversation result;
and if the offline conversation result contains ticket purchasing information and/or language information, analyzing the ticket purchasing information and/or language information to output a ticket purchasing information result and/or language information result.
8. The method of claim 7, wherein after determining whether ticketing information and/or language information is present in the offline conversation result, the method further comprises:
and if the off-line dialogue result does not contain ticket purchasing information and/or language information, acquiring the audio information again.
9. A voice interaction device for subway ticket purchasing comprises:
the first detection module is configured to respond to the acquired daemon starting instruction and detect the state of the voice software process;
the starting module is configured to start the voice software process if the voice software process is in an absent state;
the judging module is configured to judge whether the voice software process is overtime or not if the voice software process is in the existing state;
and the second detection module is configured to detect the state of the voice software process again if the voice software process is overtime.
10. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method of any one of claims 1 to 8.
CN202011306399.8A 2020-11-19 2020-11-19 Voice interaction method and device for subway ticket purchasing Withdrawn CN112363861A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011306399.8A CN112363861A (en) 2020-11-19 2020-11-19 Voice interaction method and device for subway ticket purchasing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011306399.8A CN112363861A (en) 2020-11-19 2020-11-19 Voice interaction method and device for subway ticket purchasing

Publications (1)

Publication Number Publication Date
CN112363861A true CN112363861A (en) 2021-02-12

Family

ID=74534288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011306399.8A Withdrawn CN112363861A (en) 2020-11-19 2020-11-19 Voice interaction method and device for subway ticket purchasing

Country Status (1)

Country Link
CN (1) CN112363861A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114566148A (en) * 2022-04-02 2022-05-31 北京百度网讯科技有限公司 Cluster voice recognition service, detection method and device thereof, and electronic equipment
CN115585529A (en) * 2021-07-05 2023-01-10 宁波奥克斯电气股份有限公司 Online voice module daemon method and system and air conditioner


Similar Documents

Publication Publication Date Title
CN110648692B (en) Voice endpoint detection method and system
CN109473104B (en) Voice recognition network delay optimization method and device
CN112363861A (en) Voice interaction method and device for subway ticket purchasing
CN107103906A (en) It is a kind of to wake up method, smart machine and medium that smart machine carries out speech recognition
CN111312218B (en) Neural network training and voice endpoint detection method and device
CN110827858B (en) Voice endpoint detection method and system
CN104766608A (en) Voice control method and voice control device
CN111302167A (en) Elevator voice control method and device
CN109360551B (en) Voice recognition method and device
CN111968631A (en) Interaction method, device, equipment and storage medium of intelligent equipment
CN111816216A (en) Voice activity detection method and device
CN110910874A (en) Interactive classroom voice control method, terminal equipment, server and system
CN112286364A (en) Man-machine interaction method and device
CN110890104B (en) Voice endpoint detection method and system
CN112331187B (en) Multi-task speech recognition model training method and multi-task speech recognition method
CN110808073A (en) Voice activity detection method, voice recognition method and system
CN111785277A (en) Speech recognition method, speech recognition device, computer-readable storage medium and processor
CN113852835A (en) Live broadcast audio processing method and device, electronic equipment and storage medium
CN112331203A (en) Intelligent household equipment control method and device, electronic equipment and storage medium
CN109726026A (en) A kind of interaction data processing method, device, equipment and storage medium
CN111369985A (en) Voice interaction method, device, equipment and medium
CN116129890A (en) Voice interaction processing method, device and storage medium
CN111783723A (en) Dynamic gesture recognition normalization method and device, electronic equipment and storage medium
CN115762505A (en) Voice interaction method, electronic device and storage medium
CN116259304A (en) Continuous interactive method of voice and related products

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province

Applicant after: Sipic Technology Co.,Ltd.

Address before: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province

Applicant before: AI SPEECH Ltd.

WW01 Invention patent application withdrawn after publication

Application publication date: 20210212
