CN112669839B - Voice interaction method, device, equipment and storage medium - Google Patents

Voice interaction method, device, equipment and storage medium

Info

Publication number
CN112669839B
Authority
CN
China
Prior art keywords
voice
voice information
information
matching
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011496868.7A
Other languages
Chinese (zh)
Other versions
CN112669839A (en)
Inventor
林炜贤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Apollo Zhilian Beijing Technology Co Ltd
Original Assignee
Apollo Zhilian Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Apollo Zhilian Beijing Technology Co Ltd
Priority to CN202011496868.7A priority Critical patent/CN112669839B/en
Publication of CN112669839A publication Critical patent/CN112669839A/en
Application granted granted Critical
Publication of CN112669839B publication Critical patent/CN112669839B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • User Interface Of Digital Computer (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a voice interaction method, device, equipment and storage medium, and relates to the field of artificial intelligence. The specific implementation scheme is as follows: outputting a first prompt voice for prompting the input of a specific vocabulary; in response to receiving voice information, judging whether the number of words included in the voice information is greater than or equal to a preset value; if the number of words included in the voice information is greater than or equal to the preset value, matching the voice information with a preset information base; and outputting a second prompt voice according to the matching result. The method improves the response efficiency of voice interaction.

Description

Voice interaction method, device, equipment and storage medium
Technical Field
The embodiments of the application relate to artificial intelligence technology, and in particular to a voice interaction method, device, equipment and storage medium, which can be used in the field of the Internet of Vehicles.
Background
With the development of intelligent vehicle-mounted equipment, the voice function of vehicle-mounted equipment has also spawned a number of entertainment features that help users relax, such as intelligent question answering and the idiom chain game.
When a voice interaction function is used, the vehicle-mounted equipment normally outputs a prompt and the user gives a corresponding voice response; the vehicle-mounted equipment recognizes the user's voice, matches it with a preset information base, judges whether the user's response is correct, and then carries out the next round of interaction according to the judgment result.
While answering, the user may utter some invalid impurity voice because of spoken-language habits, word jams and the like. This impurity voice is nevertheless recognized by the vehicle-mounted device and matched with the preset information base, so the system processes a lot of invalid data and the response efficiency is low.
Disclosure of Invention
The application provides a voice interaction method, device, equipment and storage medium for improving response efficiency.
According to an aspect of the present application, there is provided a voice interaction method, including:
outputting a first prompt voice for prompting the input of a specific vocabulary;
in response to receiving the voice information, judging whether the number of words included in the voice information is greater than or equal to a preset value;
if the number of words included in the voice information is greater than or equal to the preset value, matching the voice information with a preset information base;
and outputting a second prompt voice according to the matching result.
According to another aspect of the present application, there is provided a voice interaction device, including:
the first output module is used for outputting first prompt voice for prompting to input specific vocabulary;
the judging module is used for responding to the received voice information and judging whether the number of words included in the voice information is larger than or equal to a preset value;
the matching module is used for matching the voice information with a preset information base when the number of words included in the voice information is greater than or equal to the preset value;
and the second output module is used for outputting a second prompt voice according to the matching result.
According to still another aspect of the present application, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the voice interaction method described above.
According to yet another aspect of the present application, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the above-described voice interaction method.
According to yet another aspect of the present application, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the above-mentioned voice interaction method.
According to the technical scheme of the application, the processing of impurity voice during voice interaction is reduced, and the response efficiency is improved.
It should be understood that the description of this section is not intended to identify key or critical features of the embodiments of the application or to delineate the scope of the application. Other features of the present application will become apparent from the description that follows.
Drawings
The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:
fig. 1 is an application scenario schematic diagram of a voice interaction method according to an embodiment of the present application;
FIG. 2 is a flow chart of a voice interaction method according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a voice interaction device according to an embodiment of the present application;
fig. 4 is a schematic block diagram of an electronic device for implementing a voice interaction method of an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is an application scenario schematic diagram of a voice interaction method according to an embodiment of the present application. As shown in fig. 1, the in-vehicle apparatus 101 has a voice function to perform voice interaction with a user. The voice input by the user can be recognized by the in-vehicle apparatus 101 and subjected to subsequent interactive logic processing. Alternatively, the in-vehicle apparatus 101 may also be connected to the server apparatus 102, and the server apparatus 102 may communicate with the in-vehicle apparatus 101 in various ways. The server device 102 is, for example, a cloud control platform, a vehicle-road cooperative management platform, a central subsystem, an edge computing platform, a cloud computing platform, or the like. The in-vehicle device 101 may transmit the recognized user voice to the server device 102, and the server device 102 performs subsequent interactive logic processing.
When the user uses the voice interaction function, taking the idiom chain game as an example, the vehicle-mounted device 101 first outputs an idiom and the user gives a corresponding voice response. The vehicle-mounted device 101 recognizes the user's voice and matches it with the idiom library, or the vehicle-mounted device 101 sends the recognized user voice to the server device 102 and the server device 102 matches it with the idiom library; whether the user's response is correct is judged, and the next round of interaction is then carried out according to the judgment result.
For example, the vehicle-mounted device 101 outputs the idiom "plain sailing" and the user makes a corresponding connection, for example speaking "forward seal". After the vehicle-mounted device 101 recognizes the user's voice, it sends "forward seal" to the server device 102, and the server device 102 matches the user's voice with the idioms in the idiom library to determine whether it is an idiom and, further, whether the end-to-end connection is satisfied. However, when continuing the chain the user may utter some invalid impurity voice because of spoken-language habits, word jams and the like; for example, the user speaks two words and then pauses to think. After recognizing the user's voice, the vehicle-mounted device 101 still sends it to the server device 102, and the server device 102 matches it with the idioms in the idiom library to determine whether it is an idiom.
However, in the above case the voice replied by the user does not actually need to be matched with the idiom library to conclude that it is not a correctly connected idiom. Matching the user's voice with the idioms in the idiom library in this case is therefore unnecessary, and because the amount of data in the idiom library is large, the matching process is time-consuming, so the response efficiency of voice interaction is low and the user experience is poor.
The method of the application can be applied to the Internet of Vehicles in the field of artificial intelligence. The method performs preliminary screening and filtering on the voice input by the user and performs subsequent matching processing only on the filtered non-impurity voice; the impurity voice needs no subsequent processing, so the processing efficiency is improved and the voice interaction experience is better.
The voice interaction method provided by the application will be described in detail through specific embodiments. It is to be understood that the following embodiments may be combined with each other and that some embodiments may not be repeated for the same or similar concepts or processes.
Fig. 2 is a flow chart of a voice interaction method according to an embodiment of the present application. The execution main body of the method is vehicle-mounted equipment. As shown in fig. 2, the method includes:
s201, outputting first prompt voice for prompting to input specific vocabulary.
The user can start the voice interaction entertainment function of the vehicle-mounted device by voice wake-up or by touch selection on the screen of the vehicle-mounted device, and the vehicle-mounted device then outputs the first prompt voice for prompting the input of a specific vocabulary. The first prompt voice may differ depending on the voice interaction entertainment function. For example, the first prompt voice may be an idiom randomly output by the vehicle-mounted device, prompting the user to continue the chain from that idiom, for example the vehicle-mounted device outputs "spring bloom"; alternatively, the first prompt voice may be a message prompting the user to speak an idiom of a certain type, for example the vehicle-mounted device outputs "please speak an idiom describing mood".
S202, in response to receiving the voice information, judging whether the number of words included in the voice information is larger than or equal to a preset value.
The user answers by speaking corresponding voice information according to the first prompt voice. After receiving the voice information input by the user, the vehicle-mounted equipment recognizes it; specifically, a voice recognition engine in the vehicle-mounted equipment can convert the voice information into text information.
After the vehicle-mounted equipment recognizes the voice information, it further judges whether the number of words included in the voice information is greater than or equal to the preset value. For example, since an idiom generally contains 4 or more words, the preset value may be set to 4; that is, it is judged whether the number of words included in the voice information is greater than or equal to 4, so that part of the impurity voice is filtered out by word count.
S203, if the number of words included in the voice information is greater than or equal to a preset value, matching the voice information with a preset information base.
When the number of words included in the voice information is greater than or equal to the preset value, the voice information is treated as the user's reply to the first prompt voice, so the voice information is matched with a preset information base (which may be an idiom library), whether the user's reply is correct is determined according to the matching result, and a second prompt voice is output. For example, the vehicle-mounted device outputs "spring bloom"; if the voice information input by the user is "mountain-blooming nose ancestor", the voice information comprises 4 words, so it is further matched with the preset information base. Here, the vehicle-mounted device may send the recognized voice information to the server device so that the server device matches it with the preset information base, or the vehicle-mounted device may perform the matching itself; this embodiment does not limit this.
S204, outputting a second prompt voice according to the matched result.
If the voice information input by the user is successfully matched with the preset information base, the second prompt voice can continue the interaction; for example, if the idiom is correct, the vehicle-mounted equipment outputs an idiom that chains onto the user's idiom. If the matching of the voice information with the preset information base fails, the second prompt voice can prompt the user to input a new idiom again or start a new round of interaction; for example, if the chain fails, the second prompt voice can prompt the user to input an idiom again, or the vehicle-mounted equipment can output a new idiom to start a new interaction.
According to the voice interaction method provided by this embodiment, word-count screening is performed on the voice input by the user, the impurity voice with fewer words is filtered out first, and only the valid voice information with enough words undergoes subsequent matching processing; the processing of impurity voice therefore no longer affects the interaction response efficiency, and the voice interaction experience is better.
On the basis of the above embodiment, if the number of words included in the voice information input by the user is smaller than the preset value, a third prompt voice is output, the third prompt voice being used for prompting the user to re-enter the specific vocabulary. For example, the vehicle-mounted device outputs "spring bloom" and the voice information input by the user is "what is on"; since that voice information contains only 3 words, it does not need to be matched with the preset information base, and the third prompt voice is output directly, for example "please say the correct idiom again". If the matching is processed by the server device, then when the number of words included in the voice information is smaller than the preset value the vehicle-mounted device does not need to send the voice information to the server device, which reduces the traffic consumption of the vehicle-mounted device.
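For illustration only (a minimal sketch, not the patented implementation), the following Python fragment shows the word-count pre-filter of S202 and S203: voice information shorter than the preset value triggers the third prompt voice directly, and only replies of sufficient length are matched with the preset information base. The idiom library contents, the preset value of 4 and the prompt texts are assumed placeholders.

# Sketch of the word-count pre-filter (S202) before library matching (S203).
IDIOM_LIBRARY = {"开山鼻祖", "万紫千红", "顺水推舟"}  # placeholder preset information base
PRESET_WORD_COUNT = 4  # idioms generally contain 4 or more words

def handle_reply(text: str) -> str:
    # S202: replies shorter than the preset value are treated as impurity voice
    # and are never sent to the time-consuming library matching step.
    if len(text) < PRESET_WORD_COUNT:
        return "Please say a complete idiom again."  # third prompt voice
    # S203: only valid-length voice information is matched with the preset information base.
    if text in IDIOM_LIBRARY:
        return "Correct, next round."  # second prompt voice (matching succeeded)
    return "That is not an idiom, please try again."  # second prompt voice (matching failed)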
In different voice interaction scenarios, the matching of the voice information with the preset information base in S203 of the above embodiment may be implemented in different ways, as described below.
Mode one
Matching the voice information with the first prompt voice; if the first word in the voice information is matched with the last word in the first prompt voice, the voice information is matched with a preset information base, and a second prompt voice is output according to the matching result.
In this implementation, taking the idiom chain game as the voice interaction function, the first prompt voice is an idiom output by the vehicle-mounted device. After it is judged that the number of words included in the voice information input by the user is greater than or equal to the preset value, the voice information is matched with the first prompt voice. If the first word in the voice information matches the last word in the first prompt voice, the voice information input by the user satisfies the end-to-end connection; only then is it further determined whether the voice information is an idiom, i.e. the voice information is matched with the preset information base (the idiom library), and the second prompt voice is output according to the matching result.
If the first word in the voice information does not match the last word in the first prompt voice, the voice input by the user does not satisfy the end-to-end connection, so there is no need to further determine whether it is an idiom, i.e. it does not need to be matched with the preset information base (the idiom library).
Because the amount of data in the preset information base is large, matching the voice information with the preset information base is relatively time-consuming; the method of this embodiment therefore reduces such matching and improves response efficiency.
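A minimal sketch of mode one, assuming exact word-for-word comparison for the end-to-end condition (matching by pronunciation, e.g. via pinyin, would be a further refinement not shown here); the function name and library are placeholders.

def matches_chain(prompt_idiom: str, reply: str, idiom_library: set) -> bool:
    # Mode one: check the cheap end-to-end condition first; the expensive library
    # lookup runs only when the first word of the reply equals the last word of
    # the first prompt voice.
    if not reply or reply[0] != prompt_idiom[-1]:
        return False
    return reply in idiom_library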
Mode two
Determining whether a plurality of words included in the voice information meet a preset format; if the plurality of words included in the voice information meet the preset format, the voice information is matched with a preset information base, and a second prompt voice is output according to the matching result.
In this implementation, taking as the voice interaction function the task of speaking idioms that meet a requirement given by the vehicle-mounted equipment, the first prompt voice prompts the user to speak an idiom satisfying a preset format. Illustratively, the first prompt voice prompts the user to speak an idiom in AABB format. After the user inputs the voice information and it is judged that the number of words included in the voice information is greater than or equal to the preset value, it is determined whether the words included in the voice information satisfy the AABB format. If they do, it is further determined whether the voice information input by the user is an idiom, i.e. the voice information is matched with the preset information base (the idiom library), and the second prompt voice is output according to the matching result.
If the words in the voice information do not satisfy the preset format, there is no need to further determine whether the voice information input by the user is an idiom, i.e. it does not need to be matched with the preset information base.
Because the data volume in the preset information base is large, the matching process of the voice information and the preset information base is relatively time-consuming, and therefore the method of the embodiment reduces the matching of the voice information and the preset information base through format matching and improves response efficiency.
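A sketch of the AABB format check in mode two, run before any library lookup; the requirement that the two repeated words differ from each other is an illustrative assumption, not stated in the application.

def matches_aabb(reply: str) -> bool:
    # Mode two: verify the AABB pattern before matching against the idiom library.
    return (len(reply) == 4
            and reply[0] == reply[1]
            and reply[2] == reply[3]
            and reply[0] != reply[2])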
Mode three
Matching the voice information with the target tag in a preset information base, the target tag being the tag indicated by the first prompt voice.
In this implementation, taking as an example the vehicle-mounted device prompting the user to speak an idiom with a target tag, the first prompt voice prompts the user to speak an idiom carrying the target tag, for example "please speak an idiom describing mood". After the user inputs the voice information and it is judged that the number of words included in the voice information is greater than or equal to the preset value, the information in the preset information base carrying the target tag "describing mood" is determined, the voice information is matched with that information, and the second prompt voice is output according to the matching result.
For example, if there are 100 idioms in the idiom library with the target tag "describing mood", then after it is judged that the number of words included in the voice information is greater than or equal to the preset value, the voice information is matched with those 100 idioms to determine whether the user's answer is correct. By matching the voice information only against the target tag in the preset information base, matching against the whole preset information base is avoided and response efficiency is improved.
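A sketch of mode three, assuming the preset information base carries a hypothetical tag index mapping each idiom to its tags; only the subset carrying the target tag is searched.

# Placeholder tag index; the entries and tag names are assumptions for illustration.
TAGGED_IDIOMS = {
    "心花怒放": {"mood"},
    "万紫千红": {"scenery"},
}

def matches_tag(reply: str, target_tag: str) -> bool:
    # Mode three: restrict matching to the idioms carrying the tag indicated by
    # the first prompt voice instead of scanning the whole preset information base.
    candidates = {idiom for idiom, tags in TAGGED_IDIOMS.items() if target_tag in tags}
    return reply in candidates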
On the basis of any of the above matching modes, when matching the voice information with the preset information base it can further be determined whether the voice information input by the user is historical voice information, i.e. information output by the vehicle-mounted device or input by the user before the current voice information was received; only if the voice information is not historical voice information is it matched with the preset information base.
For example, in the idiom chain game the user is not allowed to repeat an idiom that has already been used. After the vehicle-mounted device outputs "give birth to pick up the meaning", the user continues with "thin cloud day"; because the user's connection is correct, the vehicle-mounted device continues with "heaven and earth". If the user then says "thin cloud day" again, that idiom is historical voice information, so the vehicle-mounted device prompts the user to say a new idiom. If the user instead replies "no review", the voice information is not historical voice information, so it is matched with the preset information base to determine whether the user's answer is correct.
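The history check can be kept in a small per-round session object; the structure below is an assumption for illustration, not the patented design.

class ChainSession:
    """Rejects repeated idioms before the preset information base is consulted."""

    def __init__(self, idiom_library: set):
        self.idiom_library = idiom_library
        self.history = set()  # idioms output by the device or input by the user so far

    def check_reply(self, reply: str) -> str:
        if reply in self.history:
            return "That idiom was already used, please say a new one."
        if reply not in self.idiom_library:
            return "That is not a valid idiom, please try again."
        self.history.add(reply)
        return "Correct."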
Combining the above matching modes with the matching against historical voice information, the voice interaction method of the embodiments of the application first checks the other constraints of the voice interaction when verifying the user's voice and performs the idiom matching last; if the other constraints are not satisfied, the idiom matching is not needed, so response efficiency is improved and user experience is improved.
In addition, if no voice information input by the user is received within a preset time after the vehicle-mounted device outputs the first prompt voice, the vehicle-mounted device outputs a fourth prompt voice used for prompting the user that the voice interaction is ended, which prevents the vehicle-mounted device from staying in a listening state for a long time waiting for the user.
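As a hedged illustration of the timeout behaviour, the sketch below waits a preset time for recognized voice information and tells the caller to output the fourth prompt voice when nothing arrives; the queue-based input is an assumption.

import queue
from typing import Optional

def wait_for_reply(recognized: "queue.Queue[str]", timeout_s: float = 10.0) -> Optional[str]:
    # Wait at most timeout_s (the preset time) after the first prompt voice.
    # None tells the caller to output the fourth prompt voice and end the
    # interaction, so the device does not stay in a listening state indefinitely.
    try:
        return recognized.get(timeout=timeout_s)
    except queue.Empty:
        return None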
Fig. 3 is a schematic structural diagram of a voice interaction device according to an embodiment of the present application. As shown in fig. 3, the voice interaction apparatus 300 includes:
a first output module 301, configured to output a first prompt voice for prompting to input a specific vocabulary;
a judging module 302, configured to respond to receiving the voice information, and judge whether the number of words included in the voice information is greater than or equal to a preset value;
a matching module 303, configured to match the voice information with a preset information base when the number of words included in the voice information is greater than or equal to a preset value;
and a second output module 304, configured to output a second prompt voice according to the matching result.
Optionally, the matching module 303 includes:
a first matching unit 3031, configured to match the voice information with a first prompt voice;
and the second matching unit 3032 is used for matching the voice information with a preset information base when the first word in the voice information is matched with the last word in the first prompt voice.
Optionally, the matching module 303 includes:
a first determining unit 3033, configured to determine whether a plurality of words included in the voice information satisfy a preset format;
and a third matching unit 3034, configured to match the voice information with a preset information base when a plurality of words included in the voice information satisfy a preset format.
Optionally, the matching module 303 includes:
a fourth matching unit 3035, configured to match the voice information with information having the target tag in a preset information base; the target tag is the tag indicated by the first prompt voice.
Optionally, the matching module 303 includes:
a second determining unit 3036 for determining whether the voice information is history voice information; the history voice information is information output by the in-vehicle apparatus or input by the user before the voice information is received;
and a fifth matching unit 3037, configured to match the voice information with a preset information base when the voice information is not the history voice information.
Optionally, the voice interaction device 300 further includes:
and the third output module is used for outputting third prompt voice when the number of words included in the voice information is smaller than a preset value, wherein the third prompt voice is used for prompting a user to input the specific word again.
Optionally, the voice interaction device 300 further includes:
and the fourth output module is used for outputting fourth prompt voice when voice information is not received in a preset time after the first prompt voice is output, and the fourth prompt voice is used for prompting the user to finish voice interaction.
The voice interaction device provided in this embodiment of the application can execute the technical scheme of the voice interaction method in any of the foregoing embodiments; its implementation principle and beneficial effects are similar to those of the voice interaction method and are not described here again.
According to embodiments of the present application, an electronic device and a readable storage medium are also provided.
According to an embodiment of the present application, there is also provided a computer program product comprising: a computer program stored in a readable storage medium, from which at least one processor of an electronic device can read, the at least one processor executing the computer program causing the electronic device to perform the solution provided by any one of the embodiments described above.
Fig. 4 is a schematic block diagram of an electronic device for implementing a voice interaction method of an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 4, the electronic device 400 includes a computing unit 401 that can perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 402 or a computer program loaded from a storage unit 408 into a Random Access Memory (RAM) 403. In RAM 403, various programs and data required for the operation of device 400 may also be stored. The computing unit 401, ROM 402, and RAM 403 are connected to each other by a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
Various components in device 400 are connected to I/O interface 405, including: an input unit 406 such as a keyboard, a mouse, etc.; an output unit 407 such as various types of displays, speakers, and the like; a storage unit 408, such as a magnetic disk, optical disk, etc.; and a communication unit 409 such as a network card, modem, wireless communication transceiver, etc. The communication unit 409 allows the device 400 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 401 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 401 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 401 performs the various methods and processes described above, such as a voice interaction method. For example, in some embodiments, the voice interaction method may be implemented as a computer software program tangibly embodied on a machine-readable medium, e.g., the storage unit 408. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 400 via the ROM 402 and/or the communication unit 409. When the computer program is loaded into RAM 403 and executed by computing unit 401, one or more steps of the voice interaction method described above may be performed. Alternatively, in other embodiments, the computing unit 401 may be configured to perform the voice interaction method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs, which may be executed and/or interpreted on a programmable system including at least one programmable processor; the programmable processor may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, a host product in a cloud computing service system, which overcomes the defects of difficult management and weak service scalability of traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system or a server that incorporates a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed in the present application can be achieved, and are not limited herein.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (8)

1. A voice interaction method, comprising:
outputting a first prompt voice for prompting the input of a specific vocabulary;
in response to receiving the voice information, judging whether the number of words included in the voice information is greater than or equal to a preset value;
if the number of words included in the voice information is smaller than a preset value, outputting a third prompting voice, wherein the third prompting voice is used for prompting a user to input a specific word again;
if the number of words included in the voice information is greater than or equal to the preset value, determining whether the voice information is historical voice information; the history voice information is information output by the vehicle-mounted device or input by a user before the voice information is received;
if the voice information is not the historical voice information, matching the voice information with a preset information base;
outputting a second prompt voice according to the matching result;
if the first prompting voice is used for prompting the user to speak the voice information with the target tag, the matching the voice information with the preset information base comprises the following steps:
matching the voice information with the target tag in the preset information base; the target tag is the tag indicated by the first prompt voice;
if the first prompting voice is used for prompting the user to speak the voice information meeting the preset format, the matching the voice information with the preset information base includes:
determining whether a plurality of words included in the voice information meet a preset format;
and if the plurality of words included in the voice information meet the preset format, matching the voice information with a preset information base.
2. The method according to claim 1, wherein if the first prompt voice is idioms output by the vehicle-mounted device, the matching the voice information with a preset information base includes:
matching the voice information with the first prompt voice;
and if the first word in the voice information is matched with the last word in the first prompt voice, matching the voice information with a preset information base.
3. The method of claim 1, the method further comprising:
and if the voice information is not received within the preset time after the first prompting voice is output, outputting a fourth prompting voice, wherein the fourth prompting voice is used for prompting the user to finish voice interaction.
4. A voice interaction apparatus comprising:
the first output module is used for outputting first prompt voice for prompting to input specific vocabulary;
the judging module is used for responding to the received voice information and judging whether the number of words included in the voice information is larger than or equal to a preset value;
the third output module is used for outputting third prompt voice when the number of words included in the voice information is smaller than a preset value, and the third prompt voice is used for prompting a user to input a specific word again;
the matching module is used for determining whether the voice information is historical voice information or not if the number of words included in the voice information is larger than or equal to the preset value; the history voice information is information output by the vehicle-mounted device or input by a user before the voice information is received; if the voice information is not the historical voice information, matching the voice information with a preset information base;
the second output module is used for outputting a second prompt voice according to the matching result;
the matching module comprises:
a fourth matching unit, configured to match the voice information with the target tag in the preset information base if the first prompting voice is used to prompt the user to speak the voice information with the target tag; the target tag is the tag indicated by the first prompt voice;
a first determining unit, configured to determine whether a plurality of words included in the voice information satisfy a preset format if the first prompting voice is used for prompting a user to speak the voice information satisfying the preset format;
and the third matching unit is used for matching the voice information with a preset information base when a plurality of words included in the voice information meet a preset format.
5. The apparatus of claim 4, the matching module further comprising:
the first matching unit is used for matching the voice information with the first prompt voice if the first prompt voice is idiom output by the vehicle-mounted equipment;
and the second matching unit is used for matching the voice information with a preset information base when the first word in the voice information is matched with the last word in the first prompt voice.
6. The apparatus of claim 4, the apparatus further comprising:
and the fourth output module is used for outputting fourth prompt voice when voice information is not received in a preset time after the first prompt voice is output, and the fourth prompt voice is used for prompting the user to finish voice interaction.
7. An electronic device, comprising:
at least one processor; and a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-3.
8. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-3.
CN202011496868.7A 2020-12-17 2020-12-17 Voice interaction method, device, equipment and storage medium Active CN112669839B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011496868.7A CN112669839B (en) 2020-12-17 2020-12-17 Voice interaction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011496868.7A CN112669839B (en) 2020-12-17 2020-12-17 Voice interaction method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112669839A CN112669839A (en) 2021-04-16
CN112669839B (en) 2023-08-08

Family

ID=75404873

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011496868.7A Active CN112669839B (en) 2020-12-17 2020-12-17 Voice interaction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112669839B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115414666B (en) * 2022-11-03 2023-03-03 深圳市人马互动科技有限公司 Voice data processing method and related device based on idiom dragon-joining game

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7287248B1 (en) * 2002-10-31 2007-10-23 Tellme Networks, Inc. Method and system for the generation of a voice extensible markup language application for a voice interface process
CN106384591A (en) * 2016-10-27 2017-02-08 乐视控股(北京)有限公司 Method and device for interacting with voice assistant application
CN108572594A (en) * 2018-05-09 2018-09-25 深圳绿米联创科技有限公司 Generation method, device and the terminal device of smart machine control instruction
CN109331470A (en) * 2018-08-21 2019-02-15 平安科技(深圳)有限公司 Quiz game processing method, device, equipment and medium based on speech recognition

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9794348B2 (en) * 2007-06-04 2017-10-17 Todd R. Smith Using voice commands from a mobile device to remotely access and control a computer

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7287248B1 (en) * 2002-10-31 2007-10-23 Tellme Networks, Inc. Method and system for the generation of a voice extensible markup language application for a voice interface process
CN106384591A (en) * 2016-10-27 2017-02-08 乐视控股(北京)有限公司 Method and device for interacting with voice assistant application
CN108572594A (en) * 2018-05-09 2018-09-25 深圳绿米联创科技有限公司 Generation method, device and the terminal device of smart machine control instruction
CN109331470A (en) * 2018-08-21 2019-02-15 平安科技(深圳)有限公司 Quiz game processing method, device, equipment and medium based on speech recognition

Also Published As

Publication number Publication date
CN112669839A (en) 2021-04-16

Similar Documents

Publication Publication Date Title
CN108877792A (en) For handling method, apparatus, electronic equipment and the computer readable storage medium of voice dialogue
CN112509566B (en) Speech recognition method, device, equipment, storage medium and program product
CN113674742B (en) Man-machine interaction method, device, equipment and storage medium
CN113674746B (en) Man-machine interaction method, device, equipment and storage medium
CN114399772B (en) Sample generation, model training and track recognition methods, devices, equipment and media
CN112669839B (en) Voice interaction method, device, equipment and storage medium
CN114186681A (en) Method, apparatus and computer program product for generating model clusters
CN113157877A (en) Multi-semantic recognition method, device, equipment and medium
CN113658586A (en) Training method of voice recognition model, voice interaction method and device
CN115497458B (en) Continuous learning method and device of intelligent voice assistant, electronic equipment and medium
CN115292467B (en) Information processing and model training method, device, equipment, medium and program product
CN114758649B (en) Voice recognition method, device, equipment and medium
JP7383761B2 (en) Audio processing method, device, electronic device, storage medium and computer program for vehicles
CN113743127B (en) Task type dialogue method, device, electronic equipment and storage medium
CN114722171B (en) Multi-round dialogue processing method and device, electronic equipment and storage medium
CN113554062B (en) Training method, device and storage medium for multi-classification model
CN115312028A (en) Speech recognition method, speech recognition device, computer-readable storage medium and computer equipment
CN113129894A (en) Speech recognition method, speech recognition device, electronic device and storage medium
CN114333017A (en) Dynamic pickup method and device, electronic equipment and storage medium
CN114399992A (en) Voice instruction response method, device and storage medium
CN114119972A (en) Model acquisition and object processing method and device, electronic equipment and storage medium
CN114356275B (en) Interactive control method and device, intelligent voice equipment and storage medium
US20220343400A1 (en) Method and apparatus for providing state information of taxi service order, and storage medium
CN113705206B (en) Emotion prediction model training method, device, equipment and storage medium
EP4123639A2 (en) Wake-up control for a speech controlled device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211026

Address after: 100176 101, floor 1, building 1, yard 7, Ruihe West 2nd Road, Beijing Economic and Technological Development Zone, Daxing District, Beijing

Applicant after: Apollo Zhilian (Beijing) Technology Co.,Ltd.

Address before: 2 / F, baidu building, 10 Shangdi 10th Street, Haidian District, Beijing 100085

Applicant before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.

GR01 Patent grant