CN117012194A - Method for improving the visible-and-speakable recognition rate of a vehicle-end connected application - Google Patents

Method for improving the visible-and-speakable recognition rate of a vehicle-end connected application

Info

Publication number
CN117012194A
Authority
CN
China
Prior art keywords
semantic
vehicle
semantics
arbitration
improving
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310875319.8A
Other languages
Chinese (zh)
Inventor
席辉
王阳
杨志为
李振龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
FAW Bestune Car Co Ltd
Original Assignee
FAW Bestune Car Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by FAW Bestune Car Co Ltd filed Critical FAW Bestune Car Co Ltd
Priority to CN202310875319.8A
Publication of CN117012194A
Legal status: Pending

Classifications

    • G: PHYSICS
        • G10: MUSICAL INSTRUMENTS; ACOUSTICS
            • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
                • G10L 15/00: Speech recognition
                    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
                        • G10L 2015/223: Execution procedure of a spoken command
                    • G10L 15/08: Speech classification or search
                        • G10L 15/18: Speech classification or search using natural language modelling
                            • G10L 15/1822: Parsing for meaning understanding
                    • G10L 15/26: Speech to text systems
                    • G10L 15/28: Constructional details of speech recognition systems
                        • G10L 15/30: Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • B: PERFORMING OPERATIONS; TRANSPORTING
        • B60: VEHICLES IN GENERAL
            • B60R: VEHICLES, VEHICLE FITTINGS, OR VEHICLE PARTS, NOT OTHERWISE PROVIDED FOR
                • B60R 16/00: Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for
                    • B60R 16/02: electric constitutive elements
                        • B60R 16/037: for occupant comfort, e.g. for automatic adjustment of appliances according to personal settings, e.g. seats, mirrors, steering wheel
                            • B60R 16/0373: Voice control

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Mechanical Engineering (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application belongs to the technical field of automobiles, and in particular relates to a method for improving the visible-and-speakable recognition rate of a vehicle-end connected application. The method comprises the following steps: step one, extracting visible control information; step two, providing the visible-and-speakable control information of the current page; step three, receiving the voice input; step four, recognizing the text content corresponding to the voice; step five, performing semantic arbitration; step six, executing the semantics when valid semantics are judged to have been obtained, and ending the arbitration when no valid semantics are obtained. The application can realize offline semantic recognition of voice at the vehicle end under weak-network conditions, improve the recognition rate of the visible-and-speakable function, and safeguard the user experience.

Description

Method for improving the visible-and-speakable recognition rate of a vehicle-end connected application
Technical Field
The application belongs to the technical field of automobiles, and in particular relates to a method for improving the visible-and-speakable recognition rate of a vehicle-end connected application.
Background
The voice visible-and-speakable function of current vehicle-end connected applications allows the user to operate, by voice, the page controls that are currently visible, performing simple operations such as clicking and sliding and thereby freeing both hands. Although vehicle-end offline speech recognition can transcribe most utterances into text, it is almost incapable of parsing their semantics. The service therefore always extracts the control information of the current page and uploads it to the cloud speech recognition server, then uploads the local voice as well, so as to raise the success rate of cloud speech recognition; after the text corresponding to the voice has been recognized, the corresponding semantics are parsed by the powerful semantic analysis function of the cloud, and finally the corresponding action is issued to the vehicle end and executed. This function depends heavily on cloud-server analysis: when the vehicle is in a weak network environment, the availability of the visible-and-speakable function drops sharply, degrading the user experience.
Disclosure of Invention
In order to solve the above problems, the application provides a method for improving the visible-and-speakable recognition rate of a vehicle-end connected application, which can realize offline semantic recognition of voice at the vehicle end under weak-network conditions, improve the recognition rate of the visible-and-speakable function, and safeguard the user experience.
The technical scheme of the application, described with reference to the accompanying drawings, is as follows:
In a first aspect, an embodiment of the present application provides a method for improving the visible-and-speakable recognition rate of a vehicle-end connected application, comprising the following steps:
step one, extracting visible control information;
step two, providing the visible-and-speakable control information of the current page;
step three, receiving the voice input;
step four, recognizing the text content corresponding to the voice;
step five, performing semantic arbitration;
step six, executing the semantics when valid semantics are judged to have been obtained, and ending the arbitration when no valid semantics are obtained.
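To make the six steps concrete, a minimal end-to-end sketch in Python follows. It is illustrative only: every function in it is a trivial stand-in defined for the example, not the patent's implementation.

```python
# Minimal end-to-end sketch of steps one to six, with trivial stand-in
# implementations so the flow can run. All names are hypothetical.

def extract_visible_controls(page):            # step one
    return page["controls"]

def provide_speakable_controls(controls):      # step two
    print("speakable on this page:", controls)

def receive_voice():                           # step three (stubbed)
    return "playlist"                          # pretend this audio was captured

def recognize_text(audio):                     # step four (stubbed)
    return audio                               # cloud or vehicle-end offline ASR

def arbitrate(text, controls):                 # step five (stubbed)
    return text if text in controls else None  # valid only if executable here

page = {"controls": ["air conditioner", "playlist"]}
controls = extract_visible_controls(page)
provide_speakable_controls(controls)
text = recognize_text(receive_voice())
semantics = arbitrate(text, controls)
if semantics is not None:
    print("executing:", semantics)             # step six: valid semantics
else:
    print("arbitration ended with no valid semantics")
```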
Further, in step four, the semantics are recognized either by the cloud server or offline at the vehicle end.
Further, after the vehicle end recognizes the semantics offline, priority-ordered multi-pattern matching is performed on the semantic text, and the matched action and control information serves as the vehicle-end offline semantic recognition result.
Further, in step six, if a valid cloud-server semantic result is received before the arbitration ends, the arbitration ends and the corresponding operation is executed immediately according to the cloud-server semantic result; if a valid vehicle-end semantic result is received before the arbitration ends, the method waits, and if it is then notified that the cloud semantic result is invalid, the arbitration ends and the corresponding operation is executed immediately according to the vehicle-end semantic result; if the cloud server has returned no semantic result by the time the arbitration times out, the corresponding operation is executed immediately according to the vehicle-end semantic result.
Further, only executable semantics are considered valid semantics.
Further, the specific method for recognizing the semantics offline at the vehicle end is as follows:
parsing the content of the speech-converted text offline;
matching control information;
matching action information;
judging whether control information and action information exist;
if only action information exists, searching for a control capable of executing the action and judging whether the semantic result is unique; if it is unique, the semantic parsing succeeds; if not, the semantic parsing fails;
if only control information exists, parsing the default action corresponding to the control and judging whether the semantic result is unique; if it is unique, the semantic parsing succeeds; if not, the semantic parsing fails;
if both control information and action information exist, judging whether the action is executable on the control; if it is executable, judging whether the semantic result is unique; if it is unique, the semantic parsing succeeds; if not, the semantic parsing fails; if the action is not executable on the control, the semantic parsing fails.
In a second aspect, an embodiment of the present application further provides a device for improving the visible-and-speakable recognition rate of a vehicle-end connected application, comprising:
an extraction module for extracting visible control information;
a providing module for providing the visible-and-speakable control information of the current page;
a sound receiving module for receiving the voice input;
a recognition module for recognizing the text content corresponding to the voice;
an arbitration module for performing semantic arbitration;
and a judging module for executing the semantics when valid semantics are judged to have been obtained, and ending the arbitration when no valid semantics are obtained.
In a third aspect, a terminal is provided, including:
one or more processors;
a memory for storing instructions executable by the one or more processors;
wherein the one or more processors are configured to:
perform the method according to the first aspect of the embodiments of the application.
In a fourth aspect, a non-transitory computer readable storage medium is provided; when the instructions in the storage medium are executed by a processor of a terminal, the terminal is enabled to perform the method according to the first aspect of the embodiments of the application.
In a fifth aspect, an application program product is provided which, when run on a terminal, causes the terminal to perform the method according to the first aspect of the embodiments of the application.
The beneficial effects of the application are as follows:
1) Because the page controls of a vehicle-end connected application are easy for the user to understand, the vehicle-end offline speech recognition and parsing function can, with high probability, accurately recognize the text content corresponding to the user's voice; that is, cloud recognition need not be relied on entirely;
2) The actions that a visible-and-speakable application needs to perform are very limited, i.e. clicking, sliding, and the like, and in many cases the corresponding action can be deduced by processing the action keywords against the controls matched from the spoken text. In other words, the application can realize offline semantic recognition of voice at the vehicle end under weak-network conditions, improve the recognition rate of the visible-and-speakable function, and safeguard the user experience.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the method for improving the visible-and-speakable recognition rate of a vehicle-end connected application according to the present application;
FIG. 2 is a flow diagram of offline semantic recognition;
FIG. 3 is a schematic structural diagram of the device for improving the visible-and-speakable recognition rate of a vehicle-end connected application according to the present application;
FIG. 4 is a schematic block diagram of a terminal structure.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only to distinguish the description, and are not to be construed as indicating or implying relative importance.
Example 1
Fig. 1 is a flowchart of a method for improving the visible-and-speakable recognition rate of a vehicle-end connected application according to an embodiment of the present application. The embodiment is applicable to cases where the visible-and-speakable recognition rate of a vehicle-end connected application is to be improved. The method may be performed by the device for improving the visible-and-speakable recognition rate of a vehicle-end connected application according to an embodiment of the present application, and the device may be implemented in software and/or hardware.
Referring to fig. 1, a method for improving the visible-and-speakable recognition rate of a vehicle-end connected application includes the following steps:
step one, extracting visible control information;
step two, providing the visible-and-speakable control information of the current page;
step three, receiving the voice input;
step four, recognizing the text content corresponding to the voice;
the semantics are identified through the cloud server or offline through the vehicle end. The vehicle end offline recognition semantics can be customized, on the one hand, the vehicle end internet-connected application related recognition library can be customized, only the characters and the character parts in the semantics are analyzed, and compared with cloud recognition, the service is simpler, and the recognition success rate is higher; on the other hand, the matching strategy of the offline voice can gradually update and strengthen the recognition capability according to the accumulation of practical data and big data learning. These all provide reinforcement for the visible i.e. function in the case of weak networks, improving the recognition rate of the visible i.e. function.
Referring to fig. 2, the specific method for vehicle-end offline semantic recognition is as follows:
parsing the content of the speech-converted text offline;
matching control information;
matching action information;
judging whether control information and action information exist;
if only action information exists, searching for a control capable of executing the action and judging whether the semantic result is unique; if it is unique, the semantic parsing succeeds; if not, the semantic parsing fails;
if only control information exists, parsing the default action corresponding to the control and judging whether the semantic result is unique; if it is unique, the semantic parsing succeeds; if not, the semantic parsing fails;
if both control information and action information exist, judging whether the action is executable on the control; if it is executable, judging whether the semantic result is unique; if it is unique, the semantic parsing succeeds; if not, the semantic parsing fails; if the action is not executable on the control, the semantic parsing fails.
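As a hedged illustration of the branch logic in fig. 2, the sketch below builds on the Control, ACTION_WORDS, and page definitions from the previous sketch; treating a control's first listed action as its default action is an assumption made only for the example.

```python
# Hedged sketch of the offline semantic parsing flow described above.
# Reuses Control, ACTION_WORDS and page from the previous sketch.
def parse_offline(text, controls):
    """Return a unique (control, action) pair, or None if parsing fails."""
    text = text.lower()
    matched_controls = [c for c in controls if c.label.lower() in text]
    matched_actions = [a for a, words in ACTION_WORDS.items()
                       if any(w in text for w in words)]

    if matched_actions and not matched_controls:
        # Only action information: search for controls able to execute it.
        action = matched_actions[0]
        candidates = [(c, action) for c in controls if action in c.actions]
    elif matched_controls and not matched_actions:
        # Only control information: fall back to each control's default
        # action (assumed here to be the first action it supports).
        candidates = [(c, c.actions[0]) for c in matched_controls]
    elif matched_controls and matched_actions:
        # Both exist: keep only pairs where the action is executable
        # on the matched control.
        candidates = [(c, a) for c in matched_controls
                      for a in matched_actions if a in c.actions]
    else:
        return None  # neither control nor action information: parsing fails

    # The result is valid only if the semantic result is unique.
    return candidates[0] if len(candidates) == 1 else None

print(parse_offline("open the playlist", page))  # unique pair: (Playlist, "click")
```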
Step five, carrying out semantic arbitration;
step six, executing the semantics when judging that the effective semantics are obtained, and ending the arbitration when judging that the ineffective semantics are not obtained.
Semantic arbitration timing starts when voice reception is completed; if no valid semantic result from the cloud or the vehicle end is returned within the set time, or the returned results are invalid, the arbitration ends and the semantic recognition is considered to have failed.
If a valid cloud-server semantic result is received before the arbitration ends, the arbitration ends and the corresponding operation is executed immediately according to the cloud-server semantic result; if a valid vehicle-end semantic result is received before the arbitration ends, the method waits, and if it is then notified that the cloud semantic result is invalid, the arbitration ends and the corresponding operation is executed immediately according to the vehicle-end semantic result; if the cloud server has returned no semantic result by the time the arbitration times out, the corresponding operation is executed immediately according to the vehicle-end semantic result.
Only executable semantics are considered valid semantics.
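The arbitration rules above can be pictured as a small state machine polled by the voice service. The sketch below is illustrative only; the two-second timeout, the class name Arbiter, and the callback names are assumptions rather than values given by the application.

```python
# Illustrative arbitration sketch: a valid cloud result wins immediately;
# a valid vehicle-end result is held as a fallback until the cloud answers
# or the arbitration times out. Timeout value and names are assumptions.
import time

ARBITRATION_TIMEOUT_S = 2.0  # assumed budget, counted from end of reception

class Arbiter:
    def __init__(self):
        self.start = time.monotonic()   # timing starts when reception ends
        self.vehicle_result = None      # held valid vehicle-end semantics

    def on_cloud_result(self, semantics):
        if semantics is not None:
            return ("execute", semantics)        # valid cloud result: run it
        if self.vehicle_result is not None:      # cloud invalid: fall back
            return ("execute", self.vehicle_result)
        return ("fail", None)                    # nothing valid anywhere

    def on_vehicle_result(self, semantics):
        if semantics is not None:                # hold it and keep waiting
            self.vehicle_result = semantics
        return ("wait", None)

    def on_tick(self):
        if time.monotonic() - self.start < ARBITRATION_TIMEOUT_S:
            return ("wait", None)
        if self.vehicle_result is not None:      # cloud never answered
            return ("execute", self.vehicle_result)
        return ("fail", None)                    # recognition failed
```

Holding the vehicle-end result as a fallback while treating the cloud result as authoritative reproduces the three cases enumerated above.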
In conclusion, the application can realize offline semantic recognition of voice at the vehicle end under weak-network conditions, improve the recognition rate of the visible-and-speakable function, and safeguard the user experience.
Example two
Referring to fig. 3, a device for improving the visible-and-speakable recognition rate of a vehicle-end connected application includes:
an extraction module for extracting visible control information;
a providing module for providing the visible-and-speakable control information of the current page;
a sound receiving module for receiving the voice input;
a recognition module for recognizing the text content corresponding to the voice;
an arbitration module for performing semantic arbitration;
and a judging module for executing the semantics when valid semantics are judged to have been obtained, and ending the arbitration when no valid semantics are obtained.
Example III
Fig. 4 is a block diagram of a terminal according to an embodiment of the present application; the terminal may be the terminal of the above embodiments. The terminal may be a portable mobile terminal such as a smart phone or a tablet computer. A terminal may also be called by other names, such as user equipment or portable terminal.
Generally, the terminal includes: a processor 301 and a memory 302.
Processor 301 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 301 may be implemented in at least one of the hardware forms DSP (Digital Signal Processor), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 301 may also include a main processor and a coprocessor; the main processor, also called the CPU (Central Processing Unit), processes data in the awake state, while the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 301 may integrate a GPU (Graphics Processing Unit) for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 301 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
Memory 302 may include one or more computer-readable storage media, which may be tangible and non-transitory. Memory 302 may also include high-speed random access memory as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 302 is used to store at least one instruction for execution by processor 301 to implement the method for improving the visible-and-speakable recognition rate of a vehicle-end connected application provided in the present application.
In some embodiments, the terminal may further optionally include: a peripheral interface 303, and at least one peripheral. Specifically, the peripheral device includes: at least one of radio frequency circuitry 304, touch screen 305, camera 306, audio circuitry 307, positioning component 308, and power supply 309.
The peripheral interface 303 may be used to connect at least one Input/Output (I/O) related peripheral to the processor 301 and the memory 302. In some embodiments, the processor 301, the memory 302, and the peripheral interface 303 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 301, the memory 302, and the peripheral interface 303 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 304 is configured to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuit 304 communicates with communication networks and other communication devices via electromagnetic signals, converting an electrical signal into an electromagnetic signal for transmission, or converting a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 304 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 304 may communicate with other terminals via at least one wireless communication protocol, including but not limited to: the World Wide Web, metropolitan area networks, intranets, mobile communication networks of each generation (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 304 may also include NFC (Near Field Communication) related circuitry, which is not limiting of the application.
The touch display screen 305 is used to display a UI (User Interface), which may include graphics, text, icons, video, and any combination thereof. The touch display screen 305 also has the ability to collect touch signals on or above its surface; such a touch signal may be input to the processor 301 as a control signal for processing. The touch display screen 305 is used to provide virtual buttons and/or a virtual keyboard, also known as soft buttons and/or a soft keyboard. In some embodiments, there may be one touch display screen 305, providing the front panel of the terminal; in other embodiments, there may be at least two touch display screens 305, respectively disposed on different surfaces of the terminal or in a folded design; in still other embodiments, the touch display screen 305 may be a flexible display disposed on a curved or folded surface of the terminal. The touch display screen 305 may even be arranged in an irregular, non-rectangular shape, i.e. a shaped screen. The touch display screen 305 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or other materials.
The camera assembly 306 is used to capture images or video. Optionally, the camera assembly 306 includes a front camera and a rear camera. In general, the front camera is used for video calls or self-portraits, and the rear camera is used for taking photographs or videos. In some embodiments, there are at least two rear cameras, each being one of a main camera, a depth-of-field camera, and a wide-angle camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting and VR (Virtual Reality) shooting functions. In some embodiments, the camera assembly 306 may also include a flash, which can be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.
The audio circuit 307 is used to provide an audio interface between the user and the terminal. The audio circuit 307 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and environments, converting the sound waves into electric signals, and inputting the electric signals to the processor 301 for processing, or inputting the electric signals to the radio frequency circuit 304 for voice communication. For the purpose of stereo acquisition or noise reduction, a plurality of microphones can be respectively arranged at different parts of the terminal. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 301 or the radio frequency circuit 304 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, the audio circuit 307 may also include a headphone jack.
The positioning component 308 is used to locate the current geographic position of the terminal to enable navigation or LBS (Location Based Service). The positioning component 308 may be based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 309 is used to power the various components in the terminal. The power supply 309 may use alternating current, direct current, disposable batteries, or a rechargeable battery. When the power supply 309 includes a rechargeable battery, the battery may be a wired rechargeable battery, charged through a wired line, or a wireless rechargeable battery, charged through a wireless coil. The rechargeable battery may also support fast-charge technology.
Those skilled in the art will appreciate that the structure shown in fig. 4 is not limiting of the terminal and may include more or fewer components than shown, or may combine certain components, or may employ a different arrangement of components.
Example IV
In an exemplary embodiment, a computer readable storage medium is also provided, on which a computer program is stored; when executed by a processor, the program implements the method for improving the visible-and-speakable recognition rate of a vehicle-end connected application provided by the embodiments of the present application.
Any combination of one or more computer readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Example five
In an exemplary embodiment, an application program product is also provided that includes one or more instructions executable by the processor 301 of the device to perform the above-described method for improving the visible-and-speakable recognition rate of a vehicle-end connected application.
Although embodiments of the present application have been disclosed above, they are not limited to the uses listed in the description and the modes of implementation; the application can be applied to various fields suitable for it. Additional modifications will readily occur to those skilled in the art. Therefore, without departing from the general concept defined by the claims and their equivalents, the application is not limited to the specific details and illustrations shown and described herein.

Claims (9)

1. A method for improving the visible-and-speakable recognition rate of a vehicle-end connected application, comprising the following steps:
step one, extracting visible control information;
step two, providing the visible-and-speakable control information of the current page;
step three, receiving the voice input;
step four, recognizing the text content corresponding to the voice;
step five, performing semantic arbitration;
step six, executing the semantics when valid semantics are judged to have been obtained, and ending the arbitration when no valid semantics are obtained.
2. The method for improving the visible-and-speakable recognition rate of a vehicle-end connected application according to claim 1, wherein in step four, the semantics are recognized either by the cloud server or offline at the vehicle end.
3. The method for improving the visible-and-speakable recognition rate of a vehicle-end connected application according to claim 2, wherein after the vehicle end recognizes the semantics offline, priority-ordered multi-pattern matching is performed on the semantic text, and the matched action and control information serves as the vehicle-end offline semantic recognition result.
4. The method for improving the visible-and-speakable recognition rate of a vehicle-end connected application according to claim 2, wherein in step six, if a valid cloud-server semantic result is received before the arbitration ends, the arbitration ends and the corresponding operation is executed immediately according to the cloud-server semantic result; if a valid vehicle-end semantic result is received before the arbitration ends, the method waits, and if it is then notified that the cloud semantic result is invalid, the arbitration ends and the corresponding operation is executed immediately according to the vehicle-end semantic result; if the cloud server has returned no semantic result by the time the arbitration times out, the corresponding operation is executed immediately according to the vehicle-end semantic result.
5. The method for improving the visible-and-speakable recognition rate of a vehicle-end connected application according to claim 2, wherein only executable semantics are considered valid semantics.
6. The method for improving the visible-and-speakable recognition rate of a vehicle-end connected application according to claim 2, wherein the specific method for recognizing the semantics offline at the vehicle end is as follows:
parsing the content of the speech-converted text offline;
matching control information;
matching action information;
judging whether control information and action information exist;
if only action information exists, searching for a control capable of executing the action and judging whether the semantic result is unique; if it is unique, the semantic parsing succeeds; if not, the semantic parsing fails;
if only control information exists, parsing the default action corresponding to the control and judging whether the semantic result is unique; if it is unique, the semantic parsing succeeds; if not, the semantic parsing fails;
if both control information and action information exist, judging whether the action is executable on the control; if it is executable, judging whether the semantic result is unique; if it is unique, the semantic parsing succeeds; if not, the semantic parsing fails; if the action is not executable on the control, the semantic parsing fails.
7. A device for improving the visible-and-speakable recognition rate of a vehicle-end connected application, comprising:
an extraction module for extracting visible control information;
a providing module for providing the visible-and-speakable control information of the current page;
a sound receiving module for receiving the voice input;
a recognition module for recognizing the text content corresponding to the voice;
an arbitration module for performing semantic arbitration;
and a judging module for executing the semantics when valid semantics are judged to have been obtained, and ending the arbitration when no valid semantics are obtained.
8. A terminal, comprising:
one or more processors;
a memory for storing instructions executable by the one or more processors;
wherein the one or more processors are configured to:
a method of improving the visual and so-to-speak recognition rate of a vehicle end networking application as claimed in any one of claims 1 to 6.
9. A non-transitory computer readable storage medium, wherein instructions in the storage medium, when executed by a processor of a terminal, enable the terminal to perform the method for improving the visible-and-speakable recognition rate of a vehicle-end connected application according to any one of claims 1 to 6.
CN202310875319.8A 2023-07-17 2023-07-17 Method for improving the visible-and-speakable recognition rate of a vehicle-end connected application Pending CN117012194A

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310875319.8A 2023-07-17 2023-07-17 Method for improving the visible-and-speakable recognition rate of a vehicle-end connected application

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310875319.8A 2023-07-17 2023-07-17 Method for improving the visible-and-speakable recognition rate of a vehicle-end connected application

Publications (1)

Publication Number Publication Date
CN117012194A 2023-11-07

Family

ID=88572034

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310875319.8A Method for improving the visible-and-speakable recognition rate of a vehicle-end connected application (pending) 2023-07-17 2023-07-17

Country Status (1)

Country Link
CN (1) CN117012194A (en)

Similar Documents

Publication Publication Date Title
CN110111787B (en) Semantic parsing method and server
CN111724775B (en) Voice interaction method and electronic equipment
CN110138959B (en) Method for displaying prompt of human-computer interaction instruction and electronic equipment
CN110798506B (en) Method, device and equipment for executing command
WO2022052776A1 (en) Human-computer interaction method, and electronic device and system
US11537360B2 (en) System for processing user utterance and control method of same
CN115240664A (en) Man-machine interaction method and electronic equipment
CN111739517B (en) Speech recognition method, device, computer equipment and medium
CN111881315A (en) Image information input method, electronic device, and computer-readable storage medium
CN112130714B (en) Keyword search method capable of learning and electronic equipment
CN113488042B (en) Voice control method and electronic equipment
CN114333774A (en) Speech recognition method, speech recognition device, computer equipment and storage medium
CN112764600B (en) Resource processing method, device, storage medium and computer equipment
CN111626035B (en) Layout analysis method and electronic equipment
CN112740148A (en) Method for inputting information into input box and electronic equipment
CN112433787A (en) Target object serialization method and device, electronic device and storage medium
CN116384342A (en) Semantic conversion method, semantic conversion device, semantic conversion apparatus, semantic conversion storage medium, and semantic conversion computer program
CN117012194A (en) Method for improving visible and so-to-speak recognition rate of vehicle-end networking application
KR20150068609A (en) Method and apparatus for displaying image information
CN114356529A (en) Image processing method and device, electronic equipment and storage medium
CN113380240A (en) Voice interaction method and electronic equipment
CN111554314A (en) Noise detection method, device, terminal and storage medium
CN111310075A (en) Information collection method, information collection device, storage medium and electronic device
WO2022143048A1 (en) Dialogue task management method and apparatus, and electronic device
CN117010165A (en) Method and device for writing CAE simulation file of strength of engine valve chamber cover

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination