CN116013311A - Edge intelligent voice recognition method and related equipment - Google Patents


Info

Publication number
CN116013311A
CN116013311A (application CN202211542006.2A)
Authority
CN
China
Prior art keywords: offline, preset, recognition result, online, result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211542006.2A
Other languages
Chinese (zh)
Inventor
邝先信
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Oribo Technology Co Ltd
Original Assignee
Shenzhen Oribo Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Oribo Technology Co Ltd filed Critical Shenzhen Oribo Technology Co Ltd
Priority to CN202211542006.2A
Publication of CN116013311A
Legal status: Pending

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The application provides an edge intelligent voice recognition method and related equipment, where the related equipment includes an edge intelligent voice recognition device, an electronic device and a storage medium. The edge intelligent voice recognition method includes the following steps: the offline server receives voice data to be recognized; the offline server recognizes the voice data to be recognized and obtains an offline recognition result; the offline server calculates the confidence of the offline recognition result; the offline server compares the confidence with a preset first threshold; if the confidence is higher than the preset first threshold, the offline recognition result is used as the voice recognition result, and if the confidence is less than or equal to the preset first threshold, the voice data to be recognized and the confidence are sent to an online server. By evaluating the accuracy of the offline recognition result to decide whether the online server needs to be invoked to process the voice data, the method balances voice recognition efficiency and accuracy.

Description

Edge intelligent voice recognition method and related equipment
Technical Field
The application relates to the technical field of artificial intelligence and the internet of things, and in particular to an edge intelligent voice recognition method and related equipment, where the related equipment includes an edge intelligent voice recognition device, an electronic device and a storage medium.
Background
With the development of internet of things technology, more and more enterprises tend to provide convenient services for customers by using voice recognition technology. For example, some internet of things providers offer intelligent voice conversations through voice recognition technology so that customers can control edge smart devices in the internet of things through voice commands; some car manufacturers provide voice interaction devices inside the vehicle so that the driver or a passenger can control certain functions of the vehicle by voice, enhancing driving safety.
Currently, received voice data is generally recognized by a powerful online voice recognition server. However, fluctuations in the network signal may slow the online server's responses, negatively affecting the processing efficiency of the voice recognition task.
Disclosure of Invention
In view of the foregoing, it is necessary to provide an edge intelligent voice recognition method and related devices to solve the technical problem of how to improve the efficiency of voice recognition, wherein the related devices include an edge intelligent voice recognition device, an electronic device and a storage medium.
In a first aspect, an embodiment of the present application provides an edge intelligent speech recognition method, where the method includes:
the off-line server receives voice data to be recognized;
the off-line server identifies the voice data to be identified and obtains an off-line identification result;
the offline server calculates the confidence coefficient of the offline identification result;
the offline server compares the confidence coefficient with a preset first threshold value, if the confidence coefficient is higher than the preset first threshold value, the offline recognition result is used as a voice recognition result,
and if the confidence coefficient is smaller than or equal to a preset first threshold value, sending the voice data to be recognized and the confidence coefficient to the online server.
In some embodiments, the offline recognition result includes at least one byte and a probability value corresponding to each byte, and the step of the offline server calculating the confidence of the offline recognition result specifically includes:
the offline server calculates the mean value of all probability values as the confidence level of the offline identification result.
In a second aspect, an embodiment of the present application provides an edge intelligent speech recognition method, where the method includes:
the online server receives the voice data to be recognized and the confidence coefficient of the offline recognition result, and recognizes the voice data to be recognized to obtain the online recognition result;
if the online server obtains the online recognition result within the preset waiting time, the online recognition result is used as a voice recognition result,
if the online server does not acquire the online identification result within the preset waiting time, comparing the confidence coefficient with a preset second threshold value to formulate a retention strategy of the offline identification result.
In some embodiments, the step of comparing the confidence level with a preset second threshold value to formulate a retention policy for the offline identification result specifically includes:
if the confidence is higher than the preset second threshold, the offline recognition result is reserved as the voice recognition result,
if the confidence coefficient is smaller than or equal to a preset second threshold value, discarding the offline identification result and sending out an identification failure notification.
In a third aspect, an embodiment of the present application provides an edge intelligent voice recognition device, where the device includes an offline recognition module, where the offline recognition module is configured to:
receiving voice data to be recognized;
recognizing the voice data to be recognized and obtaining an offline recognition result;
calculating the confidence coefficient of the offline identification result;
comparing the confidence coefficient with a preset first threshold value, if the confidence coefficient is higher than the preset first threshold value, taking the offline recognition result as a voice recognition result,
and if the confidence coefficient is smaller than or equal to a preset first threshold value, sending the voice data to be recognized and the confidence coefficient to the online server.
In some embodiments, the offline recognition result includes at least one byte and a probability value corresponding to each byte, and the offline recognition module calculates the confidence of the offline recognition result, including:
the offline identification module calculates the average value of all probability values to be used as the confidence level of the offline identification result.
In a fourth aspect, an embodiment of the present application provides an edge intelligent voice recognition device, where the device includes an online recognition module, where the online recognition module is configured to:
receiving confidence degrees of the voice data to be recognized and the offline recognition result, and then recognizing the voice data to be recognized to obtain the online recognition result;
if the online recognition module obtains the online recognition result within the preset waiting time, the online recognition result is used as the voice recognition result,
if the online identification module does not acquire the online identification result within the preset waiting time, comparing the confidence coefficient with a preset second threshold value to formulate a retention strategy of the offline identification result.
In some embodiments, the online identification module compares the confidence level to a preset second threshold to formulate a retention policy for offline identification results, including:
if the confidence is higher than the preset second threshold, the offline recognition result is reserved as the voice recognition result,
if the confidence coefficient is smaller than or equal to a preset second threshold value, discarding the offline identification result and sending out an identification failure notification.
In a fifth aspect, embodiments of the present application further provide an electronic device, including:
a memory storing computer readable instructions; and
And a processor executing computer readable instructions stored in the memory to implement the edge intelligent speech recognition method.
In a sixth aspect, embodiments of the present application further provide a computer-readable storage medium having computer-readable instructions stored therein, the computer-readable instructions being executable by a processor in an electronic device to implement an edge-based intelligent speech recognition method.
According to the edge intelligent voice recognition method, firstly, the offline server is utilized to process the voice data to be recognized, the offline recognition result is obtained, the confidence coefficient of the offline recognition result is calculated to evaluate the accuracy of the offline recognition result, and then the online server is started to perform online recognition processing on the voice data to be recognized according to the evaluation result, so that the processing efficiency of a voice recognition task and the accuracy of the voice recognition result can be balanced.
Drawings
Fig. 1 is a flowchart of a preferred embodiment of an edge-based intelligent speech recognition method according to the present application.
FIG. 2 is a flow chart of a preferred embodiment of another edge-based intelligent speech recognition method according to the present application.
FIG. 3 is a functional block diagram of a preferred embodiment of an edge-based intelligent speech recognition device according to the present application.
FIG. 4 is a functional block diagram of a preferred embodiment of another edge-based intelligent speech recognition device according to the present application.
Fig. 5 is a schematic structural diagram of an electronic device according to a preferred embodiment of the edge-based intelligent voice recognition method according to the present application.
Detailed Description
In order that the objects, features and advantages of the present application may be more clearly understood, a more particular description of the invention is given below with reference to specific embodiments illustrated in the appended drawings. It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application; the described embodiments are merely some, rather than all, of the embodiments of the present application.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more of the described features. In the description of the present application, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
The embodiment of the application provides an edge intelligent voice recognition method, which can be applied to one or more electronic devices. An electronic device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
The electronic device may be any electronic product that can interact with a user in a human-computer manner, such as a personal computer, tablet computer, smart phone, personal digital assistant (Personal Digital Assistant, PDA), game console, interactive internet protocol television (Internet Protocol Television, IPTV), smart wearable device, etc.
The electronic device may also include a network device and/or a user device. The network device includes, but is not limited to, a single network server, a server group composed of a plurality of network servers, or a cloud computing platform composed of a large number of hosts or network servers.
The network in which the electronic device is located includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (Virtual Private Network, VPN), and the like.
Example 1
FIG. 1 illustrates a flow chart of one embodiment of an edge-based intelligent speech recognition method of the present application. The order of the steps in the flowchart may be changed and some steps may be omitted according to various needs.
S101, the offline server receives voice data to be recognized.
In this alternative embodiment, the offline server refers to an offline speech recognition model, that is, a program or collection of programs with automatic speech recognition (ASR) functionality.
For example, the offline speech recognition model may be ESPnet, Kaldi, WeNet, or another existing software or program framework with automatic speech recognition functionality; the application does not limit the category of the offline speech recognition model.
In this optional embodiment, the function of the offline speech recognition model is to receive the speech data to be recognized, and automatically recognize the speech data to be recognized, so as to obtain semantic information corresponding to the speech data to be recognized, where the semantic information is text data corresponding to the speech data to be recognized.
The voice data to be recognized can be audio data such as dialogue audio between the edge intelligent device and a natural person in the internet of things environment, audio instructions sent by the natural person and the like and received by the edge intelligent device, and the specific content of the voice data to be recognized is not limited.
Therefore, the off-line server receives the voice data to be recognized, the voice data to be recognized can be ensured to be processed by the off-line voice recognition model preferentially, the voice data to be recognized is not required to be transmitted to the cloud for processing, the time consumption of voice recognition can be reduced, the efficiency of voice recognition is improved, the power consumption of voice recognition equipment can be reduced, and the occupation of bandwidth is reduced.
S102, the offline server identifies the voice data to be identified and obtains an offline identification result.
In some alternative embodiments, before the offline server identifies the voice data to be identified and obtains the offline identification result, the method further includes:
framing the voice data to be recognized to obtain at least one short-time frame;
and respectively inputting each short-time frame into the offline recognition model running in the offline server to obtain the byte corresponding to each short-time frame and a probability value for each byte.
In this alternative embodiment, the voice data to be recognized is continuous waveform data. To improve the accuracy of voice recognition and reduce its time consumption, the voice data to be recognized may first be divided into a plurality of short-time frames of equal duration, with the duration of a short-time frame typically set to a value between 20 ms and 50 ms.
In this alternative embodiment, after obtaining a plurality of short-time frames, each short-time frame may be recognized using the offline speech recognition model to obtain the byte corresponding to each short-time frame and a probability value for that byte. The probability value characterizes the probability that the short-time frame belongs to the byte; a higher probability value indicates a higher probability that the short-time frame belongs to the byte. For example, when the voice data to be recognized is divided into 4 short-time frames, the byte corresponding to the first short-time frame is "speech", and the probability value corresponding to that byte is 90%, then the probability that the first short-time frame belongs to the byte "speech" is 90%.
In this alternative embodiment, the bytes corresponding to all short-time frames and the probability value of each byte may be used as the offline recognition result. Illustratively, when among the 4 short-time frames the first corresponds to the byte "speech" with probability 90%, the second to the byte "voice" with probability 80%, the third to the byte "recognition" with probability 95%, and the fourth to the byte "identification" with probability 85%, the offline recognition result includes 4 bytes and a probability value corresponding to each byte: the 4 bytes are "speech", "voice", "recognition" and "identification", with corresponding probability values of 90%, 80%, 95% and 85%.
Therefore, by framing the voice data to be recognized into a plurality of short-time frames and recognizing the short-time frames to obtain the byte corresponding to each frame and its probability, the voice data can be analyzed frame by frame, which can improve the accuracy of voice recognition.
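As an illustrative sketch only (not part of the patent; the function name, the fixed 25 ms frame length and the absence of frame overlap are assumptions), the framing step described above might look like this:

```python
import numpy as np

def split_into_frames(samples: np.ndarray, sample_rate: int,
                      frame_ms: int = 25) -> list:
    """Split continuous waveform data into equal-length short-time frames.

    frame_ms is typically chosen between 20 and 50 ms, as described above.
    A real ASR front end would usually also apply a hop size (overlap)
    and a window function; both are omitted here for brevity.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len  # drop the trailing partial frame
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]

# 1 second of 16 kHz audio with 25 ms frames yields 40 frames of 400 samples.
audio = np.zeros(16000, dtype=np.float32)
frames = split_into_frames(audio, sample_rate=16000, frame_ms=25)
```

Each frame would then be fed to the offline recognition model to obtain its byte and probability value.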
And S103, the offline server calculates the confidence of the offline identification result.
In an alternative embodiment, the offline recognition result includes at least one byte and a probability value corresponding to each byte, and the step of the offline server calculating the confidence of the offline recognition result specifically includes:
the offline server calculates the mean value of all probability values as the confidence level of the offline identification result.
For example, when the offline recognition result includes 4 bytes, "speech", "voice", "recognition" and "identification", with corresponding probability values of 90%, 80%, 95% and 85%, the confidence of the offline recognition result is (90% + 80% + 95% + 85%) / 4 = 87.5%.
In this alternative embodiment, the confidence of the offline recognition result is used to characterize its reliability: the higher the confidence, the more accurate the offline recognition result.
Therefore, the confidence coefficient of the offline identification result is obtained by counting the average value of the probability values corresponding to all bytes in the offline identification result, and global information in the offline identification result can be synthesized, so that data support is provided for the subsequent evaluation of the accuracy of the offline identification result.
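A minimal sketch of this confidence computation (the list-of-pairs data layout is an assumption for illustration; the patent only specifies the mean of the per-byte probability values):

```python
def offline_confidence(offline_result):
    """Confidence of an offline recognition result: the mean of the
    probability values of all bytes in the result."""
    probs = [prob for _byte, prob in offline_result]
    return sum(probs) / len(probs)

# The example from the text: four bytes with probabilities 90%, 80%, 95%, 85%.
result = [("speech", 0.90), ("voice", 0.80),
          ("recognition", 0.95), ("identification", 0.85)]
confidence = offline_confidence(result)  # 0.875, i.e. 87.5%
```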
S104, the offline server compares the confidence coefficient with a preset first threshold value, if the confidence coefficient is higher than the preset first threshold value, the offline recognition result is used as a voice recognition result, and if the confidence coefficient is smaller than or equal to the preset first threshold value, the voice data to be recognized and the confidence coefficient are sent to an online server.
In this optional embodiment, the preset first threshold refers to a confidence level preset according to a service requirement, and the higher the preset first threshold is, the higher the requirement on the accuracy of the offline identification result is, which may be, for example, 80%, 85% or 90%, and the value of the preset first threshold is not limited in this application.
In this alternative embodiment, if the confidence is higher than the preset first threshold, it indicates that the offline recognition result meets the confidence standard corresponding to the service requirement, and the offline recognition result may be used as the voice recognition result. Illustratively, when the confidence of the offline recognition result is 87.5% and the preset first threshold is 85%, the confidence is higher than the first threshold, so the offline recognition result, namely the four bytes "speech", "voice", "recognition" and "identification", can be used as the voice recognition result.
In this optional embodiment, if the confidence of the offline recognition result is less than or equal to the preset first threshold, it indicates that the accuracy of the offline recognition result fails to meet the confidence standard corresponding to the service requirement, and the voice data to be recognized and the confidence may be sent to the online server for online recognition. The online server refers to an online voice recognition model stored in the cloud server, which recognizes received voice data to be recognized to obtain an online recognition result. The accuracy of the online recognition result is generally higher than that of the offline recognition result, but because the online voice recognition model needs to acquire the voice data to be recognized over the internet and transmit the online recognition result back, recognizing the voice data with the online model is often time-consuming.
Therefore, whether the offline recognition result meets the confidence coefficient standard specified by the service requirement or not is judged by comparing the confidence coefficient of the offline recognition result with the first threshold value, and whether the voice data to be recognized is transmitted to the online server or not is judged according to the comparison result so as to perform online recognition, so that the accuracy of voice recognition can be improved while the acquisition efficiency of the recognition result is ensured.
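The first-threshold decision in S104 can be sketched as follows (the function name, return conventions and default threshold are hypothetical, not taken from the patent):

```python
def route_offline_result(offline_result, confidence, first_threshold=0.85):
    """Decide locally whether the offline result is good enough.

    Returns ("final", result) when the offline result meets the standard,
    or ("send_online", payload) when the voice data and confidence should
    be forwarded to the online server for re-recognition.
    """
    if confidence > first_threshold:
        return ("final", offline_result)
    # Confidence <= first threshold: forward audio + confidence online.
    return ("send_online", {"confidence": confidence})

# 87.5% > 85%: the offline result is used directly as the recognition result.
decision, payload = route_offline_result("speech recognition", 0.875)
```

Note that equality with the threshold routes to the online server, matching the "less than or equal to" branch in the text.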
Example two
FIG. 2 illustrates a flow chart of another embodiment of an edge-based intelligent speech recognition method of the present application. The order of the steps in the flowchart may be changed and some steps may be omitted according to various needs.
S201, the online server receives the voice data to be recognized and the confidence of the offline recognition result, and recognizes the voice data to be recognized to obtain the online recognition result.
In this alternative embodiment, the online server refers to an online voice recognition model stored in the cloud server, whose function is to recognize received voice data to be recognized so as to obtain an online recognition result. Illustratively, the online speech recognition model may be ESPnet, Kaldi or WeNet; the present application does not limit the type of the online speech recognition model. Because the online voice recognition model needs to acquire the voice data to be recognized over the internet and transmit the online recognition result back, the process of recognizing the voice data with the online model is often time-consuming.
In this optional embodiment, the online server receives the voice data to be recognized and the confidence of the offline recognition result. The voice data to be recognized refers to a section of continuous waveform data and may be audio data such as dialogue audio between an edge intelligent device and a natural person in the internet of things environment, or an audio instruction sent by a natural person and received by the edge intelligent device.
In this optional embodiment, the offline recognition result refers to a recognition result generated after the voice data to be recognized is recognized by the offline server, the offline recognition result includes at least one first byte, each first byte corresponds to one first probability value, the confidence level of the offline recognition result refers to the mean value of all the first probability values, the confidence level of the offline recognition result is used for representing the accuracy of the offline recognition result, and the higher the confidence level is, the higher the accuracy of the offline recognition result is indicated.
In this optional embodiment, when the confidence coefficient of the offline recognition result is smaller than or equal to a preset first threshold, the online server is used to receive the voice data to be recognized and the confidence coefficient of the offline recognition result, so as to perform online recognition on the voice data to be recognized. The preset first threshold value refers to a confidence level standard preset according to service requirements, the preset first threshold value is used for evaluating the accuracy of the offline identification result, and the higher the preset first threshold value is, the higher the requirement on the accuracy of the offline identification result is.
Thus, after the offline recognition result is evaluated, the online server performs online recognition on the voice data to be recognized so as to ensure that a voice recognition result with higher accuracy is obtained.
S202, if the online server obtains the online recognition result within the preset waiting time, the online recognition result is used as a voice recognition result, and if the online server does not obtain the online recognition result within the preset waiting time, the confidence level is compared with a preset second threshold value to formulate a retention strategy of the offline recognition result.
In this optional embodiment, the preset waiting duration refers to a preset time period, and the preset waiting duration may be 1.5 seconds, 2 seconds, 3 seconds, or the like. The preset waiting time is used for ensuring timeliness in the process of processing the voice data to be recognized by the online server.
In this alternative embodiment, if the online server obtains the online recognition result within the preset waiting duration, it indicates that the time for the online server to generate the online recognition result is short enough and the processing efficiency of the speech recognition task can be guaranteed, so the online recognition result may be used as the speech recognition result. The online recognition result includes at least one second byte and a second probability value corresponding to each second byte.
In this optional embodiment, if the online server does not obtain the online recognition result within the preset waiting duration, it indicates that the timeliness of the speech recognition task cannot be guaranteed, and the confidence level and the preset second threshold may be compared to formulate a retention policy of the offline recognition result.
In an alternative embodiment, the step of comparing the confidence level with a preset second threshold value to formulate a retention policy for the offline identification result specifically includes:
if the confidence is higher than the preset second threshold, the offline recognition result is reserved as the voice recognition result,
if the confidence coefficient is smaller than or equal to a preset second threshold value, discarding the offline identification result and sending out an identification failure notification.
In this alternative embodiment, the preset second threshold refers to a preset value lower than the preset first threshold, where the second threshold is generally used as the lowest standard for characterizing the accuracy of the offline identification result, and, by way of example, when the preset first threshold is 90%, the preset second threshold may be 80%, 70% or 60%, and the specific value of the preset second threshold is not limited in this application.
In this alternative embodiment, if the confidence level of the offline recognition result is higher than the preset second threshold, it indicates that the accuracy of the offline recognition result, although not meeting the service requirement, meets the minimum standard, and the offline recognition result may still be used as the speech recognition result.
In this optional embodiment, if the confidence level of the offline recognition result is less than or equal to the preset second threshold, it indicates that the accuracy of the offline recognition result does not reach the minimum standard, the offline recognition result may be discarded, and a recognition failure notification may be sent to the offline speech recognition model and the operation and maintenance personnel of the online speech recognition model.
Therefore, the confidence coefficient of the offline recognition result is compared with a preset second threshold value to make a retention strategy of the offline recognition result, so that the processing efficiency of the voice recognition task and the accuracy of the voice recognition result can be balanced.
According to the edge intelligent voice recognition method, firstly, the offline server is utilized to process the voice data to be recognized, the offline recognition result is obtained, the confidence coefficient of the offline recognition result is calculated to evaluate the accuracy of the offline recognition result, and then the online server is started to perform online recognition processing on the voice data to be recognized according to the evaluation result, so that the processing efficiency of a voice recognition task and the accuracy of the voice recognition result can be balanced.
Fig. 3 is a functional block diagram of a preferred embodiment of an edge intelligent speech recognition device according to an embodiment of the present application. The edge intelligent speech recognition device 30 comprises an offline recognition module 301. A module/unit referred to in this application is a series of computer program segments that are stored in the memory 12, can be executed by the processor 13, and perform fixed functions. In the present embodiment, the functions of the respective modules/units are described in detail in the following embodiments.
The offline recognition module 301 is configured to:
receiving voice data to be recognized;
recognizing the voice data to be recognized and obtaining an offline recognition result;
calculating the confidence of the offline recognition result;
comparing the confidence with the preset first threshold; if the confidence is higher than the preset first threshold, taking the offline recognition result as the speech recognition result,
and if the confidence is less than or equal to the preset first threshold, sending the voice data to be recognized and the confidence to the online server.
In an alternative embodiment, the offline recognition result includes a probability value corresponding to each byte, and the offline recognition module 301 is further configured to:
calculate the average of all the probability values as the confidence of the offline recognition result.
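As a quick illustration of this averaging (the function name is hypothetical, and each probability value is assumed to be a float in [0, 1]):

```python
def offline_confidence(byte_probabilities):
    # The confidence of the offline recognition result is the mean of the
    # probability values of all decoded bytes.
    if not byte_probabilities:
        raise ValueError("offline recognition result contains no bytes")
    return sum(byte_probabilities) / len(byte_probabilities)

# Four decoded bytes with probabilities 0.9, 0.8, 0.9, 0.8 give a
# confidence of 0.85.
conf = offline_confidence([0.9, 0.8, 0.9, 0.8])
```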
Example IV
Fig. 4 is a functional block diagram of an embodiment of an edge intelligent speech recognition device according to an embodiment of the present application. The edge intelligent speech recognition device 40 comprises an online recognition module 401. A module/unit referred to in this application is a series of computer program segments that are stored in the memory 12, can be executed by the processor 13, and perform fixed functions. In the present embodiment, the functions of the respective modules/units are described in detail in the following embodiments.
The online recognition module 401 is configured to:
receiving the voice data to be recognized and the confidence of the offline recognition result, and then recognizing the voice data to be recognized to obtain an online recognition result;
if the online recognition module obtains the online recognition result within the preset waiting time, taking the online recognition result as the speech recognition result,
and if the online recognition module does not obtain the online recognition result within the preset waiting time, comparing the confidence with the preset second threshold to formulate a retention strategy for the offline recognition result.
In an alternative embodiment, the online recognition module 401 compares the confidence with the preset second threshold to formulate a retention strategy for the offline recognition result, including:
if the confidence is higher than the preset second threshold, retaining the offline recognition result as the speech recognition result,
and if the confidence is less than or equal to the preset second threshold, discarding the offline recognition result and sending a recognition failure notification.
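The waiting-time and second-threshold logic of the online module can be sketched as follows. The timeout mechanism (`concurrent.futures` with a worker thread) and all names here are illustrative assumptions; the application does not specify how the preset waiting time is enforced.

```python
import concurrent.futures

SECOND_THRESHOLD = 0.6  # example value, lower than the first threshold

def decide(online_result, offline_result, offline_conf):
    # online_result is None when no online recognition result was obtained
    # within the preset waiting time.
    if online_result is not None:
        return online_result               # prefer the online result
    if offline_conf > SECOND_THRESHOLD:
        return offline_result              # retain the offline result
    # Below the minimum standard: discard the offline result; the caller
    # should send a recognition-failure notification.
    return None

def recognize_with_timeout(recognize_online, audio, offline_result,
                           offline_conf, wait_s=2.0):
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(recognize_online, audio)
    try:
        online_result = future.result(timeout=wait_s)
    except concurrent.futures.TimeoutError:
        online_result = None               # preset waiting time elapsed
    pool.shutdown(wait=False)
    return decide(online_result, offline_result, offline_conf)
```

Calling `pool.shutdown(wait=False)` lets the decision return immediately after the preset waiting time even if the online request is still in flight.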
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 1 comprises a memory 12 and a processor 13. The memory 12 is used for storing computer readable instructions, and the processor 13 is used for executing the computer readable instructions stored in the memory to implement the edge intelligent speech recognition method of any of the above embodiments.
In an alternative embodiment, the electronic device 1 further comprises a bus and a computer program stored in the memory 12 and executable on the processor 13, such as an edge intelligent speech recognition program.
Fig. 5 shows only the electronic device 1 with components 12-13. It will be understood by those skilled in the art that the structure shown in Fig. 5 does not limit the electronic device 1, which may include fewer or more components than shown, combine certain components, or arrange the components differently.
In connection with fig. 1, the memory 12 in the electronic device 1 stores a plurality of computer readable instructions to implement an edge-smart speech recognition method, the processor 13 being executable to implement:
the offline server receives voice data to be recognized;
the offline server recognizes the voice data to be recognized and obtains an offline recognition result;
the offline server calculates the confidence of the offline recognition result;
the offline server compares the confidence with a preset first threshold; if the confidence is higher than the preset first threshold, the offline recognition result is used as the speech recognition result, and if the confidence is less than or equal to the preset first threshold, the voice data to be recognized and the confidence are sent to an online server.
Specifically, the specific implementation method of the above instructions by the processor 13 may refer to the description of the relevant steps in the corresponding embodiment of fig. 1, which is not repeated herein.
In connection with fig. 2, the memory 12 in the electronic device 1 stores a plurality of computer readable instructions to implement another edge-smart speech recognition method, the processor 13 being executable to implement:
the online server receives the voice data to be recognized and the confidence of the offline recognition result, and recognizes the voice data to be recognized to obtain an online recognition result;
and if the online server obtains the online recognition result within the preset waiting time, the online recognition result is used as the speech recognition result; if the online server does not obtain the online recognition result within the preset waiting time, the confidence is compared with a preset second threshold to formulate a retention strategy for the offline recognition result.
Specifically, the specific implementation method of the above instruction by the processor 13 may refer to the description of the relevant steps in the corresponding embodiment of fig. 2, which is not repeated herein.
It will be appreciated by those skilled in the art that the schematic diagram is merely an example of the electronic device 1 and does not limit it. The electronic device 1 may have a bus-type or star-type structure, may comprise more or less hardware or software than illustrated, or a different arrangement of components; for example, the electronic device 1 may further comprise an input-output device, a network access device, and the like.
It should be noted that the electronic device 1 is only an example; other existing or future electronic products adaptable to the present application are also included within the scope of protection of the present application and are incorporated herein by reference.
The memory 12 includes at least one type of readable storage medium, which may be non-volatile or volatile. The readable storage medium includes flash memory, a removable hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 12 may in some embodiments be an internal storage unit of the electronic device 1, such as a mobile hard disk of the electronic device 1. The memory 12 may in other embodiments also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device 1. Further, the memory 12 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 12 may be used not only for storing application software installed in the electronic device 1 and various types of data, such as codes of an edge smart speech recognition program, but also for temporarily storing data that has been output or is to be output.
The processor 13 may be comprised of integrated circuits in some embodiments, for example, a single packaged integrated circuit, or may be comprised of multiple integrated circuits packaged with the same or different functions, including one or more central processing units (Central Processing unit, CPU), microprocessors, digital processing chips, graphics processors, a combination of various control chips, and the like. The processor 13 is a Control Unit (Control Unit) of the electronic device 1, connects the respective components of the entire electronic device 1 using various interfaces and lines, executes or executes programs or modules stored in the memory 12 (for example, executes an edge intelligent speech recognition program or the like), and invokes data stored in the memory 12 to perform various functions of the electronic device 1 and process data.
The processor 13 executes an operating system of the electronic device 1 and various types of applications installed. The processor 13 executes the application program to implement the steps in the various edge-intelligent speech recognition method embodiments described above, such as the steps shown in fig. 1.
The computer program may be divided into one or more modules/units, which are stored in the memory 12 and executed by the processor 13 to complete the present application. The one or more modules/units may be a series of computer readable instruction segments capable of performing specified functions, the instruction segments describing the execution of the computer program in the electronic device 1. For example, the computer program may be split into an offline recognition module 301 and/or an online recognition module 401.
The integrated units implemented in the form of software functional modules described above may be stored in a computer readable storage medium. The software functional modules are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, a computer device, or a network device, etc.) or a processor (processor) to perform portions of the edge intelligent speech recognition methods described in various embodiments of the present application.
The integrated modules/units of the electronic device 1 may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as a stand alone product. Based on such understanding, the present application may implement all or part of the flow of the method of the above embodiment, or may be implemented by instructing the relevant hardware device by a computer program, where the computer program may be stored in a computer readable storage medium, and the computer program may implement the steps of each method embodiment described above when executed by a processor.
The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory, other memories, and the like.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created from the use of blockchain nodes, and the like.
The blockchain referred to in the application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, encryption algorithm and the like. The Blockchain (Blockchain), which is essentially a decentralised database, is a string of data blocks that are generated by cryptographic means in association, each data block containing a batch of information of network transactions for verifying the validity of the information (anti-counterfeiting) and generating the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be classified as an address bus, a data bus, a control bus, and so on. For ease of illustration, only one arrow is shown in Fig. 5, but this does not mean that there is only one bus or one type of bus. The bus is arranged to enable connection and communication between the memory 12, the at least one processor 13, and the like.
Although not shown, the electronic device 1 may further comprise a power source (such as a battery) for powering the various components, which may preferably be logically connected to the at least one processor 13 via a power management means, whereby the functions of charge management, discharge management, and power consumption management are achieved by the power management means. The power supply may also include one or more of any of a direct current or alternating current power supply, recharging device, power failure detection circuit, power converter or inverter, power status indicator, etc. The electronic device 1 may further include various sensors, bluetooth modules, wi-Fi modules, etc., which are not described herein.
Further, the electronic device 1 may also comprise a network interface, optionally comprising a wired interface and/or a wireless interface (e.g. WI-FI interface, bluetooth interface, etc.), typically used for establishing a communication connection between the electronic device 1 and other electronic devices.
The electronic device 1 may optionally further comprise a user interface, which may be a display, an input unit such as a keyboard, or a standard wired or wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch liquid crystal display, an OLED (Organic Light-Emitting Diode) touch display, or the like. The display may also be referred to as a display screen or display unit and is used for displaying information processed in the electronic device 1 and for displaying a visual user interface.
The embodiment of the application further provides a computer readable storage medium (not shown), in which computer readable instructions are stored, and the computer readable instructions are executed by a processor in an electronic device to implement the edge intelligent voice recognition method according to any one of the embodiments.
It should be understood that the described embodiments are for illustrative purposes only, and the scope of the patent application is not limited to this configuration.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be other manners of division when actually implemented.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units can be realized in a form of hardware or a form of hardware and a form of software functional modules.
Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. Several of the elements or devices described in the specification may be embodied by one and the same item of software or hardware. The terms first, second, etc. are used to denote a name, but not any particular order.
Finally, it should be noted that the above embodiments are merely intended to illustrate, not limit, the technical solution of the present application. Although the present application has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present application may be modified or equivalently substituted without departing from the spirit and scope of the technical solution of the present application.

Claims (10)

1. An edge intelligent speech recognition method, characterized in that the method comprises the following steps:
the offline server receives voice data to be recognized;
the offline server recognizes the voice data to be recognized and obtains an offline recognition result;
the offline server calculates the confidence of the offline recognition result;
the offline server compares the confidence with a preset first threshold; if the confidence is higher than the preset first threshold, the offline recognition result is used as a speech recognition result,
and if the confidence is less than or equal to the preset first threshold, the voice data to be recognized and the confidence are sent to an online server.
2. The edge intelligent speech recognition method according to claim 1, wherein the offline recognition result includes at least one byte and a probability value corresponding to each byte, and the step of the offline server calculating the confidence of the offline recognition result specifically comprises:
the offline server calculates the average of all the probability values as the confidence of the offline recognition result.
3. An edge intelligent speech recognition method, characterized in that the method comprises the following steps:
the online server receives voice data to be recognized and a confidence of an offline recognition result, and recognizes the voice data to be recognized to obtain an online recognition result;
if the online server obtains the online recognition result within a preset waiting time, the online recognition result is used as a speech recognition result,
and if the online server does not obtain the online recognition result within the preset waiting time, the confidence is compared with a preset second threshold to formulate a retention strategy for the offline recognition result.
4. The edge intelligent speech recognition method according to claim 3, wherein the step of comparing the confidence with the preset second threshold to formulate a retention strategy for the offline recognition result specifically comprises:
if the confidence is higher than the preset second threshold, the offline recognition result is retained as the speech recognition result,
and if the confidence is less than or equal to the preset second threshold, the offline recognition result is discarded and a recognition failure notification is sent.
5. An edge intelligent speech recognition device, characterized in that the device comprises an offline recognition module for:
receiving voice data to be recognized;
recognizing the voice data to be recognized and obtaining an offline recognition result;
calculating the confidence of the offline recognition result;
comparing the confidence with a preset first threshold; if the confidence is higher than the preset first threshold, taking the offline recognition result as a speech recognition result,
and if the confidence is less than or equal to the preset first threshold, sending the voice data to be recognized and the confidence to an online server.
6. The edge intelligent speech recognition device according to claim 5, wherein the offline recognition result includes at least one byte and a probability value corresponding to each byte, and the offline recognition module calculating the confidence of the offline recognition result comprises:
the offline recognition module calculates the average of all the probability values as the confidence of the offline recognition result.
7. An edge intelligent speech recognition device, characterized in that the device comprises an online recognition module for:
receiving voice data to be recognized and a confidence of an offline recognition result, and then recognizing the voice data to be recognized to obtain an online recognition result;
if the online recognition module obtains the online recognition result within a preset waiting time, taking the online recognition result as a speech recognition result,
and if the online recognition module does not obtain the online recognition result within the preset waiting time, comparing the confidence with a preset second threshold to formulate a retention strategy for the offline recognition result.
8. The edge intelligent speech recognition device according to claim 7, wherein the online recognition module comparing the confidence with the preset second threshold to formulate a retention strategy for the offline recognition result comprises:
if the confidence is higher than the preset second threshold, retaining the offline recognition result as the speech recognition result,
and if the confidence is less than or equal to the preset second threshold, discarding the offline recognition result and sending a recognition failure notification.
9. An electronic device, the electronic device comprising:
a memory storing computer readable instructions; and
a processor executing the computer readable instructions stored in the memory to implement the edge intelligent speech recognition method according to any one of claims 1 to 4.
10. A computer-readable storage medium, characterized in that computer readable instructions are stored in the computer-readable storage medium, the computer readable instructions being executed by a processor in an electronic device to implement the edge intelligent speech recognition method according to any one of claims 1 to 4.
CN202211542006.2A 2022-12-02 2022-12-02 Edge intelligent voice recognition method and related equipment Pending CN116013311A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211542006.2A CN116013311A (en) 2022-12-02 2022-12-02 Edge intelligent voice recognition method and related equipment


Publications (1)

Publication Number Publication Date
CN116013311A true CN116013311A (en) 2023-04-25

Family

ID=86028786

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211542006.2A Pending CN116013311A (en) 2022-12-02 2022-12-02 Edge intelligent voice recognition method and related equipment

Country Status (1)

Country Link
CN (1) CN116013311A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination