CN110415698B - Artificial intelligence data detection method and device and storage medium

Info

Publication number: CN110415698B
Application number: CN201910809813.8A
Authority: CN
Inventors: 郑脊萌; 高毅; 黎韦伟; 于蒙
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2018-11-15
Filing date: 2018-11-15
Publication date: 2022-05-13
Anticipated expiration: 2038-11-15
Also published as: CN110164431B; CN110517680B; CN110517679B; CN110517680A; CN110517679A; CN110364162B; CN110364162A; CN110164431A; CN110415698A

Abstract

Embodiments of the invention provide an artificial intelligence data detection method, an artificial intelligence data detection device, and a storage medium. The method comprises: acquiring audio data to be detected; when the detected detection paths include a main detection path and a backup detection path, controlling, through a reset and start controller, the reset of the main detection path and the reset and start of the backup detection path, to obtain a reset voice detection model of each detection path; recognizing the audio data to be detected on the main detection path and the backup detection path by using the reset voice detection models, to obtain a main detection result of the main detection path and a backup detection result of the backup detection path; and outputting a total detection result after the main detection result and the backup detection result are comprehensively processed.

Description

Artificial intelligence data detection method and device and storage medium

Description of the cases

The present application is filed on the basis of Chinese patent application No. 201811361659.4, filed on November 15, 2018 and entitled "Audio data processing method and apparatus, and storage medium". The scope of the present application is defined by the claims of that Chinese patent application, the entire content of which is incorporated herein by reference.

Technical Field

The invention relates to a voice recognition technology in the field of artificial intelligence, in particular to an artificial intelligence data detection method and device and a storage medium.

Background

With the increasing application of Artificial Intelligence (AI) in various fields, it has become an important means of decision making and prediction. Examples include smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned and autonomous driving, unmanned aerial vehicles, robots, smart medical services, and smart customer service. As the technology develops, artificial intelligence is expected to be applied in more fields and to deliver increasing value.

At present, voice interaction smart devices in the electronics field are mainly smart speakers, as well as devices such as smart televisions or television boxes with a voice control function. One or more wake-up words are typically set in such voice interaction smart devices. Wake-up words are generally detected using a Long Short-Term Memory (LSTM) model as the wake-up detection model.

However, an important feature of LSTM is its history accumulation property: when speech recognition is performed with an LSTM, the detection result for a piece of speech data (e.g., speech data of a wake-up word) depends not only on that speech data itself but also, to a large extent, on the audio data preceding it. Therefore, after a period in which noise has accumulated, the accumulated noise data degrades detection of the following wake-up word, which lowers the accuracy of data detection.
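The history accumulation effect described above can be illustrated with a toy example. The sketch below is not an LSTM: it assumes a simple leaky accumulator (the hypothetical function `run_recurrent`, with an arbitrary `decay` value) as a stand-in for any stateful model, and only shows that the same wake-up word frames yield a lower score when the carried state already holds a long run of noise, while a reset state restores the original score.

```python
def run_recurrent(frames, state=0.0, decay=0.9):
    """Toy stand-in for a stateful model: every output mixes in the carried state."""
    outputs = []
    for x in frames:
        state = decay * state + (1.0 - decay) * x
        outputs.append(state)
    return outputs, state


wake_word = [1.0, 1.0, 1.0]   # illustrative "wake-up word" frames
noise = [-1.0] * 50           # illustrative run of preceding noise

# Wake-up word scored from a fresh state.
clean_out, _ = run_recurrent(wake_word)

# Same wake-up word scored after noise has accumulated in the state.
_, noisy_state = run_recurrent(noise)
noisy_out, _ = run_recurrent(wake_word, state=noisy_state)

# Resetting the state before the wake-up word removes the noise history.
reset_out, _ = run_recurrent(wake_word, state=0.0)
```

Here `clean_out[-1]` is higher than `noisy_out[-1]`, and `reset_out` equals `clean_out`, which is the motivation for resetting the model state.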

Disclosure of Invention

The embodiment of the invention provides an artificial intelligence data detection method and device and a storage medium, which can improve the accuracy of data detection.

The technical scheme of the embodiment of the invention is realized as follows:

the embodiment of the invention provides an artificial intelligence data detection method, which comprises the following steps:

acquiring audio data to be detected;

when the detected detection paths include a main detection path and a backup detection path, controlling, through a reset and start controller, the reset of the main detection path and the reset and start of the backup detection path, to obtain a reset voice detection model of each detection path;

recognizing the audio data to be detected on the main detection path and the backup detection path by using the reset voice detection models, to obtain a main detection result of the main detection path and a backup detection result of the backup detection path;

and after the main detection result and the backup detection result are subjected to comprehensive processing, outputting a total detection result.
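The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `DetectionPath` is a hypothetical class whose toy `detect` method stands in for a stateful voice detection model, and the "comprehensive processing" of the two results is assumed to be a maximum, since the text does not specify the combination rule.

```python
from dataclasses import dataclass


@dataclass
class DetectionPath:
    """Hypothetical detection path holding the state of a stateful model."""
    state: float = 0.0

    def reset(self) -> None:
        # Resetting discards the accumulated history.
        self.state = 0.0

    def detect(self, frame: float) -> float:
        # Toy stateful scoring; a real path would run the voice detection model.
        self.state = 0.5 * self.state + 0.5 * frame
        return self.state


def detect_total(main, backup, frame, combine=max):
    """Run both paths on the same audio frame and combine their results.

    Assumption: `combine` defaults to max; the source only says the results
    are "comprehensively processed".
    """
    main_result = main.detect(frame)
    backup_result = backup.detect(frame)
    return combine(main_result, backup_result)


main = DetectionPath(state=-1.0)    # main path still carries noise history
backup = DetectionPath(state=-1.0)
backup.reset()                      # backup path has just been reset and started
total = detect_total(main, backup, 1.0)
```

In this example the freshly reset backup path scores the frame higher than the main path, so the total result follows the backup path.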

In the foregoing solution, when the detected detection paths include a main detection path and a backup detection path, the controlling, through the reset and start controller, the reset of the main detection path and the reset and start of the backup detection path, to obtain the reset voice detection model of each detection path, includes:

acquiring a voice detection model, wherein the voice detection model is a corresponding relation between audio data of at least one detection channel with history accumulation characteristics and a voice recognition result;

when the detected detection paths comprise a main detection path and a backup detection path, acquiring a current time point;

determining the current time point as a reset time point of the backup detection path when the current time point reaches a preset preheating time point, wherein the preset preheating time point is a time point of a preset preheating time period before the preset reset time point;

when the reset time point is reached, controlling the reset and the start of the backup detection path through a reset and start controller, resetting the backup detection path, and obtaining a reset voice detection model of the backup detection path;

performing voice recognition by adopting the main detection channel and the backup detection channel;

and when the preset reset time point is reached after the preset preheating time period has elapsed, controlling the reset of the main detection path through the reset and start controller, resetting the main detection path, and obtaining the reset voice detection model of the main detection path.

In the foregoing solution, after the reset of the main detection path is controlled through the reset and start controller when the preset reset time point is reached after the preset preheating time period, and the main detection path is reset to obtain the reset voice detection model of the main detection path, the method further includes:

when the preset preheating time period has elapsed from the preset reset time point, closing the backup detection path and performing voice recognition by using the main detection path.
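The timing of one reset cycle described above can be sketched as a small schedule. The function names, the action labels, and the use of plain numbers for time points are assumptions made for illustration; only the ordering of the three events comes from the text.

```python
def reset_schedule(reset_time: float, preheat_period: float):
    """Return the (time, action) events of one main/backup reset cycle."""
    if preheat_period <= 0:
        raise ValueError("preheat_period must be positive")
    return [
        # Preheating time point: the backup path is reset and started.
        (reset_time - preheat_period, "reset_and_start_backup"),
        # Preset reset time point: the main path is reset.
        (reset_time, "reset_main"),
        # One preheating period later: the backup path is closed.
        (reset_time + preheat_period, "close_backup"),
    ]


def active_paths(t: float, reset_time: float, preheat_period: float):
    """Paths performing recognition at time t around a single reset cycle."""
    if reset_time - preheat_period <= t < reset_time + preheat_period:
        return ("main", "backup")
    return ("main",)
```

For example, with `reset_time=60.0` and `preheat_period=2.0`, both paths run between 58.0 and 62.0, and only the main path runs outside that window.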

In the above solution, after the audio data to be detected on the main detection path and the backup detection path is recognized by using the reset voice detection models to obtain the main detection result of the main detection path and the backup detection result of the backup detection path, the method further includes:

acquiring a historical detection result before the current time point;

and when the variation range between the current detection result and the historical detection result meets a preset false wake-up range, determining the current time point as the reset time point.
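A possible reading of this check is sketched below. The text does not define how the variation range is computed or how the false wake-up range is represented, so both are assumptions here: the variation is taken as the absolute difference between the current result and the mean of the historical results, and the range is a closed interval `(low, high)`. The function name `should_reset` is hypothetical.

```python
def should_reset(current_result, history, false_wake_range):
    """Decide whether the current time point should be used as a reset time point.

    Assumptions: variation = |current - mean(history)|, and the preset false
    wake-up range is a closed interval (low, high).
    """
    if not history:
        return False  # nothing to compare against yet
    low, high = false_wake_range
    baseline = sum(history) / len(history)
    variation = abs(current_result - baseline)
    return low <= variation <= high
```

For instance, `should_reset(0.9, [0.1, 0.1], (0.5, 1.0))` is true under these assumptions, while a small change such as `should_reset(0.15, [0.1, 0.1], (0.5, 1.0))` is not.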


In the above solution, the performing voice recognition by using the main detection path and the backup detection path includes:

receiving audio data to be detected;

respectively carrying out voice recognition on the audio data to be detected by adopting the main detection channel and the backup detection channel to obtain a main detection result and a backup detection result;

comprehensively processing the main detection result and the backup detection result to obtain a total detection result;

and when the total detection result is larger than a preset awakening threshold, recognizing that the audio data to be detected is an awakening word, and starting an awakening function.

In the above solution, the performing voice recognition by using the main detection path includes:

receiving audio data to be detected;

performing voice recognition on the audio data to be detected by adopting the main detection channel to obtain a main detection result;

and when the main detection result is larger than a preset awakening threshold, recognizing that the audio data to be detected is an awakening word, and starting an awakening function.
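The two recognition modes above differ only in which result is compared with the wake-up threshold. The sketch below assumes detection results are numeric scores and that the combination of main and backup results is a maximum (the text does not specify it); the function name `is_wake_word` is hypothetical.

```python
from typing import Optional


def is_wake_word(main_result: float,
                 wake_threshold: float,
                 backup_result: Optional[float] = None) -> bool:
    """Return True when the wake-up function should be started."""
    if backup_result is None:
        # Only the main detection path is running.
        total = main_result
    else:
        # Both paths are running; max is an assumed combination rule.
        total = max(main_result, backup_result)
    # The text requires the result to be larger than the threshold.
    return total > wake_threshold
```

With a threshold of 0.8, a main result of 0.6 alone does not wake the device, but the same main result combined with a backup result of 0.9 does.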

The embodiment of the invention provides a data detection device for artificial intelligence, which comprises:

the acquisition unit is used for acquiring audio data to be detected;

the reset unit is used for controlling, through a reset and start controller, the reset of the main detection path and the reset and start of the backup detection path when the detected detection paths include a main detection path and a backup detection path, so as to obtain a reset voice detection model of each detection path;

the identification unit is used for identifying the audio data to be detected of a main detection channel and a backup detection channel by using the reset voice detection model to obtain a main detection result of the main detection channel and a backup detection result of the backup detection channel; and after the main detection result and the backup detection result are subjected to comprehensive processing, outputting a total detection result.

In the above apparatus, the obtaining unit is further configured to obtain a voice detection model, where the voice detection model is a correspondence between audio data of at least one detection path having a history accumulation characteristic and a voice recognition result; and to acquire a current time point when the detected detection paths comprise a main detection path and a backup detection path;

a determining unit, configured to determine, when the current time point reaches a preset preheating time point, the current time point as a resetting time point of the backup detection path, where the preset preheating time point is a time point of a preset preheating time period before a preset resetting time point;

the reset unit is also used for controlling the reset and the start of the backup detection path through the reset and start controller when the reset time point arrives, resetting the backup detection path and obtaining a reset voice detection model of the backup detection path;

the identification unit is also used for carrying out voice identification by adopting the main detection channel and the backup detection channel;

and the resetting unit is also used for controlling the resetting of the main detection passage through the resetting and starting controller when the preset resetting time point is reached after the preset preheating time period, resetting the main detection passage and obtaining the reset voice detection model of the main detection passage.

In the above device, the identification unit is further configured to, after the main detection path has been reset through the reset and start controller at the preset reset time point and the reset voice detection model of the main detection path has been obtained, close the backup detection path and perform voice recognition using the main detection path when the preset preheating time period has elapsed from the preset reset time point.

In the above device, the obtaining unit is further configured to obtain a historical detection result before the current time point after the audio data to be detected on the main detection path and the backup detection path has been recognized by using the reset voice detection models and the main detection result of the main detection path and the backup detection result of the backup detection path have been obtained;

the determining unit is further configured to determine that the current time point is the reset time point when a variation range between the current detection result and the historical detection result meets a preset false wake-up range.

In the above apparatus, the preset reset time points are time series spaced by a preset time length;

the preset time length lies between twice the preset preheating time period and the preset tolerance awakening threshold value;

the preset tolerant awakening threshold value is between a preset optimal awakening upper limit value and a preset optimal false awakening lower limit value;

the preset preheating time period is greater than or equal to the preset awakening word duration.
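The timing constraints above can be checked with a small helper. This is a sketch under loudly stated assumptions: all quantities are treated as durations in the same unit, the preset tolerance awakening threshold is treated as an upper bound on the reset interval (`tolerance_limit`), and its relation to the optimal awakening upper limit and optimal false awakening lower limit is not modeled. The function and parameter names are hypothetical.

```python
def check_timing(reset_interval, preheat_period, wake_word_duration, tolerance_limit):
    """Return a list of violated constraints (empty when the settings are consistent)."""
    problems = []
    # The preheating period must cover at least one wake-up word.
    if preheat_period < wake_word_duration:
        problems.append("preheat_period must be >= wake_word_duration")
    # Assumed reading: 2 * preheat_period <= reset_interval <= tolerance_limit.
    if not (2 * preheat_period <= reset_interval <= tolerance_limit):
        problems.append(
            "reset_interval must lie in [2 * preheat_period, tolerance_limit]"
        )
    return problems


def reset_time_points(first_reset, reset_interval, count):
    """Preset reset time points as a series spaced by the preset time length."""
    return [first_reset + i * reset_interval for i in range(count)]
```

For example, `check_timing(60.0, 2.0, 1.5, 300.0)` reports no problems, whereas a reset interval of 3.0 with the same preheating period violates the lower bound.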

In the above apparatus, the receiving unit is configured to receive audio data to be detected;

the identification unit is used for respectively carrying out voice identification on the audio data to be detected by adopting the main detection channel and the backup detection channel to obtain a main detection result and a backup detection result; comprehensively processing the main detection result and the backup detection result to obtain a total detection result; and when the total detection result is larger than a preset awakening threshold, recognizing that the audio data to be detected is an awakening word, and starting an awakening function.

the identification unit is further configured to perform voice identification on the audio data to be detected by using the main detection path to obtain a main detection result; and when the main detection result is larger than a preset awakening threshold, recognizing that the audio data to be detected is an awakening word, and starting an awakening function.

The embodiment of the invention provides an artificial intelligence data detection device, which comprises:

a memory to store executable data detection instructions;

and the processor is used for realizing the artificial intelligence data detection method provided by the embodiment of the invention when the executable data detection instruction stored in the memory is executed.

The embodiment of the invention provides a computer-readable storage medium storing executable data detection instructions which, when executed by a processor, implement the artificial intelligence data detection method provided by the embodiment of the invention.

The embodiment of the invention has the following beneficial effects:

Embodiments of the invention provide an artificial intelligence data detection method, an artificial intelligence data detection device, and a storage medium, in which: audio data to be detected is acquired; when the detected detection paths include a main detection path and a backup detection path, the reset of the main detection path and the reset and start of the backup detection path are controlled through a reset and start controller, to obtain a reset voice detection model of each detection path; the audio data to be detected on the main detection path and the backup detection path is recognized by using the reset voice detection models, to obtain a main detection result of the main detection path and a backup detection result of the backup detection path; and a total detection result is output after the main detection result and the backup detection result are comprehensively processed. With this technical solution, the artificial intelligence data detection device can determine, according to the number of detection paths of the voice detection model, the reference object used to judge the reset operation of each detection path, and can then determine the reset time point based on that reference object. That is, for each detection path of the voice detection model, the reset time point can be determined from the corresponding reference object, the reset time point being the moment at which the history accumulation in the voice detection model is initialized while voice recognition performance is ensured. If the voice detection model of each detection path is reset at its reset time point, the reset voice detection model carries no history memory. Because the voice detection model is then not affected by long-term history accumulation, the accuracy of data detection by the reset voice detection model of each detection path is improved.

Drawings

FIG. 1 is an alternative structural diagram of an artificial intelligence data detection system architecture provided by an embodiment of the present invention;

FIG. 2 is an alternative structural diagram of a terminal provided by an embodiment of the present invention;

FIG. 3 is an alternative structural diagram of an artificial intelligence data detection apparatus provided by an embodiment of the present invention;

FIG. 4 is a first flowchart of an alternative artificial intelligence data detection method provided by an embodiment of the present invention;

FIG. 5A is a first diagram of an exemplary wake-up word detection scenario provided by an embodiment of the present invention;

FIG. 5B is a second diagram of an exemplary wake-up word detection scenario provided by an embodiment of the present invention;

FIG. 6 is a block diagram of an exemplary LSTM memory cell provided by an embodiment of the present invention;

FIG. 7 is a second flowchart of an alternative artificial intelligence data detection method provided by an embodiment of the present invention;

FIG. 8 is a third flowchart of an alternative artificial intelligence data detection method provided by an embodiment of the present invention;

FIG. 9 is a schematic diagram of an exemplary speech recognition scenario with at least two detection paths provided by an embodiment of the present invention;

FIG. 10 is a graph of an exemplary first wake-up success rate versus standby time provided by an embodiment of the present invention;

FIG. 11 is a timing diagram of exemplary main and backup detection paths provided by an embodiment of the present invention;

FIG. 12 is a schematic flowchart of an alternative artificial intelligence data detection method provided by an embodiment of the present invention;

FIG. 13 is a first diagram of an exemplary multi-directional branch speech detection scenario provided by an embodiment of the present invention;

FIG. 14 is a second diagram of an exemplary multi-directional branch speech detection scenario provided by an embodiment of the present invention;

FIG. 15 is a third diagram of an exemplary multi-directional branch speech detection scenario provided by an embodiment of the present invention;

FIG. 16 is a fourth diagram of an exemplary multi-directional branch speech detection scenario provided by an embodiment of the present invention;

FIG. 17 is a schematic diagram of an exemplary speech recognition scenario provided by an embodiment of the present invention;

FIG. 18 is a schematic diagram of an exemplary speech recognition scenario provided by an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail with reference to the accompanying drawings, the described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.

Before further detailed description of the embodiments of the present invention, terms and expressions mentioned in the embodiments of the present invention are explained, and the terms and expressions mentioned in the embodiments of the present invention are applied to the following explanations.

1) Wake-up word: a keyword for starting a voice interaction smart device. In the embodiment of the invention, the wake-up word is the voice signal corresponding to the keyword for starting the artificial intelligence data detection device.

2) Feature extraction: converting original features into a set of features with clear physical significance (Gabor, geometric features [corner points, invariants], texture [LBP, HOG], etc.), statistical significance, or kernel-based features. Feature extraction in the embodiment of the present invention refers to extracting feature quantities of important audio information from audio data.

3) Long Short-Term Memory (LSTM) model: a time-recursive neural network that can selectively memorize history information (the history accumulation characteristic). It is an improvement on the recurrent neural network (RNN) model: an LSTM is formed by replacing the hidden layer nodes of an RNN with LSTM units.

4) Model training: inputting manually selected samples into a machine learning system and continuously adjusting the model parameters so that the final model's accuracy in recognizing the samples is optimal.

5) Machine Learning (ML): based on theories such as probability theory, statistics, and neural propagation, enabling a computer to simulate human learning behavior so as to acquire new knowledge or skills and to reorganize its existing knowledge structure to continuously improve its own performance.

6) Artificial intelligence: the theories, methods, technologies, and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best result.

It should be noted that artificial intelligence is a comprehensive technique in computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.

An exemplary application of the artificial intelligence data detection apparatus according to the embodiment of the present invention is described below. The apparatus may be implemented as various types of user terminals having voice recognition or data detection functions, such as a smart phone, a tablet computer, a notebook computer, or a voice interaction smart device (e.g., a smart speaker). It may also be implemented as a server, where the server is a background server running an application with the data detection function or the voice recognition function. An exemplary application in which the artificial intelligence data detection apparatus is implemented as a terminal is explained below.

Referring to FIG. 1, FIG. 1 is an alternative architecture diagram of an artificial intelligence data detection system 100 according to an embodiment of the present invention. To support an exemplary application, a terminal 400 (terminal 400-1 and terminal 400-2 are shown as examples) is connected to a server 300 through a network 200, where the network 200 may be a wide area network, a local area network, or a combination of the two, and data transmission is implemented using a wireless link.

The terminal 400 is configured to: obtain a voice detection model, where the voice detection model is a correspondence between audio data of at least one detection path having a history accumulation characteristic and a voice recognition result; determine a reference object based on the number of detected detection paths, the reference object being the factor used to judge the reset operation; determine a reset time point based on the reference object, where the reset time point is the moment at which the history accumulation in the voice detection model is initialized while voice recognition performance is ensured; reset the voice detection model when the reset time point arrives, to obtain a reset voice detection model; perform voice recognition on the acquired audio data to be detected by using the reset voice detection model, and determine whether to perform a wake-up function; and, when it is determined to perform the wake-up function, receive audio data to be detected, perform voice recognition on it to obtain a functional voice instruction, and send the functional voice instruction to the server 300.

And the server 300 is configured to generate a function triggering instruction according to the function voice instruction, and control the terminal 400 or other terminals to implement a function triggered by the function voice instruction according to the function triggering instruction.

The artificial intelligence data detection device provided by the embodiment of the present invention may be implemented as hardware or a combination of hardware and software, and various exemplary implementations of the device provided by the embodiment of the present invention are described below.

Referring to FIG. 2, FIG. 2 is a schematic diagram of an alternative structure of a terminal 400 according to an embodiment of the present invention. The terminal 400 may be a mobile phone, a computer, a digital broadcast terminal, an audio data transceiver, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and so on. An exemplary structure of the artificial intelligence data detection apparatus when implemented as a terminal can be understood from the structure of the terminal 400. The structure described here should therefore not be considered limiting; for example, some components described below may be omitted, or components not described below may be added to meet the special requirements of some applications.

The terminal 400 shown in fig. 2 includes: at least one processor 410, memory 440, at least one network interface 420, and a user interface 430. The various components in the terminal 400 are coupled together by a bus system 450. It is understood that the bus system 450 is used to enable connected communication between these components. The bus system 450 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 450 in fig. 2.

The user interface 430 may include a display, keyboard, mouse, trackball, click wheel, keys, buttons, touch pad or touch screen, etc.

Memory 440 may be either volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. Among them, the nonvolatile Memory may be a Read Only Memory (ROM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read Only Memory (EPROM), a Flash Memory (Flash Memory), and the like. Volatile Memory can be Random Access Memory (RAM), which acts as external cache Memory. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM). The memory 440 described in connection with the embodiments of the invention is intended to comprise these and any other suitable types of memory.

The memory 440 in the embodiment of the present invention can store data to support the operation of the terminal 400. Examples of such data include: any computer programs for operating on the terminal 400, such as an operating system 442 and executable programs 441. The operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is used for implementing various basic services and processing hardware-based tasks. The executable program may contain various application programs, such as executable data detection instructions.

As an example of the artificial intelligence data detection method provided by the embodiment of the present invention implemented by combining software and hardware, the artificial intelligence data detection method provided by the embodiment of the present invention may be directly embodied as a combination of software modules executed by the processor 410, the software modules may be located in a storage medium, the storage medium is located in the memory 440, the processor 410 reads executable data detection instructions included in the software modules in the memory 440, and the artificial intelligence data detection method provided by the embodiment of the present invention is completed in combination with necessary hardware (for example, including the processor 410 and other components connected to the bus 450).

By way of example, the Processor 410 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor or the like.

Illustratively, an embodiment of the present invention provides an artificial intelligence data detection apparatus, including at least:

a memory 440 for storing executable data detection instructions;

the processor 410 is configured to implement the artificial intelligence data detection method provided by the embodiment of the present invention when executing the executable data detection instruction stored in the memory 440.

An exemplary structure of software modules is described below. In some embodiments, as shown in FIG. 3, the software modules in the artificial intelligence data detection apparatus 1 may include: an obtaining unit 10, a determining unit 11, and a resetting unit 12; wherein:

an obtaining unit 10, configured to obtain a voice detection model, where the voice detection model is a correspondence between audio data of at least one detection path having a history accumulation characteristic and a voice recognition result;

a determining unit 11, configured to determine a reference object based on the number of detected detection paths, where the reference object is the factor used to judge the reset operation; and to determine a reset time point based on the reference object, where the reset time point is the time point at which the history accumulation in the voice detection model is initialized while voice recognition performance is ensured;

a resetting unit 12, configured to reset the speech detection model when the resetting time point arrives.

In some embodiments of the present invention, the determining unit 11 is further configured to determine that the reference object is a current detection result when the number of detected detection paths is one.

In some embodiments of the present invention, the determining unit 11 is further configured to determine that the reference object is the current time point when the number of detected detection paths is greater than one.

In some embodiments of the present invention, the obtaining unit 10 is further configured to obtain audio data to be detected; recognizing the audio data to be detected by using the voice detection model to obtain a current detection result;

the determining unit 11 is further specifically configured to determine, when the current detection result meets a preset reset threshold, that the current time point is the reset time point; and the preset reset threshold is greater than or equal to the preset awakening threshold.

In some embodiments of the present invention, the obtaining unit 10 is further configured to, after the audio data to be detected has been identified by using the voice detection model and a current detection result has been obtained, obtain a historical detection result from before the current time point;

the determining unit 11 is further configured to determine that the current time point is the reset time point when a variation range between the current detection result and the historical detection result satisfies a preset false wake-up range.

In some embodiments of the invention, the at least one detection path comprises a backup detection path;

the obtaining unit 10 is further configured to obtain a current time point;

the determining unit 11 is further configured to determine the current time point as a reset time point of the backup detection path when the current time point reaches a preset preheating time point, where the preset preheating time point is the time point that precedes a preset reset time point by a preset preheating time period.

In some embodiments of the present invention, the reset unit 12 is specifically configured to reset and start the backup detection path when the current time point reaches a preset preheating time point.

In some embodiments of the invention, the at least one detection path further comprises: a main detection path; the artificial intelligence data detection device 1 further comprises an identification unit 13 and a closing unit 14;

the identification unit 13 is configured to perform voice identification by using the main detection path and the backup detection path after the resetting and starting of the backup detection path;

the reset unit 12 is further specifically configured to reset the main detection path when the preset reset time point is reached after the preset preheating time period elapses;

the closing unit 14 is configured to close the backup detection path when the preset warm-up time period elapses from the preset reset time point,

the recognition unit 13 is further configured to perform voice recognition by using the main detection path.

In some embodiments of the present invention, the preset reset time points form a time sequence whose points are separated by a preset time length.

In some embodiments of the present invention, the artificial intelligence data detection apparatus 1 further comprises a receiving unit 15 and an integrated processing unit 16;

the receiving unit 15 is configured to receive audio data to be detected;

the recognition unit 13 is specifically configured to perform voice recognition on the audio data to be detected by using the main detection path to obtain a main detection result; and when the main detection result is larger than a preset awakening threshold, recognizing that the audio data to be detected is an awakening word, and starting an awakening function.

In some embodiments of the invention, the artificial intelligence data detection apparatus 1 further comprises an identification unit 13;

the recognition unit 13 is configured to, after the voice detection model is reset at the reset time point, perform voice recognition by using the reset voice detection model.

In some embodiments of the invention, the artificial intelligence data detection apparatus 1 further comprises an integrated processing unit 16;

the recognition unit 13 is specifically configured to, in the voice detection based on at least one direction branch, respectively perform voice recognition on the at least one direction branch according to the reset voice detection model to obtain at least one current detection result;

the comprehensive processing unit 16 is configured to perform comprehensive processing on the at least one current detection result to obtain a comprehensive detection result;

the identification unit 13 is further specifically configured to identify a wake-up word and start a wake-up function when the comprehensive detection result is greater than a preset wake-up threshold.

In some embodiments of the present invention, the resetting unit 12 is specifically configured to initialize data with a history accumulation characteristic in the speech detection model when the resetting time point arrives, so as to obtain a reset speech detection model.

In practical applications, the obtaining unit 10, the determining unit 11, the resetting unit 12, the identifying unit 13, the closing unit 14 and the comprehensive processing unit 16 may be implemented by a processor, and the receiving unit 15 may be implemented by a user interface, which is not limited in the embodiment of the present invention.

As an example of the artificial intelligence data detection method provided by the embodiment of the present invention implemented by hardware, the artificial intelligence data detection method provided by the embodiment of the present invention may be implemented by directly using the processor 410 in the form of a hardware decoding processor, for example, by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.

The artificial intelligence data detection method provided by the embodiment of the present invention is described below in conjunction with the exemplary application and implementation of the artificial intelligence data detection apparatus described above.

Referring to fig. 4, fig. 4 is an alternative flow chart of the artificial intelligence data detection method provided by the embodiment of the invention, which will be described with reference to the steps shown in fig. 4.

S101, obtaining a voice detection model, wherein the voice detection model is a corresponding relation between audio data of at least one detection channel with history accumulation characteristics and a voice recognition result.

S102, determining a reference object based on the number of the detected at least one detection path; the reference object is a factor for making a judgment of the reset operation.

S103, determining a reset time point based on the reference object, wherein the reset time point is the time when the history accumulation in the voice detection model is initialized under the condition of ensuring the voice recognition performance.

S104, resetting the voice detection model when the reset time point arrives.

The artificial intelligence data detection method provided by the embodiment of the invention is applied to a voice detection or voice recognition scene, such as a wake-up word detection scene, and the embodiment of the invention is not limited.

An example description of the artificial intelligence data detection method provided by the embodiment of the present invention is given below by taking a wakeup word detection scenario as an example.

In the embodiment of the present invention, in the wakeup word detection model shown in fig. 5A, the artificial intelligence data detection apparatus receives audio data to be detected in real time, inputs the received audio data to be detected into the wakeup word detection model (i.e., the voice detection model) for recognition, outputs a wakeup word detection result, and determines whether to wake up the artificial intelligence data detection apparatus according to the detection result.

For example, the audio data to be detected may be a monaural continuous signal (a continuous time-domain signal or a continuous frequency-domain signal; the embodiment of the present invention is not limited in this respect), which is usually fed to the wake-up word detection model frame by frame. Each time a frame of the continuous signal is input, the wake-up word detection model detects whether the predefined wake-up word has appeared within the most recent time window of length T. The wake-up word detection model thus outputs one detection result per frame.

It should be noted that the embodiment of the present invention does not limit the output form of the detection result. It may be a specific score, or it may be a two-valued indication of whether the wake-up word is present, expressed for example in binary or as text.

Illustratively, when the detection result is represented in binary, an output of 1 indicates that the wake-up word was detected within the time window T, and an output of 0 indicates that no wake-up word was detected within the time window T.
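The frame-by-frame behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: `detect_stream`, `toy_model` and the window length of three frames are hypothetical names and values, and the toy energy test merely stands in for the wake-up word detection model, which the source does not specify at this level.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List

Frame = List[float]


def detect_stream(frames: Iterable[Frame],
                  window_frames: int,
                  contains_wake_word: Callable[[List[Frame]], bool]) -> Iterator[int]:
    """Yield one binary result per input frame, judged on the latest window."""
    window: Deque[Frame] = deque(maxlen=window_frames)  # keeps only the latest T frames
    for frame in frames:
        window.append(frame)
        yield 1 if contains_wake_word(list(window)) else 0


# Hypothetical stand-in for the model: a frame with energy above 1.0 counts as the wake-up word.
def toy_model(window: List[Frame]) -> bool:
    return any(sum(x * x for x in frame) > 1.0 for frame in window)


frames = [[0.1], [0.2], [2.0], [0.1], [0.1], [0.1]]
results = list(detect_stream(frames, window_frames=3, contains_wake_word=toy_model))
print(results)
```

The output stays at 1 for as long as the loud frame remains inside the window and returns to 0 once it has slid out.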

In the embodiment of the present invention, for voice recognition scenarios such as the one shown in fig. 5A, a method for resetting the voice detection model is provided, so that subsequent audio data detection uses the reset voice detection model, which detects or recognizes better, and a high level of recognition accuracy can be maintained.

Here, the artificial intelligence data detection apparatus performs voice recognition using a voice detection model, where the voice detection model is a correspondence between audio data of at least one detection path having a history accumulation characteristic and a voice recognition result. The apparatus first needs to obtain the voice detection model. Because the voice detection model may provide one or more paths capable of voice recognition detection, the apparatus first detects the detection paths of the voice detection model and, after at least one detection path has been detected, determines the reference object that corresponds to the number of detected paths. The reference object is a factor used to decide on the reset operation: it is data or a characteristic that ensures the accuracy of wake-up word detection is maintained when the model is reset at the reset time point determined from it. After obtaining the reference object, the apparatus determines the reset time point for the detection-path configuration at hand based on the corresponding reference object, where the reset time point is a time point at which the history accumulation in the voice detection model is initialized while voice recognition performance is preserved. When the reset time point arrives, the voice detection model is reset to obtain a reset voice detection model.

In some embodiments of the present invention, the specific reset procedure in the embodiments of the present invention is: and when the reset time point arrives, initializing the data with the history accumulation characteristic in the voice detection model to obtain the reset voice detection model.

In some embodiments of the present invention, when the number of detected detection paths is one, the reference object is determined to be the current detection result.

In some embodiments of the present invention, when the number of detected detection paths is more than one, the reference object is determined to be the current time point.

That is, in the embodiment of the present invention, the cases of different kinds of detection paths can be classified into the case of one detection path and the case of at least two (i.e., more than one) detection paths. In the case of one detection path, the artificial intelligence data detection device judges the reset time point of the reset voice detection model based on the current detection result; in the case of at least two detection paths, the artificial intelligence data detection apparatus determines the reset time point for resetting the voice detection model based on the current time point, and specifically determines the reset time point according to the current time point and a preset reset time condition, which will be described in detail in the following embodiments.
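The selection rule above can be written as a small function. This is a minimal sketch; the names `ReferenceObject` and `choose_reference_object` are illustrative and do not come from the source.

```python
from enum import Enum


class ReferenceObject(Enum):
    CURRENT_DETECTION_RESULT = "current_detection_result"
    CURRENT_TIME_POINT = "current_time_point"


def choose_reference_object(num_detection_paths: int) -> ReferenceObject:
    """Pick the factor on which the reset decision is based."""
    if num_detection_paths < 1:
        raise ValueError("at least one detection path is required")
    if num_detection_paths == 1:
        # One path: the reset decision follows the current detection result.
        return ReferenceObject.CURRENT_DETECTION_RESULT
    # Two or more paths: the reset decision follows the current time point.
    return ReferenceObject.CURRENT_TIME_POINT
```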

It can be understood that the artificial intelligence data detection apparatus determines the reference object for the reset decision according to the number of detection paths of the voice detection model and then determines the reset time point based on that reference object; that is, for different detection-path configurations, the respective reset time points are determined from different reference objects. Because the reset time point is a time point at which the history accumulation in the voice detection model is initialized while voice recognition performance is preserved, a voice detection model reset at that time point carries no history memory. The model is therefore no longer affected by long-term history accumulation while wake-up performance is preserved, which improves accuracy when recognizing the wake-up word.

In some embodiments of the present invention, after S104, after the voice detection model is reset, the artificial intelligence data detection apparatus may perform voice recognition by using the reset voice detection model, so that the recognition accuracy of the obtained detection result is good.

It should be noted that, in the embodiment of the present invention, the speech detection model is a speech recognition model with history accumulation characteristics, for example, LSTM.

In an embodiment of the present invention, the LSTM is a time-recursive neural network that can selectively memorize historical information (the history accumulation characteristic). It is a further improvement on the RNN model: an LSTM is formed by replacing the hidden layer nodes in an RNN with LSTM units.

The memory cell (i.e., core gate) state of an LSTM unit is controlled by three gates: an input gate, a forget gate and an output gate.

The input gate selectively inputs the current data into the memory cell; the forget gate controls the influence of historical information on the current memory cell state value; and the output gate selectively outputs the memory cell state value. The design of three gates plus an independent memory cell gives the LSTM unit the ability to save, read, reset and update long-distance historical information. Illustratively, fig. 6 shows the structure of an LSTM memory cell.

First, at time $t$, the input feature $x_t$ and the hidden layer variable $h_{t-1}$ at time $t-1$, under the combined action of the weight transfer matrices $W$ and $U$ and the offset vector $b$, generate the state quantities $i_t$, $f_t$ and $o_t$ at time $t$; see formulas (1) to (3). Then, with the aid of the core gate state quantity $c_{t-1}$ at time $t-1$, the core gate state quantity $c_t$ at time $t$ is generated; see formula (4). Finally, under the action of the core gate state quantity $c_t$ and the output gate state quantity $o_t$ at time $t$, the hidden layer variable $h_t$ at time $t$ is generated, which in turn influences the internal changes of the LSTM neuron at time $t+1$; see formula (5).

$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \tag{1}$$

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \tag{2}$$

$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \tag{3}$$

$$c_t = f_t * c_{t-1} + i_t * \phi(W_c x_t + U_c h_{t-1} + b_c) \tag{4}$$

$$h_t = o_t * \phi(c_t) \tag{5}$$

Here the two nonlinear activation functions are respectively $\sigma(x_t) = \dfrac{1}{1 + e^{-x_t}}$ and $\phi(x_t) = \tanh(x_t)$.

$i_t$, $f_t$, $o_t$ and $c_t$ respectively denote the input gate state value, the forget gate state value, the output gate state value and the core gate state value at time $t$. In an embodiment of the invention, for each logic gate, $W_i$, $W_f$, $W_o$ and $W_c$ respectively denote the weight transfer matrices corresponding to the input gate, the forget gate, the output gate and the core gate; $U_i$, $U_f$, $U_o$ and $U_c$ respectively denote the weight transfer matrices applied to the hidden layer variable $h_{t-1}$ at time $t-1$ for the input gate, the forget gate, the output gate and the core gate; and $b_i$, $b_f$, $b_o$ and $b_c$ denote the offset vectors corresponding to the input gate, the forget gate, the output gate and the core gate.

Because the LSTM has a history memory (which can be understood as a history accumulation characteristic), each detection result produced during voice detection or voice recognition of the audio data to be detected is influenced by historical detection data. This history memory is limited and cannot persist indefinitely; while it persists, the false wake-up probability rises as the standby time of the artificial intelligence data detection apparatus increases. The specific reset procedure is therefore as follows: at the reset time point, the artificial intelligence data detection apparatus initializes and clears the data with history memory stored in the voice detection model, so that the reset voice detection model is not affected by the history memory accumulated over a long standby.
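To make the role of the history memory concrete, the following sketch implements one LSTM step per formulas (1) to (5) for scalar inputs and shows a reset as re-initialization of the accumulated state. Several things here are assumptions for illustration only: the scalar weights, the names `GateParams`, `LSTMState`, `lstm_step` and `reset_state`, the use of the logistic sigmoid for the gate activation, and the choice of zero as the initial state (the source says only that the history-accumulating data is initialized).

```python
import math
from dataclasses import dataclass


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


@dataclass
class GateParams:
    w: float  # weight applied to the input x_t
    u: float  # weight applied to the previous hidden value h_{t-1}
    b: float  # offset


@dataclass
class LSTMState:
    h: float = 0.0  # hidden layer variable
    c: float = 0.0  # core gate (memory cell) state


def lstm_step(x: float, state: LSTMState, p_i: GateParams, p_f: GateParams,
              p_o: GateParams, p_c: GateParams) -> LSTMState:
    """One time step following formulas (1) to (5)."""
    def pre(p: GateParams) -> float:
        return p.w * x + p.u * state.h + p.b

    i = sigmoid(pre(p_i))                       # (1) input gate
    f = sigmoid(pre(p_f))                       # (2) forget gate
    o = sigmoid(pre(p_o))                       # (3) output gate
    c = f * state.c + i * math.tanh(pre(p_c))   # (4) core gate state
    h = o * math.tanh(c)                        # (5) hidden layer variable
    return LSTMState(h=h, c=c)


def reset_state() -> LSTMState:
    """Assumed reset: discard accumulated h and c and start from zeros."""
    return LSTMState()


p = GateParams(w=1.0, u=0.5, b=0.0)  # illustrative weights shared by all gates
state = LSTMState()
for x in [1.0, 0.5, -0.2]:
    state = lstm_step(x, state, p, p, p, p)
accumulated = state      # carries history from all three inputs
state = reset_state()    # history memory cleared
```

Only `h` and `c` are cleared by the reset in this sketch; the trained weights are untouched, so the model behaves after a reset as it did when first started.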

In some embodiments of the present invention, when the number of detected detection paths is one, the reference object is the current detection result. For the model resetting process performed by the artificial intelligence data detection apparatus during voice recognition, refer to fig. 7, which is an optional flowchart of the artificial intelligence data detection method provided by the embodiment of the present invention. After S102, S201 to S205 may further be performed, as follows:

S201, acquiring audio data to be detected.

S202, identifying the audio data to be detected by using the voice detection model to obtain a current detection result.

S203, when the current detection result meets a preset reset threshold, determining that the current time point is a reset time point.

The preset reset threshold is greater than or equal to the preset wake-up threshold.

In the embodiment of the invention, when the number of the detected detection paths is one, the reference object is the current detection result, and the artificial intelligence data detection device resets the model in the process of voice recognition.

In S201, the artificial intelligence data detection device acquires or receives audio data to be detected in real time.

In the embodiment of the present invention, the audio data to be detected, which is obtained in real time, may be noise received from the external environment or a continuous signal input by a user or another sound-generating device.

In S202, after the data detection device of artificial intelligence receives the audio data to be detected, because the data detection device of artificial intelligence is provided with the voice detection model, the data detection device of artificial intelligence can perform voice recognition on the audio data to be detected by using the voice detection model, and then output the current detection result.

In the embodiment of the present invention, when the artificial intelligence data detection apparatus performs voice detection on the audio data to be detected, it first extracts audio features from the audio data and inputs these features into the voice detection model, which then outputs the current detection result.

In some embodiments of the invention, the manner of feature extraction comprises: SPP feature extraction, mel-frequency cepstrum coefficient feature, and the like, which are not limited in the embodiments of the present invention.

The detection result in the embodiment of the present invention may be a score, or may be identification information (e.g., 0, 1), and the embodiment of the present invention is not limited.

In S203, the preset reset threshold is a value of the same type as the current detection result, that is, data that can be compared with the current detection result. In the embodiment of the present invention, the artificial intelligence data detection apparatus compares the current detection result with the preset reset threshold. When the current detection result meets the preset reset threshold, this indicates that the voice detection model can be reset at this time, so the apparatus obtains the current time point and determines it to be the reset time point. The preset reset threshold is greater than or equal to the preset wake-up threshold.

In the embodiment of the present invention, the preset reset threshold represents either a numerical lower limit for resetting the voice detection model or a numerical range within which the voice detection model can be reset. When the current detection result reaches that lower limit or falls within that range, this indicates that the voice detection model can be reset.

It should be noted that, in the embodiment of the present invention, when the method is applied to the wake-up word detection scenario shown in fig. 5B, an artificial intelligence data detection device obtains audio data to be detected, identifies the audio data to be detected by using a wake-up word detection model (voice detection model), obtains a current detection result, performs reset judgment according to the current detection result, determines that the current time point is a reset time point when the current detection result meets a preset reset threshold, and resets a wake-up word detection algorithm at the reset time point.

The preset reset threshold must be greater than or equal to the preset wake-up threshold. The preset wake-up threshold is the threshold, determined on the basis of the detection result, at which the wake-up function of the data detection apparatus is triggered.

It should be noted that, in the embodiment of the present invention, the current detection result is required to be used for both the reset determination and the wake-up determination.

In the embodiment of the present invention, when the current detection result exceeds the preset reset threshold, the voice detection model (the wake-up word detection algorithm) is reset. It can be understood that, when the preset reset threshold is chosen to be greater than or equal to the wake-up threshold, the reset operation always follows the wake-up judgment. This avoids resetting the model in the middle of wake-up word detection, which would cause voice recognition errors and reduce accuracy.

Illustratively, suppose the artificial intelligence data detection apparatus performs voice detection on audio data 1 and obtains a detection result of 85 points, while the preset reset threshold is 90 points and the preset wake-up threshold is 80 points. In this detection the wake-up decision is satisfied and the apparatus is woken up, but the reset threshold is not reached, so the voice detection model does not need to be reset. If instead the detection result output by the voice detection model rises gradually and finally reaches 95 points, the wake-up judgment is made when the result reaches 80 points and the apparatus is woken up; the result then continues to rise, and once it exceeds 90 points the judgment that the voice detection model needs to be reset is made, by which time the wake-up judgment has already been completed. Conversely, if the preset reset threshold were smaller than the preset wake-up threshold, the voice detection model would be reset before the wake-up condition is reached, producing a false reset. Choosing a preset reset threshold no smaller than the preset wake-up threshold therefore avoids resetting the voice detection model in the middle of wake-up word detection.
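The two-threshold decision in this example can be sketched as follows. The names `Decision` and `decide` are illustrative, and the use of a strict "greater than" comparison for both thresholds is an assumption; the source states only that the result must meet or exceed the respective threshold.

```python
from typing import NamedTuple


class Decision(NamedTuple):
    wake: bool
    reset: bool


def decide(score: float, wake_threshold: float, reset_threshold: float) -> Decision:
    """Use one detection score for both the wake-up and the reset judgment."""
    if reset_threshold < wake_threshold:
        # Guard against the misconfiguration that leads to resets before wake-up.
        raise ValueError("reset threshold must be >= wake-up threshold")
    return Decision(wake=score > wake_threshold, reset=score > reset_threshold)


# Thresholds from the example: wake-up at 80, reset at 90.
print(decide(85, wake_threshold=80, reset_threshold=90))
print(decide(95, wake_threshold=80, reset_threshold=90))
```

With a score of 85 the sketch wakes without resetting, and with a score of 95 it both wakes and resets, matching the two cases in the example.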

It should be noted that the preset reset threshold is consistent with the preset wake-up threshold in type, and the specific numerical value is not limited in the embodiment of the present invention.

It should be noted that the above-mentioned setting of the reset time point is best applied to a usage scenario in which a user needs to perform a plurality of wake-up operations in a short time.

In the embodiment of the present invention, if a user needs to perform several wake-up operations within a short time, then once the score (the current detection result) that the voice detection model outputs for the user's wake-up word (the audio data to be detected) exceeds the preset reset threshold, the next wake-up operation or wake-up judgment obtains the best wake-up performance, because wake-up performance is best immediately after each reset. Moreover, because the voice detection model has been reset, subsequent wake-up words obtain high scores more easily, and those high scores in turn make it easier to reach the preset reset threshold, that is, to trigger another reset of the voice detection model.

Meanwhile, with regard to false wake-up, if the preset reset threshold is high enough (greater than or equal to the preset wake-up threshold used for wake-up), the probability that noise causes the voice detection model to be reset while the artificial intelligence data detection apparatus is on standby is very small. Moreover, because the expected time from initialization of the voice detection model to its first false wake-up is far greater than the time the false wake-up performance takes to reach its optimal state after each reset, the probability of a false wake-up or false reset is very low. Even if the apparatus is falsely woken up and falsely reset by noise during standby, its wake-up performance is not noticeably impaired, and the accuracy of voice recognition such as the wake-up operation can be improved.

S204, acquiring a historical detection result before the current time point.

S205, when the variation range between the current detection result and the historical detection result meets a preset false wake-up range, determining the current time point as a reset time point.

In S204, the artificial intelligence data detection device acquires the audio data to be detected in real time, so that the audio detection device can perform voice detection or voice recognition in real time, and the artificial intelligence data detection device can acquire many detection results. Then the artificial intelligence data detection device has performed many times of voice detection before the current time point, and thus the artificial intelligence data detection device can obtain the historical detection results before the current time point.

Illustratively, the artificial intelligence data detection device obtains 50 historical detection results of 50 voice detections before the time t.

In some embodiments of the present invention, the artificial intelligence data detecting device may further obtain all detection results in a preset time period before the current time point, as historical detection results, and the specific implementation manner is not limited in the embodiments of the present invention.

In S205, based on the current detection result and the historical detection results, the artificial intelligence data detection apparatus evaluates whether the detection result has changed sharply across these detections. When the change is large and the detection result drops rapidly and sharply, the voice detection model needs to be reset. That is, when the variation range between the current detection result and the historical detection results satisfies the preset false wake-up range, the current time point is determined to be the reset time point, and voice recognition or voice detection continues after the voice detection model has been reset at that time point.

Wherein, the preset false wake-up range represents the numerical range of the drastic decline of the detection result, and the probability of false wake-up in the range is very high.

It should be noted that the voice detection model is reset when the detection result drops rapidly and sharply. It can be understood that, under common noise (noise of a type already contained in the training data set of the voice detection model), the detection result of a voice detection model with history memory generally drops only slowly and slightly. Only when strong noise, or a noise type the voice detection model did not see during training, appears during voice detection does the detection result usually drop rapidly and substantially, which noticeably degrades the wake-up performance in the following period. Therefore, resetting the voice detection model when the artificial intelligence data detection apparatus detects such a change in the detection result avoids the above problem without noticeably affecting the wake-up performance, false wake-up performance, memory usage or amount of computation in common usage scenarios, and so improves wake-up accuracy.
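One possible way to express the S204 and S205 check is shown below. This is a sketch under assumptions: the source does not say how the variation between the current and historical results is measured, so comparing the current result against the mean of the recent history, and the names `should_reset_on_drop` and `min_drop`, are illustrative choices only.

```python
from statistics import mean
from typing import Sequence


def should_reset_on_drop(current: float, history: Sequence[float], min_drop: float) -> bool:
    """Return True when the current result has fallen sharply below recent history.

    min_drop stands in for the preset false wake-up range (assumed here to be
    a minimum drop relative to the mean of the historical results).
    """
    if not history:
        return False  # nothing to compare against yet
    return mean(history) - current >= min_drop


history = [60.0, 62.0, 61.0]
print(should_reset_on_drop(20.0, history, min_drop=30.0))  # sharp drop
print(should_reset_on_drop(55.0, history, min_drop=30.0))  # slow, slight drop
```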

It should be noted that, in the embodiment of the present invention, S203 and S204-S205 are two optional implementations after S202, and the artificial intelligence data detection apparatus may execute the steps after S202 according to actual situations, which is not limited in the embodiment of the present invention.

In some embodiments of the present invention, when the number of detected detection paths is greater than one, the reference object is the current time point, and the detection paths include a backup detection path and a main detection path. For the model resetting process performed by the artificial intelligence data detection apparatus during voice recognition, refer to fig. 8, which is an optional flowchart of the artificial intelligence data detection method provided by the embodiment of the present invention. After S102, S301 to S306 may further be performed, as follows:

S301, acquiring a current time point.

S302, when the current time point reaches a preset preheating time point, determining the current time point as the reset time point of the backup detection path, wherein the preset preheating time point is the time point that precedes the preset reset time point by a preset preheating time period.

S303, resetting and starting the backup detection path when the current time point reaches the preset preheating time point.

S304, performing voice recognition by using the main detection path and the backup detection path.

S305, resetting the main detection path when the preset reset time point is reached after the preset preheating time period.

S306, when the preset preheating time period has elapsed since the preset reset time point, closing the backup detection path and performing voice recognition by using the main detection path.
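Steps S301-S306 can be sketched as a small controller that is polled with the current time point. This is only an illustration under stated assumptions: the class name `ResetStartController`, the action strings, and the polling style are hypothetical and not taken from the source, and a single reset time point is handled for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResetStartController:
    """Illustrative controller for a single preset reset time point t_i."""

    reset_time: float   # preset reset time point t_i
    warmup: float       # preset preheating time period K
    backup_started: bool = False
    main_reset_done: bool = False
    backup_closed: bool = False

    def step(self, now: float) -> List[str]:
        """S301: take the current time point and return the actions it triggers."""
        actions: List[str] = []
        # S302/S303: the preheating time point t_i - K has been reached.
        if not self.backup_started and now >= self.reset_time - self.warmup:
            self.backup_started = True
            actions.append("reset_and_start_backup")
        # S305: the preset reset time point t_i has been reached.
        if not self.main_reset_done and now >= self.reset_time:
            self.main_reset_done = True
            actions.append("reset_main")
        # S306: the preheating period K has elapsed since t_i.
        if not self.backup_closed and now >= self.reset_time + self.warmup:
            self.backup_closed = True
            actions.append("close_backup")
        return actions

    def active_paths(self) -> List[str]:
        """S304 uses both paths while the backup runs; S306 uses the main path only."""
        if self.backup_started and not self.backup_closed:
            return ["main", "backup"]
        return ["main"]
```

With `reset_time=100.0` and `warmup=2.0`, the backup path runs from 98.0 to 102.0 and the main path is reset at 100.0, so at least one path always holds a usable history.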

In the embodiment of the present invention, when the number of detected detection paths is greater than one, the reference object is the current time point, and the detection paths include: a backup detection path and a main detection path; the number of the backup detection paths and the number of the main detection paths are not limited in the embodiments of the present invention.

For example, referring to the voice detection process shown in fig. 9, the artificial intelligence data detection device is described by taking one main detection path and one backup detection path as an example. A reset and start controller is arranged between the main detection path and the backup detection path, and is used for controlling the reset of the main detection path and the reset and start of the backup detection path. After the audio data to be detected passes through the main detection path and the backup detection path, detection results (a main detection result and a backup detection result) are obtained. Finally, all the detection results are comprehensively processed, and the final detection result, namely the total detection result, is output.

In the embodiment of the present invention, the reference object is a current time point, and specifically, the artificial intelligence data detection apparatus determines the reset time point based on the current time point and a preset time condition.

The time parameters in the preset time condition include a preset reset time point, a preset optimal wake-up upper limit value, a preset optimal false wake-up lower limit value, a preset preheating time period, and a preset wake-up word duration. The preset preheating time point is the time point that precedes the preset reset time point by the preset preheating time period.

In this way, after acquiring the current time point, the artificial intelligence data detection device determines the current time point as the reset time point of the backup detection path when the current time point reaches the preset preheating time point, and resets and starts the backup detection path at that point. Voice recognition is then performed by using both the main detection path and the backup detection path. When the preset reset time point is reached after the preset preheating time period, the main detection path is reset. When the preset preheating time period has elapsed since the preset reset time point, the backup detection path is closed and voice recognition is performed by using the main detection path only.

Wherein, the time parameter of the preset time condition satisfies:

the preset reset time points form a time sequence whose points are spaced at preset time-length intervals;

the preset preheating time period is greater than or equal to the preset wake-up word duration.

It should be noted that, for a speech detection model with a history accumulation characteristic, in a wake-up detection scenario, a wake-up success rate varies with time.

For example, as shown in the curve of first wake-up success rate versus standby time in fig. 10, after the artificial intelligence data detection device has been on standby (not receiving a wake-up word from the user) for a time t ≥ T₀, the success rate of the first or first few subsequent wake-up operations decreases significantly. The magnitude of the wake-up performance degradation depends on the magnitude of t and on the strength and characteristics of the ambient noise within the standby time period t. Here, T₀ denotes the lower limit of the history-insensitive time of the wake-up word detection algorithm (namely the voice detection model), that is, the preset optimal wake-up upper limit value. When t ≤ T₀, there is no significant decrease in the wake-up success rate (provided that the characteristics of the ambient noise during the standby period do not differ much from those of the noise data used in model training). The value of T₀ depends on the data configuration at the time of model training. The history memory duration of the wake-up word detection algorithm is usually limited and is denoted T₁ (namely the preset optimal false wake-up lower limit value). Its value is determined by the model structure and tuning parameters of the voice detection model of the algorithm, and historically accumulated data older than this duration does not affect the current result of the wake-up word detection algorithm (or affects it only negligibly).

Therefore, in the embodiment of the present invention, the period t ≤ T₁ is the period before the false wake-up performance reaches its optimum.

In the embodiment of the present invention, when the user's wake-up operations are randomly distributed in time and the interval between two wake-up operations (the preset time length) is long, a reset operation needs to be performed in the standby state to ensure that the standby time t before the user's next wake-up operation satisfies t ≤ T₁.

Illustratively, as shown in fig. 11, in the standby state of the artificial intelligence data detection device, at the times {t₁ − K, t₂ − K, t₃ − K, …}, the reset and start controller initiates reset and start operations on the wake-up word detection algorithm of the backup detection path. After receiving the reset and start commands, the detection module of the backup detection path clears the data accumulated in it and starts to receive the input audio data to be detected. Here K is the preset preheating time period, and K needs to be greater than or equal to the preset wake-up word duration τ, that is, K ≥ τ, so as to ensure that the backup detection path can correctly detect the wake-up word and to improve the accuracy of wake-up word detection.

The reset and start controller initiates a reset operation on the wake-up word detection module of the main detection path at intervals of D (that is, at the preset reset time points).

D may be a constant less than T₁, or may be a random number that is regenerated each time.

In the embodiment of the present invention, the preset reset time points are denoted {t₁, t₂, t₃, …}. The selection of the reset time points needs to satisfy formula (6):

$$2K < t_{i+1} - t_i \le T_2 \tag{6}$$

where K is the preset preheating time period, and T₂ is the tolerable performance degradation time selected in system design, which satisfies T₀ ≤ T₂ ≤ T₁.
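The constraint in formula (6) can be checked mechanically. The following Python sketch builds a sequence of reset time points using either a constant interval D or a freshly drawn random interval, and verifies that consecutive points satisfy 2K < t_{i+1} − t_i ≤ T₂. The function names, the choice of time 0 as the start of standby, and the uniform random draw are assumptions for illustration; the source only states that D may be a constant or a regenerated random number.

```python
import random
from typing import List, Optional, Sequence


def make_reset_points(count: int, K: float, T0: float, T1: float, T2: float,
                      D: Optional[float] = None,
                      seed: Optional[int] = None) -> List[float]:
    """Build reset time points t_1..t_count whose gaps satisfy 2K < gap <= T2."""
    if not (T0 <= T2 <= T1):
        raise ValueError("need T0 <= T2 <= T1")
    if not (2 * K < T2):
        raise ValueError("need 2K < T2, otherwise formula (6) has no solution")
    if D is not None and not (2 * K < D <= T2):
        raise ValueError("constant interval D must satisfy 2K < D <= T2")
    rng = random.Random(seed)
    points: List[float] = []
    t = 0.0  # assumption: standby starts at time 0
    for _ in range(count):
        # Constant D, or a fresh random gap drawn from (2K, T2] each time.
        gap = D if D is not None else T2 - rng.random() * (T2 - 2 * K)
        t += gap
        points.append(t)
    return points


def satisfies_formula_6(points: Sequence[float], K: float, T2: float) -> bool:
    """Check 2K < t_{i+1} - t_i <= T2 for every pair of consecutive points."""
    return all(2 * K < b - a <= T2 for a, b in zip(points, points[1:]))
```

The lower bound 2K keeps the backup windows of two consecutive reset points from overlapping, and the upper bound T₂ limits how long the main path accumulates history between resets.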

At the times {t₁ + K, t₂ + K, t₃ + K, …}, the reset and start controller issues a stop command to the wake-up word detection algorithm of the backup detection path, and the backup detection path stops running or is closed.

The running time of the backup detection path is therefore from t_i − K to t_i + K.

It can be understood that if t_i falls exactly within the audio data range of a certain wake-up word, at least the backup detection path can receive the audio data of the complete wake-up word, so that the wake-up word is detected and the accuracy of wake-up word detection is improved. Meanwhile, as long as formula (7) is satisfied:

$$T_0/2 \ge K \ge \tau \tag{7}$$

wake-up words appearing in the time period from t_i − K to t_i + K can all be handled by the backup detection path at its optimal wake-up performance, achieving the optimal wake-up word detection accuracy.
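As a small illustration of formula (7) and of the backup path's running window, the following Python sketch checks whether a chosen K is admissible and whether the backup path is running at a given time. The function names are hypothetical, and treating the window end points as inclusive is an assumption, since the source does not state whether the boundaries belong to the window.

```python
from typing import Sequence, Tuple


def warmup_is_valid(K: float, tau: float, T0: float) -> bool:
    """Formula (7): tau <= K <= T0 / 2."""
    return tau <= K <= T0 / 2


def backup_window(t_i: float, K: float) -> Tuple[float, float]:
    """Running interval of the backup detection path around reset point t_i."""
    return (t_i - K, t_i + K)


def backup_is_running(t: float, reset_points: Sequence[float], K: float) -> bool:
    """True if time t falls inside any backup window (end points inclusive by assumption)."""
    return any(
        start <= t <= end
        for start, end in (backup_window(t_i, K) for t_i in reset_points)
    )
```

The upper bound K ≤ T₀/2 means the backup path never runs longer than 2K ≤ T₀ after its own reset, so it stays within its history-insensitive time for the whole window.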

It should be noted that, in the embodiment of the present invention, the backup detection path is initially closed, and is only started when the preset preheating time point is reached.

In some embodiments of the present invention, the specific process of performing voice recognition by the artificial intelligence data detection apparatus in S304 is as follows: receiving audio data to be detected; performing voice recognition on the audio data to be detected by using the main detection path and the backup detection path respectively to obtain a main detection result and a backup detection result; comprehensively processing the main detection result and the backup detection result to obtain a total detection result; and when the total detection result is greater than the preset wake-up threshold, recognizing the audio data to be detected as the wake-up word and starting the wake-up function.

In some embodiments of the present invention, the specific process of performing speech recognition by the artificial intelligence data detection apparatus in S306 is as follows: receiving audio data to be detected; performing voice recognition on audio data to be detected by adopting a main detection channel to obtain a main detection result; and when the main detection result is larger than a preset awakening threshold, recognizing that the audio data to be detected is an awakening word, and starting an awakening function.

In the embodiment of the invention, when the artificial intelligence data detection device has started the backup detection path, both the main detection path and the backup detection path perform voice detection, so that a main detection result and a backup detection result are obtained, and the device makes the wake-up decision based on the result of comprehensively processing the two, namely the total detection result. When the artificial intelligence data detection device has stopped or closed the backup detection path, only the main detection path performs voice detection, so that a main detection result is obtained, and the device makes the wake-up decision based on the main detection result. In this way, the wake-up accuracy is improved as a result of the improved voice recognition accuracy.

In the embodiment of the invention, the detection results of the awakening words of the main detection channel and the backup detection channel are synthesized, and after the comprehensive processing, the total detection result is output.

Illustratively, a simple implementation of the comprehensive processing of detection results is as follows: when the backup detection path is not running (from t_{i-1} + K to t_i − K), only the main detection result of the main detection path is used; when the main detection path and the backup detection path run simultaneously (from t_i − K to t_i + K), the higher of the detection results of the main detection path and the backup detection path is used. Assuming that the main detection result is z(t), the backup detection result is b(t), and the total detection result after the comprehensive processing is s(t), formula (8) is obtained:

$$s(t) = \begin{cases} z(t), & t \in (t_{i-1}+K,\ t_i-K) \\ \max\big(z(t),\, b(t)\big), & t \in (t_i-K,\ t_i+K) \end{cases} \tag{8}$$

It should be noted that the comprehensive processing may also be an arithmetic mean, a geometric mean, a weighted algorithm, or the like, which is not limited in the embodiment of the present invention.
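The comprehensive processing of formula (8) and the alternatives mentioned above can be sketched in Python as follows. The function name `fuse`, the `mode` strings, and the default weights are assumptions for illustration; passing `b=None` stands for the period in which the backup detection path is not running. The geometric mode additionally assumes non-negative detection results.

```python
import math
from typing import Optional, Tuple


def fuse(z: float, b: Optional[float] = None, mode: str = "max",
         weights: Tuple[float, float] = (0.5, 0.5)) -> float:
    """Combine the main result z(t) and the backup result b(t) into s(t)."""
    if b is None:
        # Backup path not running: s(t) = z(t), first case of formula (8).
        return z
    if mode == "max":
        # Both paths running: s(t) = max(z(t), b(t)), second case of formula (8).
        return max(z, b)
    if mode == "mean":
        return (z + b) / 2
    if mode == "geometric":
        return math.sqrt(z * b)  # assumes z >= 0 and b >= 0
    if mode == "weighted":
        w_z, w_b = weights
        return (w_z * z + w_b * b) / (w_z + w_b)
    raise ValueError(f"unknown mode: {mode}")
```

Taking the maximum lets a freshly reset backup path compensate for a main path whose history has degraded, while the mean-based variants trade that responsiveness for smoother totals.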

In the embodiment of the invention, after the total detection result is obtained, the artificial intelligence data detection device can compare the total detection result with the preset wake-up threshold to make the wake-up decision.

In some embodiments of the present invention, based on the implementation of the speech detection model reset described in the foregoing embodiments, referring to fig. 12, fig. 12 is an optional flowchart of the artificial intelligence data detection method provided in the embodiments of the present invention. Fig. 12 shows that after S104, the artificial intelligence data detection apparatus may perform speech recognition using the reset speech detection model. Specifically, S105-S107 may also be performed, as follows:

S105, in the voice detection based on at least one directional branch, performing voice recognition on each of the at least one directional branch according to the reset voice detection model to obtain at least one current detection result.

S106, comprehensively processing the at least one current detection result to obtain a comprehensive detection result.

S107, when the comprehensive detection result is greater than the preset wake-up threshold, recognizing the wake-up word and starting the wake-up function.

In the embodiment of the present invention, there may be a speech detection architecture with multiple directional branches, and the foregoing embodiments describe a speech detection model architecture in one direction.

In some embodiments of the present invention, the voice detection architecture with multiple directional branches (at least one direction) may use a microphone array to distribute the microphone array signals to different directional branches. After the audio data to be detected is input, it is passed to the voice detection of each directional branch, and each directional branch obtains a detection result through voice detection, so that at least one detection result is obtained through the voice detection of the multiple directional branches (as shown in fig. 13).

In the embodiment of the present invention, each directional branch is provided with a single-channel speech detection model, which is the speech detection model in the above embodiment.

Therefore, in the voice detection based on at least one directional branch, the artificial intelligence data detection device performs voice recognition on each of the at least one directional branch according to the reset voice detection model (the reset single-channel voice detection model) to obtain at least one current detection result, and comprehensively processes the at least one current detection result to obtain a comprehensive detection result. The device then makes the wake-up decision based on the comprehensive detection result and the preset wake-up threshold: when the comprehensive detection result is greater than the preset wake-up threshold, the wake-up word is recognized and the wake-up function is started.
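Steps S105-S107 can be illustrated with a short Python sketch that combines the current detection results of the directional branches and compares the combined value with the wake-up threshold. The function name `wake_decision` is hypothetical, and using the maximum as the default comprehensive processing is an assumption; the source leaves the combination method open.

```python
from typing import Callable, Sequence, Tuple


def wake_decision(branch_scores: Sequence[float], wake_threshold: float,
                  combine: Callable[[Sequence[float]], float] = max
                  ) -> Tuple[bool, float]:
    """Return (wake, combined) for the per-branch current detection results."""
    if not branch_scores:
        raise ValueError("at least one directional branch is required")
    # S106: comprehensively process the per-branch results (default: maximum).
    combined = combine(branch_scores)
    # S107: wake only when the combined result is strictly greater than the threshold.
    return combined > wake_threshold, combined
```

Another combination rule can be supplied through `combine`, for example `lambda s: sum(s) / len(s)` for an arithmetic mean over the branches.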

The reset single-channel speech detection model for each direction branch is obtained in accordance with all the reset procedures in the speech detection model described in the previous embodiment.

That is to say, the artificial intelligence data detection apparatus can directly apply the reset of the voice detection model at the reset time point described in the previous embodiment. Each directional branch in fig. 13 can be handled independently, that is, each directional branch is reset according to its own detection result; alternatively, the single-channel voice detection models in all the directional branches can be reset uniformly according to the maximum value of the detection results of the directional branches.

Illustratively, as shown in fig. 14, the procedure of detecting the wake-up word and detecting the reset of the multi-directional branch in fig. 13 is performed by using one detection path. As shown in fig. 15, for one directional branch as an example, the processes of detecting wakeup words and detecting reset for the multi-directional branch in fig. 13 are performed by using one main detection path (single-channel wakeup word detection) and one backup detection path (backup single-channel wakeup word detection). As shown in fig. 16, in the process of detecting the wake-up word for the multi-directional branches, one detection path and at least two detection paths may be used in cooperation for branches in different directions.

It should be noted that the reset determination manners in fig. 14 to 16 can be adopted for each directional branch, and the embodiment of the present invention does not limit which specific direction of the branch that can be reset is. The detailed description has been described in the foregoing embodiments, and is not repeated herein.

In some embodiments of the present invention, in a scenario with a main detection path and a backup detection path, the reset and backup operations are performed on all the directional branches in turn: at any selected reset time point t_i, the (i % N)-th branch is reset, where N is the number of branches and "%" denotes the remainder operation. Alternatively, at any reset time point t_i, the branch with the lowest current detection result is selected, and the reset and backup operations are performed on it at time t_{i+1}. The embodiment of the present invention is not limited in this respect.
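The two branch selection strategies described above can be sketched in Python as follows. The function names are hypothetical, branches are indexed from 0 for convenience, and breaking ties by taking the lowest index is an assumption not stated in the source.

```python
from typing import Sequence


def round_robin_branch(i: int, n_branches: int) -> int:
    """Branch to reset at reset time point t_i: the (i % N)-th branch."""
    if n_branches <= 0:
        raise ValueError("n_branches must be positive")
    return i % n_branches


def lowest_score_branch(current_scores: Sequence[float]) -> int:
    """Index of the branch with the lowest current detection result at t_i.

    Per the description, this branch is then reset and backed up at t_{i+1}.
    Ties go to the lowest index (an assumption).
    """
    if not current_scores:
        raise ValueError("at least one branch is required")
    return min(range(len(current_scores)), key=lambda k: current_scores[k])
```

For example, with N = 4 branches the round-robin rule resets branch 1 at t_5, while the score-based rule applied to the results [0.7, 0.2, 0.9] selects branch 1 because it currently has the lowest detection result.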

In the following, an exemplary application of the embodiment of the present invention in an actual application scenario of using a smart speaker to perform wake-up word detection will be described, taking a reset mode of at least two detection paths as an example.

As shown in fig. 17, a user utters the audio data 1 "hi, small and four" (the audio data to be detected) at time 1, and the smart speaker receives the audio data 1 and performs wake-up detection and reset detection on it. The smart speaker compares time 1 with the preset preheating time point and the preset reset time point, finds that time 1 has reached the preset preheating time point, and therefore resets and starts the backup detection path. In this case, the smart speaker performs wake-up recognition by using the main detection path and the backup detection path to obtain a main detection result and a backup detection result, and comprehensively processes the main detection result and the backup detection result to obtain a total detection result. When the total detection result is greater than the preset wake-up threshold, the smart speaker recognizes the audio data to be detected as the wake-up word, starts the wake-up function, and outputs the voice prompt 'I' to the user. The user thus knows that the next voice command can be issued to control the smart speaker to implement a certain application function. In the embodiment of the present invention, the application function may be an application function of the smart speaker itself, or an application function of another terminal in a local area network controlled through the server.

For example, as shown in fig. 18, after being woken up, the smart speaker receives the audio data 2 "turning on the television". After the foregoing reset detection and wake-up decision, the smart speaker starts the function of turning on the television and sends a television start instruction to the server. The server controls the television to turn on through the network according to the television start instruction, and the prompt "turning on the television" is displayed on the interface of the television.

Embodiments of the present invention provide a storage medium, namely a computer-readable storage medium, in which executable instructions are stored. When executed by a processor, the executable instructions cause the processor to execute the artificial intelligence data detection method provided by the embodiments of the present invention.

In some embodiments, the storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.

In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.

By way of example, executable instructions may, but need not, correspond to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a Hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).

By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.

The above description is only an example of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present invention are included in the protection scope of the present invention.

Claims

1. An artificial intelligence data detection method, comprising:

acquiring audio data to be detected;

when the detected detection paths comprise a main detection path and a backup detection path, judging a reset time point based on the current time point and a preset time condition, controlling the reset of the main detection path and controlling the reset and start of the backup detection path through a reset and start controller when the reset time point is reached, and obtaining a reset voice detection model of each detection path; the reset operation is used for ensuring that the standby time before the awakening operation is less than or equal to the historical memory time of the awakening word algorithm; the main detection channel and the backup detection channel are detection channels capable of performing voice recognition in a voice detection model;

2. The method of claim 1, wherein when the detected detection paths include a main detection path and a backup detection path, controlling the reset of the main detection path and controlling the reset and the start of the backup detection path by the reset and start controller to obtain the post-reset voice detection model of each detection path comprises:

when the detected detection path comprises a main detection path and a backup detection path, acquiring a current time point;

3. The method according to claim 2, wherein when the preset reset time point is reached after the preset warm-up time period, the reset of the main detection path is controlled by the reset and start controller, and the main detection path is reset, so that after the reset voice detection model of the main detection path is obtained, the method further comprises:

4. The method of claim 2,

the preset reset time points are time sequences with preset time length intervals;

5. The method of claim 2, wherein said using said primary detection path and said backup detection path for speech recognition comprises:

receiving audio data to be detected;

6. The method of claim 3, wherein said employing said primary detection path for speech recognition comprises:

receiving audio data to be detected;

7. A data detection apparatus, comprising:

the acquisition unit is used for acquiring audio data to be detected;

the reset unit is used for judging a reset time point based on the current time point and a preset time condition when the detected detection path comprises a main detection path and a backup detection path, controlling the reset of the main detection path and controlling the reset and start of the backup detection path through the reset and start controller when the reset time point arrives, and obtaining a reset voice detection model of each detection path; the reset operation is used for ensuring that the standby time before the awakening operation is less than or equal to the historical memory time of the awakening word algorithm; the main detection channel and the backup detection channel are detection channels which can perform voice recognition in a voice detection model;

the identification unit is used for identifying the audio data to be detected of a main detection channel and a backup detection channel by using the reset voice detection model to obtain a main detection result of the main detection channel and a backup detection result of the backup detection channel; and after the main detection result and the backup detection result are comprehensively processed, outputting a total detection result.

8. The apparatus of claim 7,

the acquisition unit is further used for acquiring a voice detection model, wherein the voice detection model is a corresponding relation between audio data of at least one detection channel with history accumulation characteristics and a voice recognition result; when the detected detection path comprises a main detection path and a backup detection path, acquiring a current time point;

9. The apparatus of claim 8,

the identification unit is further used for controlling the reset of the main detection channel through the reset and start controller when the preset reset time point is reached after the preset preheating time period, resetting the main detection channel to obtain a reset voice detection model of the main detection channel, closing the backup detection channel when the preset preheating time period has elapsed since the preset reset time point, and adopting the main detection channel to perform voice recognition.

10. The apparatus of claim 8,

11. The apparatus of claim 8,

the receiving unit is used for receiving the audio data to be detected;

the identification unit is used for respectively carrying out voice identification on the audio data to be detected by adopting the main detection channel and the backup detection channel to obtain a main detection result and a backup detection result; comprehensively processing the main detection result and the backup detection result to obtain a total detection result; when the total detection result is larger than a preset awakening threshold, recognizing that the audio data to be detected is an awakening word, and starting an awakening function;

or,

the receiving unit is used for receiving the audio data to be detected;

12. A data detection apparatus, comprising:

a memory to store executable data detection instructions;

a processor for implementing the method of any one of claims 1 to 6 when executing executable data detection instructions stored in the memory.

13. A computer-readable storage medium having stored thereon executable data detection instructions for causing a processor to perform the method of any one of claims 1 to 6 when executed.