CN114495923A - Intelligent control system implementation method and device, electronic equipment and storage medium - Google Patents

Intelligent control system implementation method and device, electronic equipment and storage medium

Info

Publication number
CN114495923A
Authority
CN
China
Prior art keywords
voice
end processing
processing function
control system
intelligent control
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111625375.3A
Other languages
Chinese (zh)
Inventor
徐木水
汪木金
李鑫
李峥
李鹏伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111625375.3A
Publication of CN114495923A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223: Execution procedure of a spoken command
    • G10L 21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L 21/0208: Noise filtering
    • G10L 2021/02082: Noise filtering where the noise is echo or reverberation of the speech
    • G10L 21/0272: Voice signal separating
    • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
    • G10L 25/27: Analysis techniques characterised by the analysis technique
    • G10L 25/30: Analysis techniques using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)

Abstract

The present disclosure provides an intelligent control system implementation method and apparatus, an electronic device, and a storage medium, relating to the artificial intelligence fields of intelligent voice, deep learning, and intelligent transportation. The method may include: completing a first voice front-end processing function by using a voice chip in the intelligent control system; and completing a second voice front-end processing function by using a main control processor in the intelligent control system; the first voice front-end processing function and the second voice front-end processing function jointly form the voice front-end processing function of the intelligent control system, and the two are different voice front-end processing functions. By applying the disclosed scheme, the computational load on the main control processor can be reduced, the voice interaction effect can be improved, and so on.

Description

Intelligent control system implementation method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for implementing an intelligent control system, an electronic device, and a storage medium in the fields of intelligent speech, deep learning, and intelligent transportation.
Background
With the growing demand for voice functions in intelligent cabins, the computational demand on the head-unit main control in the in-vehicle head-unit (car machine) system is getting higher and higher.
At present, all voice front-end processing functions are completed on the head-unit main control processor, which puts great computational pressure on it, is likely to cause problems such as stuttering and slow interaction response, and thereby degrades the voice interaction effect.
Disclosure of Invention
The disclosure provides an intelligent control system implementation method, an intelligent control system implementation device, electronic equipment and a storage medium.
An intelligent control system implementation method comprises the following steps:
completing a first voice front-end processing function by utilizing a voice chip in the intelligent control system;
completing a second voice front-end processing function by utilizing a main control processor in the intelligent control system;
wherein the first voice front-end processing function and the second voice front-end processing function jointly constitute the voice front-end processing function of the intelligent control system, and the first voice front-end processing function and the second voice front-end processing function are different voice front-end processing functions.
An intelligent control system comprising: a main control processor and a voice chip;
the voice chip is used for completing a first voice front-end processing function;
the main control processor is used for completing a second voice front-end processing function;
the first voice front-end processing function and the second voice front-end processing function jointly form the voice front-end processing function of the intelligent control system, and the first voice front-end processing function and the second voice front-end processing function are different voice front-end processing functions.
An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method as described above.
A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method as described above.
A computer program product comprising computer programs/instructions which, when executed by a processor, implement a method as described above.
One embodiment in the above disclosure has the following advantages or benefits: part of the voice front-end processing functions completed on the main control processor can be migrated to the voice chip, thereby reducing the computational load on the main control processor, avoiding problems such as stuttering and slow interaction response as much as possible, and correspondingly improving the voice interaction effect.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of an embodiment of a method for implementing an intelligent control system according to the present disclosure;
fig. 2 is a schematic frame diagram of the head-unit system according to the present disclosure;
fig. 3 is a schematic diagram of the data flow corresponding to the head-unit system according to the present disclosure;
FIG. 4 is a schematic diagram illustrating an exemplary configuration of an intelligent control system 400 according to the present disclosure;
FIG. 5 shows a schematic block diagram of an electronic device 500 that may be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In addition, it should be understood that the term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the former and latter associated objects are in an "or" relationship.
Fig. 1 is a flowchart of an embodiment of an intelligent control system implementation method according to the present disclosure. As shown in fig. 1, the following detailed implementation is included.
In step 101, a first voice front-end processing function is completed by using a voice chip in the intelligent control system.
In step 102, a main control processor in the intelligent control system is used to complete a second voice front-end processing function, where the first voice front-end processing function and the second voice front-end processing function together form the voice front-end processing function of the intelligent control system, and the two are different voice front-end processing functions.
It can be seen that, in the scheme of the above method embodiment, part of the voice front-end processing functions completed on the main control processor can be migrated to the voice chip, thereby reducing the computational load on the main control processor, avoiding problems such as stuttering and slow interaction response as much as possible, and correspondingly improving the voice interaction effect.
Preferably, the intelligent control system may be the head-unit system, and the main control processor may be the head-unit main control processor, which will be used as an example in the following description.
The voice chip may be a dedicated voice chip adopting a dual-core HiFi 4 (HIFI4) Digital Signal Processor (DSP) architecture, a custom instruction set, and rich vector floating-point units, which makes it well suited to neural network computation and the like; it meets automotive-grade standards and can be applied in various scenarios such as smart home and intelligent vehicle-mounted applications.
In the conventional approach, all voice front-end processing functions are completed on an Advanced RISC Machine (ARM) processor serving as the head-unit main control.
In an embodiment of the present disclosure, the first voice front-end processing function performed by the voice chip may include: noise reduction, echo cancellation (Acoustic Echo Cancellation, AEC), and wake-up detection; the second voice front-end processing function performed by the head-unit main control processor may include: sound zone separation, Voice Activity Detection (VAD), and the like.
The functions with higher computational requirements, such as noise reduction, echo cancellation, and wake-up detection, can be migrated to the voice chip, thereby significantly reducing the computational load on the head-unit main control processor.
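To make this division of labor concrete, the following is a minimal sketch of the two-stage pipeline. The function names and the placeholder operations inside them are illustrative assumptions only; the disclosure's actual noise reduction, echo cancellation, wake-up detection, zone separation, and VAD implementations are DSP and neural-network algorithms that this sketch does not reproduce.

```python
import numpy as np

def chip_stage(mic: np.ndarray, ref: np.ndarray):
    """Voice-chip (DSP) side: noise reduction, echo cancellation, wake-up
    detection. Every step is a trivial stand-in for the real algorithm;
    assumes mic and ref have equal lengths."""
    denoised = mic - np.mean(mic)                             # stand-in noise reduction
    echo_free = denoised - 0.5 * ref[:denoised.size]          # stand-in echo estimate
    woke_up = float(np.sqrt(np.mean(echo_free ** 2))) > 0.1   # stand-in wake trigger
    return echo_free, woke_up        # the chip raises a GPIO interrupt if woke_up

def host_stage(echo_free: np.ndarray, n_zones: int = 4):
    """Main-control-processor side: sound zone separation + voice activity
    detection, again reduced to trivial placeholders."""
    zones = np.array_split(echo_free, n_zones)   # stand-in for complex-CNN separation
    active = [float(np.sqrt(np.mean(z ** 2))) > 0.05 for z in zones]
    return zones, active
```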
In one embodiment of the present disclosure, the echo cancellation may include: linear echo cancellation and/or model echo cancellation, the wake-up detection may include: wakeup word detection and command word detection.
That is, when performing echo cancellation, only linear echo cancellation may be performed, only model echo cancellation may be performed, or both may be performed. In general, linear echo cancellation is always performed, and whether to additionally perform model echo cancellation can be determined according to actual needs. Model echo cancellation means performing echo cancellation with a deep learning model obtained by pre-training.
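The disclosure does not name the linear echo cancellation algorithm; a normalized least-mean-squares (NLMS) adaptive filter is one common choice, sketched here under that assumption with the loudspeaker signal as the far-end reference. The filter length must cover the acoustic echo tail, and the step size mu trades convergence speed against stability.

```python
import numpy as np

def nlms_aec(mic: np.ndarray, ref: np.ndarray, taps: int = 128,
             mu: float = 0.5, eps: float = 1e-8) -> np.ndarray:
    """Cancel the linear echo of `ref` (loudspeaker signal) from `mic`
    using a normalized-LMS adaptive FIR filter; returns the residual."""
    w = np.zeros(taps)                    # adaptive estimate of the echo path
    buf = np.zeros(taps)                  # recent reference samples, newest first
    out = np.zeros_like(mic, dtype=float)
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = ref[n]
        echo_hat = w @ buf                # predicted echo at this sample
        e = mic[n] - echo_hat             # residual = near-end speech + noise
        out[n] = e
        w += mu * e * buf / (buf @ buf + eps)   # NLMS weight update
    return out
```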
In addition, wake-up detection may include wake-up word detection and command word detection, where a wake-up word is a word used to wake up the device, and a command word generally refers to an in-vehicle control command, such as opening the sunroof, navigating to a place name, or playing the next song.
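The disclosure only states that wake-up word and command word detection run on the voice chip. One common decision rule, sketched below as an assumption, is to smooth per-frame keyword posteriors emitted by a small acoustic model and fire when the smoothed score crosses a threshold.

```python
import numpy as np

def wake_decision(posteriors: np.ndarray, win: int = 30,
                  thresh: float = 0.8) -> bool:
    """posteriors: per-frame probability of the wake/command word, e.g. from
    a small on-chip neural network. Smooths over a sliding window and fires
    on a threshold crossing; `win` and `thresh` are illustrative values."""
    if posteriors.size < win:
        return False
    smoothed = np.convolve(posteriors, np.ones(win) / win, mode="valid")
    return bool(np.max(smoothed) >= thresh)
```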
In an embodiment of the present disclosure, the head-unit main control processor may adopt a multi-sound-zone speech separation algorithm based on a complex-valued Convolutional Neural Network (CNN) to implement sound zone separation; the algorithm may be a deep learning model obtained by pre-training.
In the conventional approach, blind speech separation is usually performed with traditional speech signal processing methods. In the scheme of the present disclosure, the complex-CNN-based multi-sound-zone speech separation algorithm can separate speech from arbitrarily mixed sound sources, such as the driver, front passenger, rear-left, and rear-right seats, truly achieving in-vehicle sound zone isolation; the separation Signal-to-Noise Ratio (SNR) can reach more than 15 dB, thereby avoiding problems such as false recognition caused by sound zone leakage.
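The trained complex CNN itself is not described in the disclosure. The sketch below assumes it predicts one complex time-frequency mask per sound zone, and shows how such masks would be applied to the mixture STFT and how a separation SNR figure such as the quoted 15 dB would be measured against clean references.

```python
import numpy as np

def apply_zone_masks(stft_mix: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """stft_mix: complex STFT of the in-car mixture, shape (freq, time).
    masks: complex masks predicted per zone, shape (zones, freq, time).
    Returns one separated complex STFT per sound zone."""
    return masks * stft_mix[np.newaxis, :, :]   # element-wise complex multiply

def separation_snr_db(reference: np.ndarray, estimate: np.ndarray) -> float:
    """Separation SNR in dB of a time-domain zone estimate against its
    clean reference signal."""
    noise = estimate - reference
    return float(10.0 * np.log10(np.sum(reference ** 2)
                                 / (np.sum(noise ** 2) + 1e-12)))
```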
In an embodiment of the present disclosure, the head-unit main control processor may employ a Blind Source Separation (BSS) based voice activity detection algorithm to implement voice activity detection.
In the conventional approach, multiple wake-up / voice activity detection instances are usually created to separately perform wake-up detection and voice activity detection on the separated data of the different sound zones, which increases the memory and Central Processing Unit (CPU) overhead of the head-unit main control.
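One way to avoid per-zone instances, sketched below, is a single detector that sweeps all separated zone signals and reports the start and end of activity for each. Frame-energy gating here is only a stand-in for the BSS-based detector of the disclosure, and the frame size and threshold are illustrative.

```python
import numpy as np

def multi_zone_vad(zones: np.ndarray, frame: int = 320, thresh: float = 1e-3):
    """zones: separated signals, shape (n_zones, n_samples). Returns, per
    zone, a (start_frame, end_frame) pair or None if the zone is silent."""
    results = []
    for sig in zones:
        n = sig.size // frame
        energy = np.array([np.mean(sig[i * frame:(i + 1) * frame] ** 2)
                           for i in range(n)])
        active = np.flatnonzero(energy > thresh)
        results.append((int(active[0]), int(active[-1])) if active.size else None)
    return results
```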
In an embodiment of the present disclosure, information interaction may be performed between the voice chip and the head-unit main control processor through a General-Purpose Input/Output (GPIO) interrupt and a predetermined interface, where the predetermined interface may include: an audio interface and/or a serial communication interface.
The audio interface may be an Inter-IC Sound (I2S) interface or a Time Division Multiplexing (TDM) interface, and the serial communication interface may be a Serial Peripheral Interface (SPI) or the like.
When the voice chip detects that a wake-up has occurred, it can trigger the GPIO interrupt to notify the head-unit main control processor, which can then acquire the wake-up information, such as the wake-up word and wake-up point position information, through an interface such as the SPI.
Through the above processing, simple and efficient information interaction between the voice chip and the head-unit main control processor can be realized.
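On the host side the handshake might look like the sketch below. The `spi_dev` object and the 8-byte record layout (a 16-bit wake word ID, a 16-bit start offset, and a 32-bit end offset in milliseconds) are hypothetical placeholders, since the disclosure does not define the message format.

```python
import struct

def on_wake_interrupt(spi_dev) -> dict:
    """Invoked when the voice chip raises the wake-up GPIO interrupt.
    Reads the wake-up info record over SPI and decodes it; `spi_dev.read`
    and the record layout are assumptions, not a documented protocol."""
    raw = spi_dev.read(8)                                   # hypothetical 8-byte record
    word_id, start_ms, end_ms = struct.unpack("<HHI", raw)  # little-endian u16, u16, u32
    return {"wake_word_id": word_id,   # which wake-up / command word fired
            "start_ms": start_ms,      # wake point position: start offset
            "end_ms": end_ms}          # wake point position: end offset
```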
In addition, in the conventional approach, data acquired through the Android Audio link can be used for wake-up detection and the like only after a series of complex processing steps, which introduces a large voice interaction latency and leads to a slow wake-up response.
Based on the foregoing introduction, fig. 2 is a schematic diagram of a framework of the head-unit system according to the present disclosure, and fig. 3 is a schematic diagram of the corresponding data flow.
As mentioned above, the first voice front-end processing function performed by the voice chip may include noise reduction, echo cancellation, and wake-up detection, and the second voice front-end processing function completed by the head-unit main control processor may include sound zone separation and voice activity detection. As shown in fig. 2 and 3, preferably, the echo cancellation may include linear echo cancellation and model echo cancellation, and the wake-up detection may include wake-up word detection and command word detection; in addition, the head-unit main control processor may implement sound zone separation using the complex-CNN-based multi-sound-zone speech separation algorithm and voice activity detection using the BSS-based voice activity detection algorithm.
It should be noted that the voice front-end processing functions completed by the voice chip and the head-unit main control processor above are only examples and are not intended to limit the technical solution of the present disclosure. As shown in fig. 2 and fig. 3, in practical applications, other functions may be included according to actual needs.
As shown in fig. 2 and fig. 3, data acquisition, i.e., audio data acquisition, may be implemented on the voice chip, and the acquired data may be preprocessed. What the preprocessing specifically includes may be determined according to actual needs; for example, it may include DC removal, sampling rate adjustment, de-burring, and the like. In addition, the voice chip may also perform dereverberation, so that the result of dereverberating the model echo cancellation output is what gets uploaded to the head-unit main control processor.
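As one possible reading of this preprocessing step, the sketch below performs DC removal followed by integer-factor sample-rate reduction with a crude moving-average pre-filter to limit aliasing; the rates, the filter, and the omission of de-burring and dereverberation are all assumptions, since the disclosure leaves the chain implementation-defined.

```python
import numpy as np

def preprocess(pcm: np.ndarray, in_rate: int = 48000,
               out_rate: int = 16000) -> np.ndarray:
    """DC removal + decimation from in_rate to out_rate (integer ratio)."""
    x = pcm - np.mean(pcm)                    # DC removal
    factor = in_rate // out_rate              # assumes an integer ratio
    kernel = np.ones(factor) / factor         # crude anti-aliasing pre-filter
    filtered = np.convolve(x, kernel, mode="same")
    return filtered[::factor]
```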
As shown in fig. 2 and fig. 3, the head-unit main control processor may further support a custom wake-up function, i.e., allow the user to define a wake-up word, while the voice chip performs wake-up word detection based on a preset default wake-up word.
The voice chip and the head-unit main control processor can work together end to end, and information interaction can be carried out through GPIO interrupts and a predetermined interface, where the predetermined interface may include: an audio interface and/or a serial communication interface.
As shown in fig. 2 and fig. 3, both the linear echo cancellation result and the model echo cancellation result can be uploaded to the head-unit main control processor, and each functional module in the head-unit main control processor can select whether to use the linear or the model echo cancellation result according to actual needs. For example, the custom wake-up function may use the linear echo cancellation result.
As shown in fig. 2 and 3, the linear echo cancellation result may be uploaded to the head-unit main control processor through an audio interface, e.g., an I2S or TDM interface, and the model echo cancellation result may be uploaded through an audio interface or a serial communication interface, e.g., an I2S, TDM, or SPI interface. The model echo cancellation result may be uploaded to the head-unit main control processor and may also be used on the chip for wake-up detection and the like. When the voice chip detects that a wake-up has occurred, it can trigger the GPIO interrupt to notify the head-unit main control processor, which can then acquire the wake-up information through the SPI and use a Remote Procedure Call (RPC) event (RPC module) to notify the service layer of the wake-up trigger.
As shown in fig. 2 and fig. 3, the speech recognition Software Development Kit (SDK) serves as the service layer; it can acquire multi-sound-zone recognition (ASR) data through the Android data link, acquire the voice activity detection result from the RPC module, and then implement Automatic Speech Recognition (ASR) according to the acquired data/information.
It is noted that while for simplicity of explanation, the foregoing method embodiments are described as a series of acts, those skilled in the art will appreciate that the present disclosure is not limited by the order of acts, as some steps may, in accordance with the present disclosure, occur in other orders and concurrently. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required for the disclosure.
In a word, by adopting the scheme of the method embodiment of the present disclosure, part of the voice front-end processing functions completed on the head-unit main control processor can be migrated to the voice chip, reducing the computational load on the head-unit main control processor, avoiding problems such as stuttering and slow interaction response as much as possible, and improving the voice interaction effect; in addition, the sound zone separation and voice activity detection approaches are improved, avoiding false recognition caused by sound zone leakage and reducing memory and CPU overhead.
The above is a description of embodiments of the method, and the embodiments of the apparatus are further described below.
Fig. 4 is a schematic diagram of a structure of an intelligent control system 400 according to an embodiment of the present disclosure. As shown in fig. 4, includes: a main control processor 401 and a voice chip 402.
The voice chip 402 is configured to complete a first voice front-end processing function.
The main control processor 401 is configured to complete a second voice front-end processing function, where the first voice front-end processing function and the second voice front-end processing function jointly form the voice front-end processing function of the intelligent control system, and the two are different voice front-end processing functions.
It can be seen that, in the above apparatus embodiment, part of the voice front-end processing functions completed on the main control processor can be migrated to the voice chip, thereby reducing the computational load on the main control processor, avoiding problems such as stuttering and slow interaction response as much as possible, and correspondingly improving the voice interaction effect.
In the scheme of the present disclosure, a voice chip 402 is introduced into the intelligent control system 400, and part of the voice front-end processing functions are migrated to the voice chip 402.
In one embodiment of the present disclosure, the first voice front-end processing function performed by the voice chip 402 may include: noise reduction, echo cancellation, wake-up detection, and the like; the second voice front-end processing function performed by the main control processor 401 may include: sound zone separation, voice activity detection, and the like.
In one embodiment of the present disclosure, the echo cancellation may include: linear echo cancellation and/or model echo cancellation, the wake-up detection may include: wakeup word detection and command word detection.
That is, when performing echo cancellation, only linear echo cancellation may be performed, only model echo cancellation may be performed, or both may be performed. In general, linear echo cancellation is always performed, and whether to additionally perform model echo cancellation can be determined according to actual needs.
In addition, wake-up detection may include wake-up word detection and command word detection, where a wake-up word is a word used to wake up the device and a command word refers to a control command.
In one embodiment of the present disclosure, the main control processor 401 may adopt the complex-CNN-based multi-sound-zone speech separation algorithm to implement sound zone separation; the algorithm may be a deep learning model obtained by pre-training.
In the conventional approach, blind speech separation is usually performed with traditional speech signal processing methods. In the scheme of the present disclosure, the complex-CNN-based multi-sound-zone speech separation algorithm can be adopted to implement sound zone separation; taking a vehicle as an example, speech from arbitrarily mixed sound sources such as the driver, front passenger, rear-left, and rear-right seats can be separated, truly achieving in-vehicle sound zone isolation, with a separation signal-to-noise ratio above 15 dB, thereby avoiding problems such as false recognition caused by sound zone leakage.
In one embodiment of the present disclosure, the main control processor 401 may also implement voice activity detection using a blind-source-separation-based voice activity detection algorithm.
In the scheme of the present disclosure, the data of the different sound zones can be fed jointly into the BSS-based voice activity detection algorithm, which outputs the detected start and end points of voice activity for each sound zone; multi-sound-zone processing logic is added to achieve the multi-sound-zone interaction effect while avoiding the large memory and CPU overhead caused by creating multiple instances.
In addition, in an embodiment of the present disclosure, information interaction between the voice chip 402 and the main control processor 401 may be performed through a GPIO interrupt and a predetermined interface, where the predetermined interface may include: an audio interface and/or a serial communication interface.
The audio interface may be an I2S or TDM interface, and the serial communication interface may be an SPI interface or the like.
When the voice chip 402 detects that a wake-up has occurred, it may notify the main control processor 401 by triggering the GPIO interrupt, and the main control processor 401 may then obtain the wake-up information through an interface such as the SPI and complete the subsequent processing accordingly.
For the specific workflow of the apparatus embodiment shown in fig. 4, refer to the related description in the foregoing method embodiments.
Preferably, the intelligent control system may be the head-unit system, and the main control processor may be the head-unit main control processor. By adopting the scheme of the apparatus embodiment of the present disclosure, part of the voice front-end processing functions completed on the head-unit main control processor can be migrated to the voice chip, reducing the computational load on the head-unit main control processor, avoiding problems such as stuttering and slow interaction response as much as possible, and improving the voice interaction effect; the sound zone separation and voice activity detection approaches are also improved, avoiding false recognition caused by sound zone leakage and reducing memory and CPU overhead. In addition, the scheme of the apparatus embodiment is cross-platform: it can be quickly applied to head units with different hardware and different Operating Systems (OS), with no perceptible difference to the service Software Development Kit (SDK) or service application (APP), so it can be ported quickly.
The scheme of the present disclosure can be applied in the field of artificial intelligence, in particular in fields such as intelligent voice, deep learning, and intelligent transportation. Artificial intelligence is the discipline of making computers simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and involves both hardware and software technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning / deep learning, big data processing, knowledge graph technologies, and the like.
In the embodiments of the present disclosure, the voice involved is not the voice of any specific user and cannot reflect the personal information of a specific user; in addition, the execution subject of the head-unit system implementation method may obtain the voice in various public, legally compliant ways, for example from the user after the user's authorization.
In the technical scheme of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other processing of the personal information of the users involved all comply with the relevant laws and regulations and do not violate public order and good morals.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 5 shows a schematic block diagram of an electronic device 500 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the device 500 comprises a computing unit 501, which may perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 502 or loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the device 500 can also be stored. The computing unit 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
A number of components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, or the like; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508, such as a magnetic disk, optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 501 performs the various methods and processes described above, such as the methods described in this disclosure. For example, in some embodiments, the methods described in this disclosure may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into RAM 503 and executed by computing unit 501, one or more steps of the methods described in the present disclosure may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured by any other suitable means (e.g., by means of firmware) to perform the methods described by the present disclosure.
Various implementations of the systems and techniques described here may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combining a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (15)

1. An intelligent control system implementation method comprises the following steps:
completing a first voice front-end processing function by utilizing a voice chip in the intelligent control system;
completing a second voice front-end processing function by utilizing a main control processor in the intelligent control system;
the first voice front-end processing function and the second voice front-end processing function jointly form the voice front-end processing function of the intelligent control system, and the first voice front-end processing function and the second voice front-end processing function are different voice front-end processing functions.
2. The method of claim 1, wherein,
the first speech front-end processing function comprises: noise reduction, echo cancellation and wake-up detection;
the second speech front-end processing function comprises: separation of sound zones and detection of voice activity.
3. The method of claim 2, wherein,
the echo cancellation includes: linear echo cancellation and/or model echo cancellation;
the wake-up detection comprises: wakeup word detection and command word detection.
4. The method of claim 2, wherein,
the sound zone separation comprises: and the sound zone separation is realized by adopting a multi-sound zone voice separation algorithm based on a complex convolution neural network.
5. The method of claim 2, wherein,
the voice activity detection comprises: implementing the voice activity detection by adopting a voice activity detection algorithm based on blind source separation.
6. The method according to any one of claims 1 to 5,
the voice chip and the main control processor carry out information interaction through general input and output interruption and a preset interface, wherein the preset interface comprises: an audio interface and/or a serial communication interface.
7. An intelligent control system comprising: a main control processor and a voice chip;
the voice chip is used for completing a first voice front-end processing function;
the main control processor is used for completing a second voice front-end processing function;
wherein the first voice front-end processing function and the second voice front-end processing function jointly constitute the voice front-end processing function of the intelligent control system, and the first voice front-end processing function and the second voice front-end processing function are different voice front-end processing functions.
8. The intelligent control system of claim 7 wherein,
the first speech front-end processing function comprises: noise reduction, echo cancellation and wake-up detection;
the second speech front-end processing function comprises: separation of sound zones and detection of voice activity.
9. The intelligent control system of claim 8 wherein,
the echo cancellation includes: linear echo cancellation and/or model echo cancellation;
the wake-up detection comprises: wakeup word detection and command word detection.
10. The intelligent control system of claim 8 wherein,
the main control processor adopts a multi-sound-zone speech separation algorithm based on a complex convolutional neural network to realize the sound zone separation.
11. The intelligent control system of claim 8 wherein,
the main control processor adopts a voice activity detection algorithm based on blind source separation to realize the voice activity detection.
12. The intelligent control system of any one of claims 7 to 11,
the voice chip and the main control processor carry out information interaction through general input and output interruption and a preset interface, wherein the preset interface comprises: an audio interface and/or a serial communication interface.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-6.
15. A computer program product comprising a computer program/instructions which, when executed by a processor, implement the method of any one of claims 1-6.
CN202111625375.3A 2021-12-28 2021-12-28 Intelligent control system implementation method and device, electronic equipment and storage medium Pending CN114495923A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111625375.3A CN114495923A (en) 2021-12-28 2021-12-28 Intelligent control system implementation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111625375.3A CN114495923A (en) 2021-12-28 2021-12-28 Intelligent control system implementation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114495923A 2022-05-13

Family

ID=81496456

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111625375.3A Pending CN114495923A (en) 2021-12-28 2021-12-28 Intelligent control system implementation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114495923A (en)

Similar Documents

Publication Publication Date Title
US11848008B2 (en) Artificial intelligence-based wakeup word detection method and apparatus, device, and medium
CN108520743B (en) Voice control method of intelligent device, intelligent device and computer readable medium
TWI455112B (en) Speech processing apparatus and electronic device
KR102553234B1 (en) Voice data processing method, device and intelligent vehicle
JP7213943B2 (en) Audio processing method, device, device and storage medium for in-vehicle equipment
EP3923272A1 (en) Method and apparatus for adapting a wake-up model
JP7158217B2 (en) Speech recognition method, device and server
CN113674746B (en) Man-machine interaction method, device, equipment and storage medium
CN113436611B (en) Test method and device for vehicle-mounted voice equipment, electronic equipment and storage medium
JP6875819B2 (en) Acoustic model input data normalization device and method, and voice recognition device
CN111091819A (en) Voice recognition device and method, voice interaction system and method
EP4044178A2 (en) Method and apparatus of performing voice wake-up in multiple speech zones, method and apparatus of performing speech recognition in multiple speech zones, device, and storage medium
CN112562742A (en) Voice processing method and device
CN111833870A (en) Awakening method and device of vehicle-mounted voice system, vehicle and medium
CN113096692B (en) Voice detection method and device, equipment and storage medium
CN113658586A (en) Training method of voice recognition model, voice interaction method and device
CN113611316A (en) Man-machine interaction method, device, equipment and storage medium
EP4030424B1 (en) Method and apparatus of processing voice for vehicle, electronic device and medium
CN114495923A (en) Intelligent control system implementation method and device, electronic equipment and storage medium
CN114647610B (en) Voice chip implementation method, voice chip and related equipment
CN114399992B (en) Voice instruction response method, device and storage medium
CN112017651A (en) Voice control method and device of electronic equipment, computer equipment and storage medium
CN114333017A (en) Dynamic pickup method and device, electronic equipment and storage medium
CN114220430A (en) Multi-sound-zone voice interaction method, device, equipment and storage medium
CN114201225A (en) Method and device for awakening function of vehicle machine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination