CN116453523B - High-concurrency voice AI node overall processing method and device - Google Patents

High-concurrency voice AI node overall processing method and device

Info

Publication number
CN116453523B
CN116453523B (application CN202310720865.4A)
Authority
CN
China
Prior art keywords
concurrent
complexity
voice
signal
signals
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310720865.4A
Other languages
Chinese (zh)
Other versions
CN116453523A (en)
Inventor
Ai Yong (艾勇)
Wang Lei (王磊)
Zhang Jing (张静)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Borui Tianxia Technology Co., Ltd.
Original Assignee
Shenzhen Borui Tianxia Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Borui Tianxia Technology Co., Ltd.
Priority to CN202310720865.4A
Publication of CN116453523A
Application granted
Publication of CN116453523B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/28: Constructional details of speech recognition systems
    • G10L 15/34: Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/08: Speech classification or search
    • G10L 15/16: Speech classification or search using artificial neural networks
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/28: Constructional details of speech recognition systems
    • G10L 15/30: Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses a method and a device for the overall processing of high-concurrency voice AI nodes, in the field of voice processing. The method comprises: acquiring a batch of concurrent voice requests; outputting the concurrent voice signals corresponding to the batch of concurrent voice requests; inputting the concurrent voice signals into a complexity recognition module to obtain the signal complexity; splitting the concurrent voice signals through a voice splitting module according to the signal complexity to obtain a signal splitting result; and inputting the signal splitting result into a channel configuration module to generate a plurality of concurrent processing nodes, which perform multi-channel parallel processing on the concurrent voice signals. The method addresses the prior-art problems of insufficient recognition accuracy and low recognition efficiency for high-concurrency voice signals, improving the recognition accuracy, recognition efficiency and recognition quality of high-concurrency voice signals.

Description

High-concurrency voice AI node overall processing method and device
Technical Field
The application relates to the field of voice processing, in particular to a method and a device for overall processing of high-concurrency voice AI nodes.
Background
With the continued development of artificial intelligence, voice request processing is growing more complex. When multiple voice signals are received for processing simultaneously, they are commonly referred to as high-concurrency voice signals. High-concurrency voice signals are characterized by a large number of signals and high signal complexity, so how to perform overall processing on them has attracted wide attention.
The prior art suffers from insufficient recognition accuracy and low recognition efficiency when handling high-concurrency voice signals.
Disclosure of Invention
The application provides a method and a device for the overall processing of high-concurrency voice AI nodes, solving the prior-art problems of insufficient recognition accuracy and low recognition efficiency for high-concurrency voice signals. By performing multi-channel parallel processing of the concurrent voice signals across a plurality of concurrent processing nodes, the method improves the recognition accuracy, recognition efficiency and recognition quality of high-concurrency voice signals.
In view of the above problems, the application provides a method and a device for overall processing of high-concurrency voice AI nodes.
In a first aspect, the present application provides a method for the overall processing of high-concurrency voice AI nodes, applied to a high-concurrency voice AI node overall processing device, the method comprising: connecting a first server to obtain a batch of concurrent voice requests; acquiring, according to the batch of concurrent voice requests, the voice signal carried by each voice request, and outputting the concurrent voice signals corresponding to the batch of concurrent voice requests; constructing a concurrent processing channel model comprising a complexity recognition module, a voice splitting module and a channel configuration module; inputting the concurrent voice signals into the complexity recognition module, and performing signal processing complexity analysis on each signal to obtain the signal complexity; splitting the concurrent voice signals through the voice splitting module according to the signal complexity to obtain a signal splitting result; inputting the signal splitting result into the channel configuration module to generate a plurality of concurrent processing nodes, each corresponding to one sub-channel; and performing multi-channel parallel processing on the concurrent voice signals based on the plurality of concurrent processing nodes.
In a second aspect, the present application further provides a device for the overall processing of high-concurrency voice AI nodes, the device comprising: a concurrent voice request acquisition module, configured to connect to a first server to obtain a batch of concurrent voice requests; a concurrent voice signal output module, configured to acquire the voice signal carried by each voice request according to the batch of concurrent voice requests and to output the concurrent voice signals corresponding to the batch of concurrent voice requests; a construction module, configured to construct a concurrent processing channel model comprising a complexity recognition module, a voice splitting module and a channel configuration module; a signal processing complexity analysis module, configured to input the concurrent voice signals into the complexity recognition module and to perform signal processing complexity analysis on each signal to obtain the signal complexity; a signal splitting module, configured to split the concurrent voice signals through the voice splitting module according to the signal complexity to obtain a signal splitting result; a processing node generation module, configured to input the signal splitting result into the channel configuration module to generate a plurality of concurrent processing nodes, each corresponding to one sub-channel; and a parallel processing module, configured to perform multi-channel parallel processing on the concurrent voice signals based on the plurality of concurrent processing nodes.
One or more technical solutions provided by the application have at least the following technical effects or advantages:
acquiring a batch of concurrent voice requests through a first server; inputting the concurrent voice signals from the batch of concurrent voice requests into a complexity recognition module, which performs signal processing complexity analysis on them to obtain the signal complexity; splitting the concurrent voice signals according to the signal complexity to obtain a signal splitting result; inputting the signal splitting result into a channel configuration module to generate a plurality of concurrent processing nodes; and performing multi-channel parallel processing on the concurrent voice signals across the plurality of concurrent processing nodes. Processing the concurrent voice signals in parallel across multiple channels improves the recognition accuracy, recognition efficiency and recognition quality of high-concurrency voice signals.
The foregoing is merely an overview of the present application; its objects, features and advantages will become more apparent from the specific embodiments that follow.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments of the present disclosure will be briefly described below. It is apparent that the figures in the following description relate only to some embodiments of the present disclosure and are not limiting of the present disclosure.
FIG. 1 is a flow chart of the high-concurrency voice AI node overall processing method of the present application;
FIG. 2 is a schematic flow chart of acquiring the signal complexity in the high-concurrency voice AI node overall processing method;
FIG. 3 is a schematic flow chart of determining the N sub-channels connected to the N concurrent signals in the high-concurrency voice AI node overall processing method;
FIG. 4 is a schematic structural diagram of the high-concurrency voice AI node overall processing device.
Reference numerals: concurrent voice request acquisition module 11, concurrent voice signal output module 12, construction module 13, signal processing complexity analysis module 14, signal splitting module 15, processing node generation module 16, parallel processing module 17.
Detailed Description
The application provides a method and a device for the overall processing of high-concurrency voice AI nodes. The method solves the prior-art problems of insufficient recognition accuracy and low recognition efficiency for high-concurrency voice signals: by performing multi-channel parallel processing of the concurrent voice signals across a plurality of concurrent processing nodes, it improves the recognition accuracy, recognition efficiency and recognition quality of high-concurrency voice signals.
Referring to fig. 1, the present application provides a method for the overall processing of high-concurrency voice AI nodes, applied to a high-concurrency voice AI node overall processing device; the method specifically includes the following steps:
step S100: connecting a first server to obtain a batch of concurrent voice requests;
step S200: according to the batch concurrent voice requests, acquiring voice signals carried by each voice request, and outputting concurrent voice signals corresponding to the batch concurrent voice requests;
specifically, a first server is connected, and a batch concurrent voice request is received through the first server. The first server is in communication connection with the voice AI node overall processing device aiming at high concurrency. The first server may be any voice signal server having voice signal receiving and voice signal processing functions in the prior art. The batch concurrent voice request includes a plurality of voice requests. Each voice request includes a concurrent voice signal.
Step S300: constructing a concurrent processing channel model, wherein the concurrent processing channel model comprises a complexity recognition module, a voice splitting module and a channel configuration module;
step S400: inputting the concurrent voice signals into the complexity recognition module, and carrying out signal processing complexity analysis on each signal according to the complexity recognition module to acquire signal complexity;
further, as shown in fig. 2, step S400 of the present application further includes:
step S410: constructing a three-layer fully-connected neural network, and training the concurrent voice signals by utilizing the neural network to obtain a complexity recognition module embedded with a complexity recognition model;
step S420: the method comprises the steps of outputting background complexity for identifying the background environment complexity of a voice signal, interference complexity for identifying the interference degree of the voice signal and sound source complexity for identifying a sound source emitted by the voice signal according to a complexity identification module;
step S430: training according to the background complexity, the interference complexity and the sound source complexity, and outputting signal complexity for identifying convergence.
Specifically, a historical data query is performed on the multiple concurrent voice signals in the batch of concurrent voice requests to obtain multiple sets of construction data. Each set comprises a historical concurrent voice signal together with its historical background complexity, interference complexity and sound source complexity parameters. A random 70% of the construction data is divided into a training data set, and the remaining random 30% into a test data set. Cross-supervised training is performed on the training data set with the fully connected neural network to obtain the complexity recognition model. The test data set is then input into the complexity recognition model to update its parameters, and the model is embedded into the complexity recognition module of the concurrent processing channel model. A fully connected neural network, also called a multi-layer perceptron, is a feedforward artificial neural network with a comparatively simple connection pattern, consisting of an input layer, one or more hidden layers (each possibly containing multiple neurons) and an output layer. The complexity recognition model likewise comprises an input layer, a hidden layer and an output layer.
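As a rough sketch of this step (not the patent's actual implementation), the random 70/30 split and a three-layer fully connected network mapping signal features to the three complexity parameters might look as follows; the feature dimension, layer sizes, learning rate and synthetic data are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical construction data: each row stands in for features of a
# historical concurrent voice signal; each target row holds its background,
# interference and sound source complexity parameters (all synthetic here).
X = rng.normal(size=(200, 16))
y = rng.uniform(size=(200, 3))

# Random 70% of the construction data -> training set, remaining 30% -> test set.
idx = rng.permutation(len(X))
cut = int(0.7 * len(X))
X_train, y_train = X[idx[:cut]], y[idx[:cut]]
X_test, y_test = X[idx[cut:]], y[idx[cut:]]

# Three-layer fully connected network: input layer -> hidden layer -> output layer.
W1 = rng.normal(scale=0.1, size=(16, 32)); b1 = np.zeros(32)
W2 = rng.normal(scale=0.1, size=(32, 3)); b2 = np.zeros(3)

def forward(x):
    h = np.maximum(0.0, x @ W1 + b1)  # ReLU hidden layer
    return h @ W2 + b2                # three complexity scores per signal

for _ in range(200):  # plain batch gradient descent on mean squared error
    h = np.maximum(0.0, X_train @ W1 + b1)
    err = (h @ W2 + b2) - y_train
    gW2, gb2 = h.T @ err / len(X_train), err.mean(axis=0)
    dh = (err @ W2.T) * (h > 0)
    gW1, gb1 = X_train.T @ dh / len(X_train), dh.mean(axis=0)
    W1 -= 0.05 * gW1; b1 -= 0.05 * gb1; W2 -= 0.05 * gW2; b2 -= 0.05 * gb2

test_pred = forward(X_test)
print(test_pred.shape)  # one complexity triple per test signal
```

In practice the test set would also drive the parameter update the text describes; here it only serves to show the model emitting one (background, interference, sound source) triple per signal.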
Further, the multiple concurrent voice signals in the batch of concurrent voice requests are input into the complexity recognition module of the concurrent processing channel model, where the embedded complexity recognition model identifies the background complexity, interference complexity and sound source complexity of each signal, yielding the multiple signal complexities corresponding to the multiple concurrent voice signals. Each signal complexity comprises the background complexity parameter, interference complexity parameter and sound source complexity parameter of the corresponding concurrent voice signal. The background complexity parameter characterizes the complexity of the signal's background environment: the higher the parameter, the more complex the background environment. The interference complexity parameter characterizes the degree of interference the signal suffers during transmission: the larger the parameter, the higher the interference. The sound source complexity parameter characterizes the complexity of the signal's sound sources: the more sound source types a concurrent voice signal contains, the higher its sound source complexity and the larger the parameter.
Using the complexity recognition module to analyze the signal processing complexity of the multiple concurrent voice signals in the batch of concurrent voice requests yields accurate signal complexities and thereby improves the accuracy of denoising the batch of concurrent voice requests.
Step S500: splitting the concurrent voice signals according to the signal complexity by the voice splitting module to obtain a signal splitting result;
further, the step S500 of the present application further includes:
step S510: based on the concurrent voice signals, N concurrent signals with the signal complexity larger than the preset signal complexity are obtained, wherein N is a positive integer larger than or equal to 0;
specifically, a plurality of concurrent voice signals in a batch of concurrent voice requests and a plurality of signal complexity corresponding to the plurality of concurrent voice signals are transmitted to a voice splitting module in a concurrent processing channel model, wherein the voice splitting module comprises preset signal complexity. The preset signal complexity comprises a preset background complexity parameter threshold, an interference complexity parameter threshold and a sound source complexity parameter threshold.
Further, it is judged for each concurrent voice signal whether its signal complexity is greater than the preset signal complexity. If any one of a signal's background complexity parameter, interference complexity parameter and sound source complexity parameter exceeds the corresponding threshold, its signal complexity is greater than the preset signal complexity and the signal is added to the N concurrent signals, where N is a non-negative integer. The N concurrent signals thus comprise the concurrent voice signals whose signal complexity is greater than the preset signal complexity.
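A minimal sketch of this splitting rule, assuming illustrative threshold values and signal identifiers (none of which come from the patent):

```python
# Hypothetical preset signal complexity: one threshold per dimension.
THRESHOLDS = {"background": 0.7, "interference": 0.6, "source": 0.8}

def exceeds_preset(complexity):
    # Per the text, exceeding ANY one of the three thresholds marks the
    # whole signal as exceeding the preset signal complexity.
    return any(complexity[k] > THRESHOLDS[k] for k in THRESHOLDS)

signals = {
    "sig-1": {"background": 0.9, "interference": 0.2, "source": 0.1},
    "sig-2": {"background": 0.3, "interference": 0.4, "source": 0.5},
    "sig-3": {"background": 0.1, "interference": 0.8, "source": 0.2},
}

# The N concurrent signals: those exceeding the preset complexity.
n_signals = [sid for sid, c in signals.items() if exceeds_preset(c)]
print(n_signals)  # ['sig-1', 'sig-3']
```

Here sig-2 stays below every threshold and is left for the second splitting result handled in step S530.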
Step S520: setting a first leaf node by taking the N concurrent signals as a first splitting result;
further, step S520 of the present application further includes:
step S521: constructing a triplet list according to the background complexity, the interference complexity and the sound source complexity corresponding to each voice signal in the concurrent voice signal set;
step S522: connecting the triplet list to obtain N triples corresponding to the N concurrent signals;
specifically, a triplet list is constructed based on a plurality of signal complexities corresponding to a plurality of concurrent speech signals. The triplet list includes a plurality of triples corresponding to the plurality of concurrent voice signals. Each triplet includes a background complexity parameter, an interference complexity parameter, and a sound source complexity parameter corresponding to each concurrent speech signal. And then, inputting the N concurrent signals into a triplet list, and matching the N concurrent signals through the triplet list to obtain N triples corresponding to the N concurrent signals.
Step S523: determining N sub-channels connected by the N concurrent signals by calling the N triples;
further, as shown in fig. 3, step S523 of the present application further includes:
step S5231: identifying the N triples, and determining a first identification index, wherein the first identification index is the index with the largest complexity among the background complexity, the interference complexity and the sound source complexity;
step S5232: obtaining N first identification indexes by using the N concurrent signals;
step S5233: acquiring N matched sub-nodes of the first leaf node according to the N first identification indexes, wherein the first leaf node comprises sub-nodes set based on the background complexity, the interference complexity and the sound source complexity;
step S5234: and connecting the N sub-channels by the N matched sub-nodes to perform multi-channel parallel processing.
Step S524: and carrying out multichannel parallel processing on the N concurrent signals according to the N sub-channels.
Specifically, the maximum complexity parameter in each triplet among the N triples is set as the first identification index, and N first identification indexes are obtained. Wherein each first identification index comprises a maximum complexity parameter within each of the N triples.
Further, setting the N concurrent signals as a first splitting result, and generating a first leaf node according to the first splitting result and the N first identification indexes. The first split result includes N concurrent signals. The first leaf node includes N matching child nodes. Each matching sub-node comprises one concurrent signal in the first splitting result and a first identification index corresponding to the concurrent signal. And then, inputting the N matched sub-nodes into a channel configuration module in the concurrent processing channel model, matching N sub-channels corresponding to the N matched sub-nodes according to the channel configuration module, and carrying out multi-channel parallel processing on the N concurrent signals through the N sub-channels. The channel configuration module comprises a plurality of preset sub-channels, and each preset sub-channel comprises a preset first identification index and a preset channel processing parameter. The preset first identification index comprises a preset background complexity parameter, a preset interference complexity parameter and a preset sound source complexity parameter. The preset channel processing parameters comprise preset voice denoising parameters and preset voice separation parameters corresponding to preset first identification indexes. The N sub-channels comprise N preset sub-channels corresponding to the N matched sub-nodes in the channel configuration module. The multichannel parallel processing comprises voice denoising and voice separation of N concurrent signals through N sub-channels.
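Steps S5231 through S5234 might be sketched as follows; the sub-channel table and its denoising and separation parameter names are purely illustrative assumptions, not the patent's actual channel configuration:

```python
DIMS = ("background", "interference", "source")

# Hypothetical preset sub-channels, one per dominant complexity dimension;
# the parameter values are invented placeholders for the patent's preset
# voice denoising and voice separation parameters.
SUB_CHANNELS = {
    "background": {"denoise": "spectral_subtraction", "separate": "beamforming"},
    "interference": {"denoise": "wiener", "separate": "ica"},
    "source": {"denoise": "none", "separate": "deep_clustering"},
}

def first_identification_index(triple):
    """Return the dimension whose complexity parameter is largest."""
    return DIMS[max(range(3), key=lambda i: triple[i])]

n_triples = [(0.9, 0.2, 0.1), (0.1, 0.8, 0.2)]
indexes = [first_identification_index(t) for t in n_triples]
channels = [SUB_CHANNELS[i] for i in indexes]
print(indexes)  # ['background', 'interference']
```

Each matched sub-node thus carries one concurrent signal plus its first identification index, and the channel configuration module routes it to the sub-channel keyed by that index.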
Matching, through the channel configuration module, the N sub-channels corresponding to the N concurrent signals improves the accuracy of denoising and separating the high-concurrency voice signals.
Step S530: setting a second leaf node by taking the rest concurrent signals except the N concurrent signals in the concurrent voice signals as a second splitting result;
further, step S530 of the present application further includes:
step S531: acquiring the residual concurrent signals except the N concurrent signals in the concurrent voice signals;
specifically, a plurality of concurrent voice signals except for the N concurrent signals in the batch concurrent voice requests are set as residual concurrent signals, and the residual concurrent signals are added to the second splitting result. The residual concurrent signals comprise a plurality of concurrent voice signals except N concurrent signals in the batch of concurrent voice requests. The second split result includes the remaining concurrent signals.
Step S532: identifying the signal processing amount of the remaining concurrent signals; if the signal processing amount is greater than the preset signal processing amount, performing child node configuration on the second leaf node and outputting M second child nodes, wherein M is a non-negative integer;
further, step S532 of the present application further includes:
step S5321: acquiring the database storage performance of the first server and the parallel capacity of a processor;
step S5322: and carrying out load balancing identification according to the storage performance of the database and the parallel capacity of the processor, generating a first constraint condition, and carrying out child node configuration on the second leaf node according to the first constraint condition.
Step S533: and carrying out multichannel parallel processing on the residual concurrent signals by using the M second sub-nodes.
Step S540: and configuring the channel configuration module according to the first leaf node and the second leaf node serving as primary nodes.
Step S600: inputting the signal splitting result into the channel configuration module to generate a plurality of concurrent processing nodes, wherein each concurrent processing node corresponds to one sub-channel;
step S700: and carrying out multichannel parallel processing on the concurrent voice signals based on the plurality of concurrent processing nodes.
Specifically, the signal processing amount, i.e. the number of remaining concurrent signals, is obtained by identifying the remaining concurrent signals. If the signal processing amount is greater than the preset signal processing amount (a preset threshold on the number of remaining concurrent signals), the remaining concurrent signals are input into the triplet list and matched against it to obtain the remaining signal triples, i.e. the triples in the triplet list corresponding to the remaining concurrent signals.
Further, a data query is performed on the first server to obtain the database storage performance and the processor parallel capacity, and load balancing identification is performed on them to generate the first constraint condition, i.e. the maximum number of concurrent voice signals the first server can process simultaneously. Illustratively, when performing load balancing identification, a historical data query is made on the database storage performance and the processor parallel capacity to obtain a load balancing identification database comprising multiple sets of load balancing identification data; each set includes a historical database storage performance, a historical processor parallel capacity and a historical maximum number of simultaneously processed concurrent voice signals. The database storage performance and the processor parallel capacity are then input into the load balancing identification database to obtain the first constraint condition.
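One way to picture the first constraint is as a cap derived from whichever resource is tighter. This heuristic, its function name and its scaling constants are assumptions; the patent itself derives the constraint from a historical load balancing identification database:

```python
def first_constraint(db_storage_score: float, processor_parallelism: int,
                     signals_per_worker: int = 4) -> int:
    """Hypothetical first constraint: the maximum number of concurrent
    voice signals the first server can process at once, capped by the
    tighter of storage and compute capacity (scaling factors assumed)."""
    storage_cap = int(db_storage_score * 100)        # assumed storage scaling
    compute_cap = processor_parallelism * signals_per_worker
    return min(storage_cap, compute_cap)

print(first_constraint(0.5, 8))  # compute-bound here: min(50, 32) -> 32
```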
Further, child node configuration is performed on the remaining concurrent signals based on the first constraint condition and the remaining signal triples to obtain the second leaf node, which includes M second child nodes. The M second child nodes and the remaining signal triples are then input into the channel configuration module of the concurrent processing channel model, which matches the M sub-channels corresponding to the M second child nodes, and multi-channel parallel processing is performed on the remaining concurrent signals through the M sub-channels. The signal splitting result comprises the first splitting result and the second splitting result. The plurality of concurrent processing nodes comprises the M second child nodes and the N matched child nodes, where M is a non-negative integer. Each second child node includes several concurrent voice signals from the remaining concurrent signals. Each of the M sub-channels is a preset sub-channel within the channel configuration module corresponding to one second child node. The multi-channel parallel processing comprises voice denoising and voice separation of the remaining concurrent signals through the M sub-channels.
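Configuring the M second child nodes under the first constraint can be sketched as a simple partitioning of the remaining concurrent signals; the per-node cap and the signal names are assumptions standing in for the patent's constraint-driven configuration:

```python
import math

def configure_second_children(remaining, max_per_node):
    """Partition the remaining concurrent signals into M second child
    nodes, each holding at most max_per_node signals; a stand-in for
    the patent's constraint-driven child node configuration."""
    m = math.ceil(len(remaining) / max_per_node)
    return [remaining[i * max_per_node:(i + 1) * max_per_node] for i in range(m)]

remaining = [f"sig-{i}" for i in range(10)]  # hypothetical remaining signals
nodes = configure_second_children(remaining, max_per_node=4)
print(len(nodes))  # M = 3 second child nodes
```

Each resulting child node would then be matched to one preset sub-channel for parallel denoising and separation.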
Illustratively, when the remaining concurrent signals are configured into child nodes based on the first constraint condition and the remaining signal triples, a historical data query is performed on the remaining concurrent signals to obtain multiple node configuration data sets. Each node configuration data set includes a historical first constraint condition, a historical remaining signal triple, historical remaining concurrent signals and a historical second leaf node. Based on a convolutional neural network, the node configuration data sets are iteratively trained to convergence, yielding a remaining-signal node configuration model. The first constraint condition, the remaining signal triples and the remaining concurrent signals are input into this model to obtain the second leaf node. A convolutional neural network is a class of feedforward neural network involving convolution computations and having a deep structure; it has feature learning capability and can perform translation-invariant classification of input information according to its hierarchical structure. The remaining-signal node configuration model comprises an input layer, a hidden layer and an output layer, and its function is to configure the remaining concurrent signals into child nodes according to the first constraint condition and the remaining signal triples.
Multi-channel parallel processing of the concurrent voice signals through the plurality of concurrent processing nodes improves the accuracy of voice denoising and voice separation for the concurrent voice signals, thereby achieving the technical effect of high recognition efficiency for concurrent voice signals.
In summary, the overall processing method for the high-concurrency voice AI node provided by the application has the following technical effects:
1. acquiring a batch of concurrent voice requests through a first server; inputting concurrent voice signals in the batch of concurrent voice requests into a complexity recognition module, and carrying out signal processing complexity analysis on the concurrent voice signals according to the complexity recognition module to obtain signal complexity; splitting the concurrent voice signals according to the signal complexity to obtain a signal splitting result; inputting the signal splitting result into a channel configuration module to generate a plurality of concurrent processing nodes; and carrying out multi-channel parallel processing on the concurrent voice signals according to the plurality of concurrent processing nodes. By carrying out multi-channel parallel processing on the concurrent voice signals through the plurality of concurrent processing nodes, the recognition accuracy of high-concurrency voice signals is improved, and the technical effects of improving the recognition efficiency and recognition quality of high-concurrency voice signals are achieved.
2. N sub-channels corresponding to the N concurrent signals are matched through the channel configuration module, so that the accuracy of voice denoising and voice separation for high-concurrency voice signals is improved.
3. M second sub-nodes are obtained by carrying out sub-node configuration on the residual concurrent signals, and the residual concurrent signals are processed in parallel through the M corresponding sub-channels, so that the processing efficiency for the residual concurrent signals is improved.
In a second embodiment, based on the same inventive concept as the method for processing high-concurrency voice AI nodes in the foregoing embodiment, the present application further provides a device for processing high-concurrency voice AI nodes, referring to fig. 4, where the device includes:
the concurrent voice request acquisition module 11 is used for connecting a first server to acquire batch concurrent voice requests;
the concurrent voice signal output module 12 is configured to obtain a voice signal carried by each voice request according to the batch of concurrent voice requests, and output a concurrent voice signal corresponding to the batch of concurrent voice requests;
the construction module 13 is used for constructing a concurrent processing channel model, and the concurrent processing channel model comprises a complexity identification module, a voice splitting module and a channel configuration module;
the signal processing complexity analysis module 14 is used for inputting the concurrent voice signals into the complexity recognition module, and carrying out signal processing complexity analysis on each signal according to the complexity recognition module to obtain signal complexity;
the signal splitting module 15 is used for splitting the concurrent voice signals according to the signal complexity to obtain a signal splitting result;
the processing node generating module 16 is used for inputting the signal splitting result into the channel configuration module to generate a plurality of concurrent processing nodes, where each concurrent processing node corresponds to one sub-channel;
the parallel processing module 17 is used for carrying out multi-channel parallel processing on the concurrent voice signals based on the plurality of concurrent processing nodes.
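The module pipeline above can be sketched end to end as follows. This is an illustrative toy only, not the patented device: the `complexity`, `split` and `process` functions are hypothetical placeholders for the complexity recognition module, voice splitting module, and per-channel processing respectively.

```python
from concurrent.futures import ThreadPoolExecutor

def complexity(signal):
    # Placeholder for the complexity recognition module: louder signals
    # are treated as harder to process.
    return sum(abs(s) for s in signal) / (len(signal) or 1)

def split(signals, threshold):
    # Placeholder voice splitting module: high-complexity signals form
    # the first split result, the rest the second.
    first = [s for s in signals if complexity(s) > threshold]
    second = [s for s in signals if complexity(s) <= threshold]
    return first, second

def process(signal):
    # Placeholder per-channel processing (denoising / separation would go here).
    return len(signal)

def orchestrate(signals, threshold=0.5):
    first, second = split(signals, threshold)
    # Channel configuration: one concurrent processing node (thread) per
    # split result, each bound to its own sub-channel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        return {"first": list(pool.map(process, first)),
                "second": list(pool.map(process, second))}
```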
Further, the device further comprises:
the first execution module is used for building a three-layer fully-connected neural network, and training the concurrent voice signals by utilizing the neural network to obtain a complexity recognition module embedded with a complexity recognition model;
the complexity output module is used for outputting background complexity for identifying the background environment complexity of the voice signal, interference complexity for identifying the interference degree of the voice signal and sound source complexity for identifying the sound source sent by the voice signal according to the complexity identification module;
and the second execution module is used for training with the background complexity, the interference complexity and the sound source complexity and outputting the signal complexity for identifying convergence.
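A minimal sketch of how the three complexity dimensions could be fused into one signal complexity by a three-layer fully connected network. The weights below are illustrative fixed values, not the patent's trained parameters, and the function is a pure-Python stand-in for the embedded complexity recognition model.

```python
import math

def dense(vec, weights, bias):
    # One fully connected layer: `weights` holds one row per output unit.
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def signal_complexity(background, interference, source):
    # Three-layer stand-in: input layer -> hidden layer (ReLU) -> scalar
    # output (sigmoid), yielding a signal complexity in (0, 1).
    x = [background, interference, source]
    h = [max(0.0, v) for v in dense(x, [[1, 0, 0], [0, 1, 0], [0, 0, 1]], [0, 0, 0])]
    (y,) = dense(h, [[1, 1, 1]], [-1.5])
    return 1.0 / (1.0 + math.exp(-y))
```

With these weights the output rises monotonically with each complexity dimension, which is the only property the sketch is meant to illustrate.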
Further, the device further comprises:
the concurrent signal acquisition module is used for acquiring N concurrent signals with the signal complexity larger than the preset signal complexity based on the concurrent voice signals, wherein N is a positive integer larger than or equal to 0;
the first leaf node setting module is used for setting a first leaf node by taking the N concurrent signals as a first splitting result;
the second leaf node setting module is used for setting a second leaf node by taking the rest concurrent signals except the N concurrent signals in the concurrent voice signals as a second splitting result;
and the third execution module is used for configuring the channel configuration module according to the first leaf node and the second leaf node serving as primary nodes.
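The leaf-node split described by these modules reduces to a threshold test; a minimal sketch, assuming each signal is represented as a dict with a precomputed `"complexity"` field (a hypothetical representation, not mandated by the patent):

```python
def configure_leaf_nodes(concurrent_signals, preset_complexity):
    # First leaf node: the N signals whose complexity exceeds the preset value.
    first_leaf = [s for s in concurrent_signals
                  if s["complexity"] > preset_complexity]
    # Second leaf node: all remaining concurrent signals.
    second_leaf = [s for s in concurrent_signals
                   if s["complexity"] <= preset_complexity]
    return first_leaf, second_leaf
```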
Further, the device further comprises:
the list construction module is used for constructing a triplet list according to the background complexity, the interference complexity and the sound source complexity corresponding to each voice signal in the concurrent voice signal set;
the triplet acquisition module is used for connecting the triplet list to obtain N triples corresponding to the N concurrent signals;
the fourth execution module is used for determining N sub-channels connected by the N concurrent signals by calling the N triples;
and the fifth execution module is used for carrying out multichannel parallel processing on the N concurrent signals according to the N sub-channels.
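The triplet list can be sketched as a simple mapping from signal to its three complexities; the dict keys below are hypothetical field names chosen for illustration.

```python
def build_triplet_list(signals):
    # Triplet list: signal id -> (background, interference, source) complexity.
    return {s["id"]: (s["background"], s["interference"], s["source"])
            for s in signals}

def triples_for(triplet_list, concurrent_ids):
    # "Connecting the triplet list": fetch the N triplets corresponding
    # to the N concurrent signals.
    return [triplet_list[sid] for sid in concurrent_ids]
```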
Further, the device further comprises:
the first identification index acquisition module is used for identifying the N triples and determining a first identification index, wherein the first identification index is the index with the largest complexity among the background complexity, the interference complexity and the sound source complexity;
the sixth execution module is used for obtaining N first identification indexes by using the N concurrent signals;
the matching sub-node acquisition module is used for acquiring N matching sub-nodes of the first leaf node according to the N first identification indexes, wherein the first leaf node comprises sub-nodes set based on the background complexity, the interference complexity and the sound source complexity;
and the seventh execution module is used for connecting the N sub-channels with the N matched sub-nodes to perform multi-channel parallel processing.
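The first identification index is simply the arg-max over the three complexity dimensions; a sketch of the index extraction and the resulting sub-node matching (the node names in the mapping are hypothetical):

```python
LABELS = ("background", "interference", "source")

def first_identification_index(triple):
    # The index with the largest complexity among the three dimensions.
    return LABELS[max(range(3), key=lambda i: triple[i])]

def matched_subnodes(triples, leaf_subnodes):
    # `leaf_subnodes` maps each complexity dimension to the child node of
    # the first leaf that was set up for that dimension.
    return [leaf_subnodes[first_identification_index(t)] for t in triples]
```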
Further, the device further comprises:
the residual concurrent signal determining module is used for acquiring residual concurrent signals except the N concurrent signals in the concurrent voice signals;
the second sub-node acquisition module is used for identifying the signal processing amount of the residual concurrent signals, if the signal processing amount is larger than the preset signal processing amount, performing sub-node configuration on the second leaf node, and outputting M second sub-nodes, wherein M is a positive integer larger than or equal to 0;
and the eighth execution module is used for carrying out multichannel parallel processing on the residual concurrent signals by the M second sub-nodes.
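A sketch of the processing-amount check and the resulting M-way split. The use of signal count as the "signal processing amount" and the round-robin chunking are illustrative assumptions, not the patent's rule:

```python
import math

def configure_second_subnodes(remaining_signals, preset_amount, per_node_capacity):
    # If the processing amount does not exceed the preset amount, keep a
    # single node; otherwise configure M second sub-nodes.
    if len(remaining_signals) <= preset_amount:
        return [remaining_signals]
    m = math.ceil(len(remaining_signals) / per_node_capacity)
    # Round-robin assignment keeps the M sub-nodes balanced in size.
    return [remaining_signals[i::m] for i in range(m)]
```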
Further, the device further comprises:
the server characteristic acquisition module is used for acquiring the database storage performance and the parallel capacity of the processor of the first server;
and the load balancing identification module is used for carrying out load balancing identification according to the storage performance of the database and the parallel capacity of the processor, generating a first constraint condition and carrying out child node configuration on the second leaf node according to the first constraint condition.
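A sketch of how the first constraint condition could be derived and applied. The specific metrics (database write bandwidth, per-node bandwidth, processor slots) and the merge-down rule are illustrative assumptions about the load-balancing identification, not the patent's formula:

```python
def first_constraint(db_write_mb_s, per_node_write_mb_s, processor_slots):
    # Load-balancing identification: the sustainable number of second
    # sub-nodes is capped by both storage bandwidth and CPU parallelism.
    max_by_storage = db_write_mb_s // per_node_write_mb_s
    return int(min(max_by_storage, processor_slots))

def configure_with_constraint(subnodes, max_subnodes):
    # Merge surplus sub-nodes so the configuration respects the constraint.
    while len(subnodes) > max_subnodes:
        subnodes[-2] = subnodes[-2] + subnodes[-1]
        subnodes.pop()
    return subnodes
```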
The high-concurrency-oriented voice AI node overall processing device provided by the embodiment of the application can execute the high-concurrency-oriented voice AI node overall processing method provided by any embodiment of the application, and has the corresponding functional modules and beneficial effects of the execution method.
All the included modules are only divided according to the functional logic, but are not limited to the above-mentioned division, so long as the corresponding functions can be realized; in addition, the specific names of the functional modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application.
The application provides a high-concurrency voice AI node overall processing method, applied to a high-concurrency voice AI node overall processing device, the method comprising: acquiring a batch of concurrent voice requests through a first server; inputting concurrent voice signals in the batch of concurrent voice requests into a complexity recognition module, and carrying out signal processing complexity analysis on the concurrent voice signals according to the complexity recognition module to obtain signal complexity; splitting the concurrent voice signals according to the signal complexity to obtain a signal splitting result; inputting the signal splitting result into a channel configuration module to generate a plurality of concurrent processing nodes; and carrying out multi-channel parallel processing on the concurrent voice signals according to the plurality of concurrent processing nodes. The method solves the technical problems in the prior art of insufficient recognition accuracy and low recognition efficiency for high-concurrency voice signals. By carrying out multi-channel parallel processing on the concurrent voice signals through the plurality of concurrent processing nodes, the recognition accuracy of high-concurrency voice signals is improved, and the technical effects of improving the recognition efficiency and recognition quality of high-concurrency voice signals are achieved.
Note that the above is only a preferred embodiment of the present application and the technical principle applied. It will be understood by those skilled in the art that the present application is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the application. Therefore, while the application has been described in connection with the above embodiments, the application is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the application, which is set forth in the following claims.

Claims (7)

1. The method for overall processing of the high-concurrency voice AI node is characterized by comprising the following steps:
connecting a first server to obtain a batch of concurrent voice requests;
according to the batch concurrent voice requests, acquiring voice signals carried by each voice request, and outputting concurrent voice signals corresponding to the batch concurrent voice requests;
constructing a concurrent processing channel model, wherein the concurrent processing channel model comprises a complexity recognition module, a voice splitting module and a channel configuration module;
inputting the concurrent voice signals into the complexity recognition module, and carrying out signal processing complexity analysis on each signal according to the complexity recognition module to acquire signal complexity;
splitting the concurrent voice signals according to the signal complexity by the voice splitting module to obtain a signal splitting result;
inputting the signal splitting result into the channel configuration module to generate a plurality of concurrent processing nodes, wherein each concurrent processing node corresponds to one sub-channel;
carrying out multichannel parallel processing on the concurrent voice signals based on the plurality of concurrent processing nodes;
the complexity recognition module analyzes the signal processing complexity of each signal to obtain the signal complexity, and the method comprises the following steps:
constructing a three-layer fully-connected neural network, and training the concurrent voice signals by utilizing the neural network to obtain a complexity recognition module embedded with a complexity recognition model;
the method comprises the steps of outputting background complexity for identifying the background environment complexity of a voice signal, interference complexity for identifying the interference degree of the voice signal and sound source complexity for identifying a sound source emitted by the voice signal according to a complexity identification module;
training according to the background complexity, the interference complexity and the sound source complexity, and outputting signal complexity for identifying convergence.
2. The method of claim 1, wherein the method further comprises:
based on the concurrent voice signals, N concurrent signals with the signal complexity larger than the preset signal complexity are obtained, wherein N is a positive integer larger than or equal to 0;
setting a first leaf node by taking the N concurrent signals as a first splitting result;
setting a second leaf node by taking the rest concurrent signals except the N concurrent signals in the concurrent voice signals as a second splitting result;
and configuring the channel configuration module according to the first leaf node and the second leaf node serving as primary nodes.
3. The method of claim 2, wherein the first leaf node is set with the N concurrent signals as a first split result, the method further comprising:
constructing a triplet list according to the background complexity, the interference complexity and the sound source complexity corresponding to each voice signal in the concurrent voice signal set;
connecting the triplet list to obtain N triples corresponding to the N concurrent signals;
determining N sub-channels connected by the N concurrent signals by calling the N triples;
and carrying out multichannel parallel processing on the N concurrent signals according to the N sub-channels.
4. The method of claim 3, wherein determining N sub-channels of the N concurrent signal connections comprises:
identifying the N triples, and determining a first identification index, wherein the first identification index is the index with the largest complexity among the background complexity, the interference complexity and the sound source complexity;
obtaining N first identification indexes by using the N concurrent signals;
acquiring N matched sub-nodes of the first leaf node according to the N first identification indexes, wherein the first leaf node comprises sub-nodes set based on the background complexity, the interference complexity and the sound source complexity;
and connecting the N sub-channels by the N matched sub-nodes to perform multi-channel parallel processing.
5. The method of claim 2, wherein the remaining concurrent signals other than the N concurrent signals in the concurrent voice signals are used as a second split result, the method further comprising:
acquiring the residual concurrent signals except the N concurrent signals in the concurrent voice signals;
identifying the signal processing amount of the residual concurrent signals, if the signal processing amount is larger than the preset signal processing amount, performing sub-node configuration on the second leaf nodes, and outputting M second sub-nodes, wherein M is a positive integer larger than or equal to 0;
and carrying out multichannel parallel processing on the residual concurrent signals by using the M second sub-nodes.
6. The method of claim 5, wherein the method further comprises:
acquiring the database storage performance of the first server and the parallel capacity of a processor;
and carrying out load balancing identification according to the storage performance of the database and the parallel capacity of the processor, generating a first constraint condition, and carrying out child node configuration on the second leaf node according to the first constraint condition.
7. A high-concurrency voice AI node overall processing device, wherein the device is used for executing the method of any one of claims 1-6, the device comprising:
the concurrent voice request acquisition module is used for connecting the first server to acquire batch concurrent voice requests;
the concurrent voice signal output module is used for acquiring voice signals carried by each voice request according to the batch of concurrent voice requests and outputting concurrent voice signals corresponding to the batch of concurrent voice requests;
the system comprises a building module, a processing module and a channel configuration module, wherein the building module is used for building a concurrent processing channel model, and the concurrent processing channel model comprises a complexity identification module, a voice splitting module and a channel configuration module;
the signal processing complexity analysis module is used for inputting the concurrent voice signals into the complexity recognition module, and carrying out signal processing complexity analysis on each signal according to the complexity recognition module to acquire signal complexity;
the signal splitting module is used for splitting the concurrent voice signals through the voice splitting module according to the signal complexity to obtain a signal splitting result;
the processing node generation module is used for inputting the signal splitting result into the channel configuration module to generate a plurality of concurrent processing nodes, wherein each concurrent processing node corresponds to one sub-channel;
the parallel processing module is used for carrying out multichannel parallel processing on the concurrent voice signals based on the plurality of concurrent processing nodes;
the first execution module is used for building a three-layer fully-connected neural network, and training the concurrent voice signals by utilizing the neural network to obtain a complexity recognition module embedded with a complexity recognition model;
the complexity output module is used for outputting background complexity for identifying the background environment complexity of the voice signal, interference complexity for identifying the interference degree of the voice signal and sound source complexity for identifying the sound source sent by the voice signal according to the complexity identification module;
and the second execution module is used for training with the background complexity, the interference complexity and the sound source complexity and outputting the signal complexity for identifying convergence.
CN202310720865.4A 2023-06-19 2023-06-19 High-concurrency voice AI node overall processing method and device Active CN116453523B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310720865.4A CN116453523B (en) 2023-06-19 2023-06-19 High-concurrency voice AI node overall processing method and device


Publications (2)

Publication Number Publication Date
CN116453523A CN116453523A (en) 2023-07-18
CN116453523B true CN116453523B (en) 2023-09-08

Family

ID=87130603

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310720865.4A Active CN116453523B (en) 2023-06-19 2023-06-19 High-concurrency voice AI node overall processing method and device

Country Status (1)

Country Link
CN (1) CN116453523B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103270508A (en) * 2010-09-08 2013-08-28 Dts(英属维尔京群岛)有限公司 Spatial audio encoding and reproduction of diffuse sound
CN111415675A (en) * 2020-02-14 2020-07-14 北京声智科技有限公司 Audio signal processing method, device, equipment and storage medium
CN112309392A (en) * 2020-04-27 2021-02-02 江苏理工学院 Voice control integrated intelligent household system and method thereof
CN112712812A (en) * 2020-12-24 2021-04-27 腾讯音乐娱乐科技(深圳)有限公司 Audio signal generation method, device, equipment and storage medium
CN114783459A (en) * 2022-03-28 2022-07-22 腾讯科技(深圳)有限公司 Voice separation method and device, electronic equipment and storage medium
CN114822531A (en) * 2022-03-30 2022-07-29 湖南大友恒实业有限公司 Liquid crystal television based on AI voice intelligent control
CN115062124A (en) * 2022-05-27 2022-09-16 普强时代(珠海横琴)信息技术有限公司 Questionnaire data processing method and device, electronic equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190045038A (en) * 2017-10-23 2019-05-02 삼성전자주식회사 Method and apparatus for speech recognition
CN109559734B (en) * 2018-12-18 2022-02-18 百度在线网络技术(北京)有限公司 Acceleration method and device for acoustic model training


Also Published As

Publication number Publication date
CN116453523A (en) 2023-07-18

Similar Documents

Publication Publication Date Title
CN110728360B (en) Micro-energy device energy identification method based on BP neural network
Sakar et al. Growing and pruning neural tree networks
Sun Rule-base structure identification in an adaptive-network-based fuzzy inference system
CN112714032B (en) Wireless network protocol knowledge graph construction analysis method, system, equipment and medium
Richardson et al. Automated discovery of linear feedback models
CN114818703B (en) Multi-intention recognition method and system based on BERT language model and TextCNN model
CN111488946A (en) Radar servo system fault diagnosis method based on information fusion
JPH08227408A (en) Neural network
CN115578248A (en) Generalized enhanced image classification algorithm based on style guidance
Bi et al. Knowledge transfer for out-of-knowledge-base entities: Improving graph-neural-network-based embedding using convolutional layers
CN116453523B (en) High-concurrency voice AI node overall processing method and device
CN112836030B (en) Intelligent dialogue system and method
CN111582384B (en) Image countermeasure sample generation method
CN111897809A (en) Command information system data generation method based on generation countermeasure network
CN116541166A (en) Super-computing power scheduling server and resource management method
CN111639680A (en) Identity recognition method based on expert feedback mechanism
CN115130617B (en) Detection method for continuous increase of self-adaptive satellite data mode
CN115798515A (en) Transform-based sound scene classification method
CN115545960A (en) Electronic information data interaction system and method
Mousa et al. Identification the modulation type in cognitive radio network based on Alexnet architecture
CN116340573B (en) Data scheduling method and system of intelligent platform architecture
CN111368060A (en) Self-learning method, device and system for conversation robot, electronic equipment and medium
CN113344169B (en) Novel tractor fault diagnosis system and fault diagnosis method
Liang et al. Deep latent position model for node clustering in graphs
CN116610820A (en) Knowledge graph entity alignment method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant