CN110351445B - High-concurrency VOIP recording service system based on intelligent voice recognition - Google Patents
- Publication number: CN110351445B
- Application number: CN201910530307.5A
- Authority: CN (China)
- Prior art keywords: recording, voice, media stream, media, session
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/542—Event management; Broadcasting; Multicasting; Notifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/546—Message passing systems or structures, e.g. queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/547—Remote procedure calls [RPC]; Web services
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C7/00—Arrangements for writing information into, or reading information out from, a digital store
- G11C7/16—Storage of analogue signals in digital stores using an arrangement comprising analogue/digital [A/D] converters, digital memories and digital/analogue [D/A] converters
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/42221—Conversation recording systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M7/00—Arrangements for interconnection between switching centres
- H04M7/006—Networks other than PSTN/ISDN providing telephone service, e.g. Voice over Internet Protocol (VoIP), including next generation networks with a packet-switched transport layer
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/54—Indexing scheme relating to G06F9/54
- G06F2209/547—Messaging middleware
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/54—Indexing scheme relating to G06F9/54
- G06F2209/548—Queue
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Telephonic Communication Services (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention relates to a high-concurrency VOIP recording service system based on intelligent voice recognition. The system comprises: a recording service module, which decodes and stores the audio of the left and right sound channels directly at the device and chip layers to obtain recording audio; a buffer register, which is configured at the software layer and time-synchronized, and whose capacity is increased as required when audio concurrency exceeds the current capacity; a recording file management module, which feeds the recording audio decoded and stored by the decoding storage module into a buffer queue; and a voice recognition engine, which encodes and decodes the recorded audio and, through feature extraction, assembles the voice media data packets into correct voice media stream data. The voice media stream data is transmitted to the background service system through MQ message queue middleware. The scheme can process more than 200 session processes simultaneously and also improves the recording quality.
Description
Technical Field
The invention relates to the field of recording service, in particular to a high-concurrency VOIP recording service system based on intelligent voice recognition.
Background
With the rapid development of IT technology, the traditional PSTN telephone network can no longer meet communication requirements, especially since the emergence of VOIP. VOIP (Voice over Internet Protocol), put simply, digitizes the analog voice signal and transmits it in real time over an IP network in the form of data packets. Enterprises are adopting VOIP technology to gradually replace call center services based on PSTN lines, in order to obtain convenient, unified and inexpensive communication. However, with the demands and industry upgrades brought by mobile internet technology, traditional recording services struggle to meet customers' urgent needs for high concurrency, fast recognition, and lower-cost operation in which machine learning replaces call center staff.
The prior art has the following disadvantages. In current conventional solutions, an AI engine is generally responsible for decoding the audio input and extracting features before the recognition stage begins. The decoding process itself consumes system resources and is expensive in time, and when multiple voice inputs arrive simultaneously the resource consumption becomes very large. This leads to poor recording quality and choppy calls; such systems cannot perform voice recognition in large batches and can handle at most 200 concurrent sessions.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a high-concurrency VOIP recording service system based on intelligent voice recognition, which can not only process more than 200 session processes simultaneously, but also improve the recording quality.
The purpose of the invention is realized by the following technical scheme:
A high-concurrency VOIP recording service system based on intelligent voice recognition, the system comprising:
the recording service module is used for directly decoding and storing the audio of the left and right sound channels at the equipment and chip layers to obtain recording audio;
the buffer register is configured on a software layer and is subjected to time synchronization, and when audio concurrency higher than the current capacity occurs, the capacity of the buffer register is increased as required;
the recording file management module is used for inputting the recording audio decoded and stored by the decoding storage module into a buffer queue;
the voice recognition engine is used for coding and decoding the recorded audio and assembling the voice media data packet into correct voice media stream data through feature extraction;
and the voice media stream data is transmitted to the background service system through the MQ message queue middleware.
Furthermore, the recording service module uses a sub-process cluster module to introduce a multi-process working mode, in which each process runs single-threaded. The recording core service is divided into two types of process, a main process and working processes. The sub-process cluster is implemented with the cluster mechanism of Node.js, and its main technical points are sub-process state monitoring, a reliable inter-process communication mechanism and a load scheduling mechanism.
Furthermore, the reliable inter-process communication mechanism uses the cluster inter-process communication mechanism to transmit data among the sub-processes, supplemented by a custom RPC mechanism. The inter-process communication mechanism comprises the following steps:
1) the parent process calls the pipe function to create a pipe and obtains two file descriptors pointing to the two ends of the pipe;
2) the parent process calls fork to create a child process, so that the child process also holds two file descriptors pointing to the two ends of the same pipe;
3) the parent process closes its read end and the child process closes its write end, so that the parent process writes information into the pipe and the child process reads it from the pipe.
Further, the custom RPC mechanism defines two operations, namely request/response and event notification.
Request/response:
a request is sent by an RPC caller; the RPC framework layer encodes the call parameters and passes them to the inter-process communication mechanism, which delivers them to an RPC processor for processing. After the RPC processor finishes, the RPC framework layer encodes the processing result, and the inter-process communication mechanism delivers it back to the RPC caller for subsequent processing.
Event notification:
an event source calls the RPC notification interface and passes in the event parameters; the RPC framework layer encodes the event parameters, and the inter-process communication mechanism delivers them to the event listeners for processing.
Furthermore, the load scheduling mechanism schedules the working processes (workers) on a minimum-load basis. Load calculation uses an RPC hook mechanism: the execution of the load that the main process (master) distributes to each worker is analyzed, and the worker's current load is calculated according to a calculation strategy;
the load scheduling mechanism of the sub-process cluster calculates the workload of each working process and always places newly added load on the working process with the minimum load. A recording session is established after the recording core service receives a request initiated by a device. Each session consists of two interacting parts, a signaling channel and a media channel: the signaling channel receives the device side's control instructions for the session, and the media channel receives the media stream data transmitted by the device side. The recording session management module distributes the signaling session and the media session to different processes.
Furthermore, the recording service module is configured with a media stream management module. After receiving the message that the recording session has been successfully established, the device sends the media stream to the media session SOCKET established by the recording service. The media stream management module quickly verifies each media packet and extracts its packet structure information; once the packet structure information has been successfully extracted, the media stream can be delivered to the recording engine, which converts the voice media stream on the network into a voice file on disk;
the media stream management module compresses and packetizes the recording stream on the basis of the G729 protocol. The recording service uses a real-time transmission mode: after the main process receives a media stream, it calls a worker to process it, and the worker processes the media stream packet by packet. The incoming and outgoing directions of a call each occupy one channel for media stream transmission and are packetized in the media stream module. When the file is stored, the incoming and outgoing media streams are not merged but are stored directly as raw bytes.
Further, the recording engine uses a jitter buffer to smooth out packet loss and reordering of voice data packets. In accordance with the specification of the recording system, lost packets and silence are replaced with 0 dB sound, and no voice optimization is performed.
Further, the jitter buffer processor stores the state information of the voice stream currently being recorded and holds one segment of the voice stream in memory. When an RTP packet is received, the processor first extracts the packet's meta-information to determine which part of the voice stream the packet belongs to, so that the packet can be placed at the proper position in the current segment. When the current segment is complete, the jitter buffer processor hands the voice segment to the file memory to be stored on disk and then prepares to process the RTP packets of the next segment.
Besides the jitter buffer, the recording engine also comprises a file memory, which stores the recording audio obtained by decoding and storing the audio of the left and right sound channels directly at the equipment and chip layers. The file memory determines, according to the configuration, the path and name of the recording file corresponding to each session, and determines when the session recording file is written and closed.
Further, the recording engine uses a 4-level directory structure to divide the storage space for recording files. The recording files generated by each VoIP calling device are first stored separately, and then organized by the date, hour and minute at which the recording session began, so that the actual recording files are stored in the level-4 minute directory. A quantity limiter is also implemented to prevent any single minute-level directory from becoming too large, so as to support the storage and management of tens of millions of recording files per day.
Further, the background service system comprises a device access system, a concurrent buffer system, a data management system, a service system and an application system;
the device access system uses an asynchronous non-blocking IO model and is responsible for handling interaction with devices and for processing and distributing the recorded streams. It does not need to wait for all business processing to finish before returning: it concentrates on processing the incoming streaming media and, once that is done, notifies the upper-layer application through a message queue to continue with subsequent business processing, at which point the access system can be released;
the concurrent buffer system uses MQ to isolate problems such as overlong links, long-running service processing and high concurrency, and forwards data in a data format of standardized design;
the service system is responsible for the services abstracted by the system and presents a uniform abstraction to the upper layer;
the database uses two databases, MySQL and MongoDB, as required, and data is stored separately in each according to service requirements.
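The decoupling between the device access system and the upper-layer business processing can be illustrated with a minimal sketch. It assumes an in-process `queue.Queue` as a stand-in for the MQ middleware (the text does not name a specific product); the function names and message fields are hypothetical.

```python
import queue
import threading

def access_system(packets, mq):
    """Device-facing side: handle each stream chunk, publish a message, move on."""
    for session_id, chunk in packets:
        # Only lightweight stream handling happens here; business processing
        # is left to whoever consumes the queue.
        mq.put({"session": session_id, "bytes": len(chunk)})
    mq.put(None)  # sentinel: no more messages

def business_consumer(mq, results):
    """Upper-layer side: drain the queue at its own pace."""
    while True:
        message = mq.get()
        if message is None:
            break
        results.append(message)

def run_demo(packets):
    mq = queue.Queue()
    results = []
    consumer = threading.Thread(target=business_consumer, args=(mq, results))
    consumer.start()
    access_system(packets, mq)  # returns without waiting for business logic
    consumer.join()
    return results
```

The access side never waits for a business result; it only enqueues, which is the property the message queue is meant to provide here.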
The invention has the following beneficial effects: compared with traditional recording services, the scheme decodes and stores the audio input of the left and right sound channels directly at the equipment and chip layers, and designs the buffer and performs time synchronization at the software layer; this step must be completed in cooperation with the hardware. When thousands of concurrent audio streams occur, only the buffer capacity needs to be increased as required, so that voice recognition can be processed more efficiently and in real time.
Drawings
FIG. 1 is a schematic overall flow diagram of the present invention;
FIG. 2 is a schematic diagram of the composition of a platform layer according to the present invention;
FIG. 3 is a flow chart of the RPC mechanism of the present invention;
FIG. 4 is a schematic diagram of the voice recognition service module according to the present invention.
Detailed Description
The technical solution of the present invention is further described in detail with reference to the following specific examples, but the scope of the present invention is not limited to the following.
The main technical challenge of the invention is how to record, store, and perform speech recognition and translation on high-concurrency speech input simultaneously. In current conventional solutions, the AI engine is generally responsible for decoding the audio input and extracting features before the recognition stage begins, and the decoding process itself consumes system resources and is very expensive in terms of time.
As shown in fig. 1, to meet the requirements of real-time, high-concurrency scenarios, the system decodes and stores the audio inputs of the left and right channels directly at the device and chip layers, and designs the buffers and performs time synchronization at the software layer; this step must be completed in cooperation with the hardware. When thousands of concurrent audio streams occur, only the buffer capacity needs to be increased as required for speech recognition to be processed more efficiently and in real time. The system consists of a device layer, a network layer, a platform layer and an application layer, and the improvements of this scheme are mainly embodied in the platform layer.
As shown in fig. 2, the platform layer is mainly composed of three major components, which are a recording service, a voice recognition service and a service management system.
Recording service module
The internal architecture of the recording service module is divided into 5 main modules, including a subprocess cluster, media stream management, recording session management, a recording engine and an encoding and decoding adapter.
The recording core service is a multi-process working mode, and each process is operated in a single thread mode.
The sub-process cluster module introduces a multi-process working mode and divides the recording core service into a main process and working processes. The sub-process cluster is implemented with the cluster mechanism of Node.js, and its main technical points are sub-process state monitoring, a reliable inter-process communication mechanism and a load scheduling mechanism.
The reliable inter-process communication mechanism uses the cluster inter-process communication mechanism (namely the pipe mechanism of the operating system) to transmit data among the sub-processes, supplemented by a custom RPC mechanism, which improves the convenience and reliability of the recording service.
Each process has its own user address space, and a global variable of one process cannot be seen by another. Data exchanged between processes must therefore pass through the kernel: a buffer is opened in the kernel, process 1 copies data from user space into the kernel buffer, and process 2 reads the data from the kernel buffer. This mechanism provided by the kernel is called inter-process communication (IPC).
A pipe is one of the most basic IPC mechanisms and is created by the pipe function. A pipe works between related processes (those with a common ancestor) and is passed on through fork. When the pipe function is called, a buffer (called a pipe) is opened in the kernel for communication. The buffer has a read end and a write end, and two file descriptors are returned to the user program through the filedes parameter: filedes[0] points to the read end of the pipe and filedes[1] points to the write end (easy to remember, since 0 is standard input and 1 is standard output). To the user program the pipe looks like an open file: it is read with read(filedes[0]) and written with write(filedes[1]), and reading or writing this file actually reads or writes the kernel buffer. The pipe function returns 0 on success and -1 on failure.
Process communication using the pipe mechanism comprises the following steps:
1) the parent process calls the pipe function to create a pipe and obtains two file descriptors pointing to the two ends of the pipe;
2) the parent process calls fork to create a child process, so that the child process holds the same two file descriptors, pointing to the two ends of the same pipe;
3) the parent process closes its read end and the child process closes its write end, so that the parent process writes information into the pipe and the child process reads it from the pipe.
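The three steps above can be sketched with Python's `os.pipe` and `os.fork`, which wrap the same POSIX calls. This is an illustrative sketch only (POSIX systems); the function name `pipe_demo` and the use of the child's exit status to report how many bytes it read are assumptions made for the demo, not part of the described system.

```python
import os

def pipe_demo(message: bytes) -> int:
    """Send `message` from parent to child over a pipe.

    Returns the child's exit status, which this demo sets to the number of
    bytes the child read (so keep the message shorter than 256 bytes).
    """
    read_fd, write_fd = os.pipe()      # step 1: create the pipe, get both ends
    pid = os.fork()                    # step 2: child inherits both descriptors
    if pid == 0:
        os.close(write_fd)             # step 3 (child): close the write end
        received = b""
        while True:
            chunk = os.read(read_fd, 4096)
            if not chunk:              # end-of-file: all write ends are closed
                break
            received += chunk
        os.close(read_fd)
        os._exit(len(received))        # leave the child without running cleanup
    os.close(read_fd)                  # step 3 (parent): close the read end
    os.write(write_fd, message)
    os.close(write_fd)                 # lets the child see end-of-file
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

Closing the unused ends matters: if the child kept its copy of the write end open, its read loop would never see end-of-file.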
As shown in FIG. 3, the recording service system further improves convenience and reliability by defining its own RPC protocol, which provides request/response and event mechanisms.
The "event dispatcher", "communication endpoint A" and "communication endpoint B" in fig. 3 belong to the communication framework layer of the RPC. The RPC protocol of the recording service defines two operations: request/response and event notification.
Request/response:
a request is sent by the RPC caller; the RPC framework layer encodes the call parameters and passes them to the inter-process communication mechanism, which delivers them to an RPC processor for processing. After the RPC processor finishes, the RPC framework layer encodes the processing result, and the inter-process communication mechanism delivers it back to the RPC caller for subsequent processing.
Event notification:
an event source calls the RPC notification interface and passes in the event parameters; the RPC framework layer encodes the event parameters, and the inter-process communication mechanism delivers them to the event listeners for processing.
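The two operations can be illustrated with a small sketch. The text does not specify the wire format or the interface names, so everything here is an assumption: JSON is used for encoding, the transport is a plain function call standing in for the inter-process pipe, and `RpcEndpoint`, `RpcCaller` and the message fields are hypothetical.

```python
import json
from typing import Any, Callable, Dict, List, Optional

class RpcEndpoint:
    """Receiving side: decodes wire messages and dispatches them."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[Any], Any]] = {}
        self._listeners: Dict[str, List[Callable[[Any], None]]] = {}

    def register(self, method: str, handler: Callable[[Any], Any]) -> None:
        self._handlers[method] = handler

    def subscribe(self, event: str, listener: Callable[[Any], None]) -> None:
        self._listeners.setdefault(event, []).append(listener)

    def receive(self, wire: str) -> Optional[str]:
        message = json.loads(wire)
        if message["type"] == "request":
            result = self._handlers[message["method"]](message["params"])
            # The result is encoded again before travelling back to the caller.
            return json.dumps(
                {"type": "response", "id": message["id"], "result": result})
        if message["type"] == "event":
            for listener in self._listeners.get(message["event"], []):
                listener(message["params"])
            return None  # events carry no response
        raise ValueError("unknown message type")

class RpcCaller:
    """Sending side: encodes parameters and hands them to the transport."""

    def __init__(self, transport: Callable[[str], Optional[str]]) -> None:
        self._transport = transport
        self._next_id = 0

    def request(self, method: str, params: Any) -> Any:
        self._next_id += 1
        wire = json.dumps({"type": "request", "id": self._next_id,
                           "method": method, "params": params})
        reply = json.loads(self._transport(wire))
        if reply["id"] != self._next_id:
            raise RuntimeError("response does not match request")
        return reply["result"]

    def notify(self, event: str, params: Any) -> None:
        self._transport(json.dumps(
            {"type": "event", "event": event, "params": params}))
```

In a multi-process deployment the `transport` callable would write to and read from the pipe instead of calling `receive` directly.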
Sub-process state monitoring is implemented by listening for the process exit event of the cluster module; a sub-process is restarted immediately after its exit event is received.
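The restart-on-exit behaviour can be sketched as follows. This is a Python analogue built on `subprocess`, not the Node.js cluster API the system itself uses; the function name `supervise` and the `max_restarts` bound are assumptions added so that the demo terminates.

```python
import subprocess
import sys

def supervise(command, max_restarts):
    """Run `command`; whenever it exits, start it again.

    A real supervisor would loop indefinitely; `max_restarts` bounds the demo.
    Returns the exit codes observed, one per run.
    """
    exit_codes = []
    for _ in range(max_restarts + 1):
        worker = subprocess.Popen(command)
        exit_codes.append(worker.wait())  # blocks until the worker exits
        # Continuing the loop restarts the worker immediately.
    return exit_codes

if __name__ == "__main__":
    # A worker that always fails, to show that it is restarted each time.
    print(supervise([sys.executable, "-c", "import sys; sys.exit(3)"], 2))
```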
The load scheduling mechanism of the sub-process cluster schedules the working processes on a minimum-load basis, and load calculation uses the hook mechanism of the RPC. By analyzing the execution of the load that the master distributes to each worker (working process), the worker's current workload is calculated according to a calculation strategy.
The load scheduling mechanism calculates the workload of each working process and always places newly added load on the working process with the minimum load. A recording session is established by the recording core service after it receives a request initiated by a device, and each session consists of two interacting parts, a signaling channel and a media channel. The signaling channel receives the device side's control instructions for the session, and the media channel receives the media stream data transmitted by the device side. The recording session management module distributes the signaling session and the media session to different processes, which satisfies the two main requirements: the centralized control needed by the signaling session and the large throughput needed by the media session.
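A minimal sketch of minimum-load scheduling follows. The calculation strategy is not specified in the text, so this sketch assumes the simplest one: load is the number of tasks dispatched but not yet completed, updated by two hook methods. The class and method names are hypothetical.

```python
class LeastLoadScheduler:
    """Track per-worker load and always pick the least loaded worker."""

    def __init__(self, worker_ids):
        self.load = {worker_id: 0 for worker_id in worker_ids}

    def on_dispatch(self, worker_id, cost=1):
        """Hook called when the master hands work to a worker."""
        self.load[worker_id] += cost

    def on_complete(self, worker_id, cost=1):
        """Hook called when a worker reports that work has finished."""
        self.load[worker_id] -= cost

    def pick(self):
        # min() returns the first minimum, so ties go to the earliest worker.
        return min(self.load, key=self.load.get)

    def assign(self, cost=1):
        worker_id = self.pick()
        self.on_dispatch(worker_id, cost)
        return worker_id
```

A weighted `cost` could distinguish, for example, a media session from a signaling session, though the text does not say whether the system does so.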
The recording service host process tells the sub-process to create a media session through an RPC mechanism. After the media session is successfully created, the recording service host process also needs to subscribe to an end event of the media session by using an RPC mechanism to monitor the working state of the media session.
After receiving the message that the recording session is successfully established, the device end sends the media stream to a media session SOCKET established by the recording server end, the media stream management module performs quick verification of the media packet and extraction of the packet structure information, and after the media packet structure information is successfully extracted, the media stream can be sent to a recording engine to convert the voice media stream on the network into a voice file on a disk.
The media stream management module compresses and packetizes the recording stream based on protocols such as G729 (a standard audio protocol). The recording service uses a real-time transmission mode: after the main process receives a media stream, it calls a worker to process it. In processing the media stream, the worker abandons the traditional approach (media stream to playable voice file) and instead uses a self-designed packetization scheme. The incoming and outgoing directions of a call each occupy one channel for media stream transmission and are packetized in the media stream module. When the file is stored, the incoming and outgoing media streams are not merged but are stored directly as raw bytes; storing the incoming and outgoing bytes independently in this way greatly improves the processing efficiency and the concurrency capacity of the recording service. When a recording file is played, the system merges the streams internally and then outputs them for playback.
Media stream management sets up a recording processing path along which received voice media packets are passed layer by layer. Each layer verifies and checks the packets within its own capability range and filters out illegal packets, so that only valid RTP media stream packets remain at the end.
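The layered filtering can be sketched as a chain of small checks. The text does not say which checks each layer performs, so the three layers below (minimum length, RTP version 2, G.729 payload type 18) are assumptions chosen for illustration; the header layout follows the fixed 12-byte RTP header.

```python
import struct

RTP_HEADER = struct.Struct("!BBHII")  # fixed 12-byte RTP header (RFC 3550)
G729_PAYLOAD_TYPE = 18                # static payload type for G.729 (RFC 3551)

def check_length(packet):
    """Layer 1: the packet must hold a full header plus some payload."""
    return len(packet) > RTP_HEADER.size

def check_version(packet):
    """Layer 2: the top two bits of the first byte are the RTP version."""
    return packet[0] >> 6 == 2

def check_payload_type(packet):
    """Layer 3: the low seven bits of the second byte are the payload type."""
    return packet[1] & 0x7F == G729_PAYLOAD_TYPE

LAYERS = (check_length, check_version, check_payload_type)

def filter_valid(packets):
    # all() stops at the first failing layer, so later layers only ever see
    # packets that passed the earlier ones.
    return [p for p in packets if all(layer(p) for layer in LAYERS)]

def parse_header(packet):
    """Extract the fields needed later for ordering and session lookup."""
    _, _, seq, timestamp, ssrc = RTP_HEADER.unpack_from(packet)
    return {"seq": seq, "timestamp": timestamp, "ssrc": ssrc}
```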
Media data transmitted over a network typically suffers from packet loss, reordering and similar problems. In addition, to save bandwidth, a silence packet is sometimes sent in place of a period of very low-decibel voice data. After receiving media data packets, the recording engine module must identify and repair these conditions before the voice media data packets can be assembled into correct voice media stream data. To prevent slow disk IO from blocking the process during storage, the asynchronous file IO mechanism provided by the operating system is used.
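The idea of keeping slow disk writes off the packet-handling path can be sketched in Python. Note the swapped-in technique: this sketch offloads blocking writes to a thread pool via `asyncio`, which is not the operating-system asynchronous file IO the text refers to, but it shows the same non-blocking behaviour from the event loop's point of view. All names are hypothetical.

```python
import asyncio
import os
import tempfile

def _append(path, data):
    """Blocking write, executed on a worker thread."""
    with open(path, "ab") as handle:
        handle.write(data)

async def store_segment(path, data):
    """Schedule the write without blocking the event loop."""
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, _append, path, data)

async def write_segments(path, segments):
    for segment in segments:
        # Awaiting each write keeps segments in order while other
        # coroutines (for example packet receivers) remain free to run.
        await store_segment(path, segment)

def run_demo(segments):
    fd, path = tempfile.mkstemp(suffix=".raw")
    os.close(fd)
    try:
        asyncio.run(write_segments(path, segments))
        with open(path, "rb") as handle:
            return handle.read()
    finally:
        os.remove(path)
```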
The recording engine uses a jitter buffer to smooth out packet loss and reordering of voice data packets. In accordance with the specification of the recording system, lost packets and silence are replaced with 0 dB sound, and no voice optimization is performed.
The jitter buffer processor stores the state information of the voice stream currently being recorded and holds one segment of the voice stream in memory. When an RTP packet is received, the processor first extracts the packet's meta-information to determine which part of the voice stream the packet belongs to, so that the packet can be placed at the proper position in the current segment. When the current segment is complete, the jitter buffer processor hands the voice segment to the file memory to be stored on disk and then prepares to process the RTP packets of the next segment.
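A simplified sketch of segment-based jitter buffering follows. It rests on several assumptions not stated in the text: packets are positioned by RTP sequence number, every payload has the same fixed size, missing slots are filled with zero-valued bytes as a placeholder for the replacement sound, and 16-bit sequence-number wraparound is ignored. The class name and parameters are hypothetical.

```python
class JitterBuffer:
    """Hold one segment in memory and place packets by sequence number."""

    def __init__(self, segment_packets, payload_size, sink, base_seq=0):
        self.segment_packets = segment_packets  # packets per segment
        self.payload_size = payload_size        # bytes per packet payload
        self.sink = sink                        # callable receiving finished segments
        self.base_seq = base_seq                # sequence number of slot 0
        self.slots = [None] * segment_packets

    def push(self, seq, payload):
        index = seq - self.base_seq
        if index < 0:
            return  # too late: that segment has already been written out
        while index >= self.segment_packets:
            self.flush()  # the packet belongs to a later segment
            index = seq - self.base_seq
        self.slots[index] = payload  # out-of-order packets land in place

    def flush(self):
        filler = bytes(self.payload_size)  # placeholder for lost packets
        segment = b"".join(p if p is not None else filler for p in self.slots)
        self.sink(segment)  # hand the segment to the file store
        self.base_seq += self.segment_packets
        self.slots = [None] * self.segment_packets
```

A production buffer would also flush on a timer or at session end, and would handle sequence-number wraparound.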
In addition to the jitter buffer, another important component of the recording engine is the file memory.
The file memory determines, according to the configuration, the path and name of the recording file corresponding to each session, and determines when the session recording file is written and closed.
The recording engine divides the storage space for recording files using a 4-level directory. The recording files generated by each VoIP calling device are stored separately (the device's recording directory is named after the device serial number), and are then arranged by the date, hour and minute at which the recording session began, with the actual recording files stored in the level-4 minute directory. A number limiter is also implemented to keep any single minute-level directory from growing too large. In this way, the storage and management of tens of millions of recording files is supported.
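The directory layout described above can be sketched as a path-building helper. The function name `recording_path`, the root directory, the date format and the file name are hypothetical; the number limiter is not sketched because the source does not say how it works.

```python
from datetime import datetime
from pathlib import PurePosixPath


def recording_path(root: str, device_serial: str,
                   started_at: datetime, file_name: str) -> PurePosixPath:
    """Build <root>/<device serial>/<date>/<hour>/<minute>/<file name>."""
    return (
        PurePosixPath(root)
        / device_serial                    # level 1: one directory per device
        / started_at.strftime("%Y%m%d")    # level 2: session start date
        / started_at.strftime("%H")        # level 3: session start hour
        / started_at.strftime("%M")        # level 4: session start minute
        / file_name
    )


# Example: a session that started at 08:05 on 2019-06-19
path = recording_path("/data/rec", "SN001", datetime(2019, 6, 19, 8, 5, 30), "call42.wav")
```

Keying the first level on the device serial number keeps each device's recordings together, while the time levels bound how many files can accumulate in any one directory.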
The specific working process is as follows. After receiving a recording request, recording session management uses the recording engine to decode the audio of the left and right sound channels directly at the device and chip layer and store it in the file storage.
Meanwhile, the buffer register is configured at the software layer and time-synchronized; when audio concurrency exceeds the current capacity, the capacity of the buffer register is increased as required;
the recording file management module places the recording audio decoded and stored by the decoding storage module into a buffer queue;
the voice recognition engine encodes and decodes the recorded audio and, through feature extraction, assembles the voice media data packets into correct voice media stream data;
and the voice media stream data is transmitted to the background service system through the MQ message queue middleware.
The voice recognition engine is a voice recognition module based on artificial intelligence. The voice recognition service module, shown in fig. 4, packages and schedules AI capabilities: it can connect to and invoke third-party AI services and can also use the company's own AI capabilities, so that the scenario requirements of the recording service are fully met.
The module ultimately needs to support Chinese and English speech recognition:
for Chinese, only Mandarin is supported; the accepted audio formats include PCM, WAV and the like; Mandarin recognition supports sampling rates of 8 kHz/16 kHz and a sample depth of 16 bits;
voice recognition comprises real-time voice recognition and file voice recognition;
voice recognition outputs text, and the recognition accuracy for Mandarin reaches 95%.
Background service system
The background service system addresses the high-concurrency requirement mainly by optimizing the message middleware and the underlying database.
Congestion caused by high-concurrency scenarios is resolved asynchronously;
the old protocol remains supported, so all currently integrated gateway devices can move to the new recording service smoothly;
the system is extensible and can later support protocol parsing for other vendors' devices and adaptation to general Internet of Things protocols.
In the system architecture, the load on the system is expected to come mainly from the access of a large number of devices, so the access system uses an asynchronous non-blocking IO model to improve thread execution efficiency and increase the system's concurrent processing capacity. To keep the system from being overwhelmed by high concurrency, and to further increase the processing capacity of the access system, the system as a whole is divided into layers:
System | Technology stack | Description |
---|---|---|
Device access system | Netty+Zookeeper | Asynchronous non-blocking distributed communication architecture that supports dynamic capacity expansion |
Concurrent buffer system | RabbitMQ | Message queue cluster that reduces the risk of the system being overwhelmed by high concurrency |
Data management | Mysql+MongoDB+Redis | Classified storage of different kinds of data, fewer concurrent database connections, and improved data security |
Business system | Springboot+Mybatis plus+Shiro+druid | SAAS-based multi-tenant system with unified device management design and standardized API design |
Application system | NodeJs+react | Standardized front-end applications that can be deployed independently |
The device access system handles interaction with the devices and completes the processing and distribution of the recorded stream. It does not have to wait for all business flows to finish before returning: it concentrates on processing the incoming streaming media and, once that is done, notifies the upper-layer application through a message queue to continue the subsequent business processing, so that the access system can be released;
the concurrent buffer system uses MQ to isolate problems such as overly long links, overly long service processing and high concurrency, and forwards data in a standardized data format;
the service system is concerned only with the services abstracted by the system, and presents a uniform abstraction to the upper layer.
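The hand-off between the access layer and the upper-layer business processing can be sketched with an in-process queue. Here `asyncio.Queue` is only a stand-in for the RabbitMQ cluster named in the table, and the function names and message fields are hypothetical; this is an illustration of the decoupling idea, not the patent's implementation.

```python
import asyncio
from typing import List


async def access_layer(queue: asyncio.Queue, stream_ids: List[str]) -> None:
    """Process incoming streams and notify the upper layer without waiting for it."""
    for stream_id in stream_ids:
        # Stream handling would happen here; the access layer then only
        # enqueues a notification and moves on to the next stream.
        await queue.put({"event": "stream_processed", "stream_id": stream_id})
    await queue.put(None)  # sentinel: no more messages


async def business_layer(queue: asyncio.Queue, handled: List[str]) -> None:
    """Consume notifications at its own pace."""
    while True:
        message = await queue.get()
        if message is None:
            break
        await asyncio.sleep(0)  # placeholder for slower business processing
        handled.append(message["stream_id"])


async def run_demo(stream_ids: List[str]) -> List[str]:
    # A bounded queue absorbs bursts; producers wait only when it is full.
    queue: asyncio.Queue = asyncio.Queue(maxsize=100)
    handled: List[str] = []
    await asyncio.gather(
        access_layer(queue, stream_ids),
        business_layer(queue, handled),
    )
    return handled


result = asyncio.run(run_demo(["a", "b", "c"]))
```

Because the access layer only enqueues a message, slow business processing does not hold up device connections; the queue absorbs the difference in speed between the two layers.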
Two databases, Mysql and MongoDB, are used, with data stored separately according to service requirements. MongoDB is used where the system needs fast response and large-volume storage, taking full advantage of its ease of use, high capacity and quick response, which improves the concurrent processing capacity of the system.
The foregoing is illustrative of the preferred embodiments of this invention, and it is to be understood that the invention is not limited to the precise form disclosed herein and that various other combinations, modifications, and environments may be resorted to, falling within the scope of the concept as disclosed herein, either as described above or as apparent to those skilled in the relevant art. Modifications and variations may be effected by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (8)
1. High concurrency VOIP recording service system based on intelligent speech recognition is characterized in that the system comprises:
the recording service module is used for directly decoding and storing the audio of the left and right sound channels at the equipment and chip layers to obtain recording audio;
the buffer register is configured on a software layer and is subjected to time synchronization, and when audio concurrency higher than the current capacity occurs, the capacity of the buffer register is increased as required;
the recording file management module is used for inputting the recording audio decoded and stored by the decoding storage module into a buffer queue;
the voice recognition engine is used for coding and decoding the recorded audio and assembling the voice media data packet into correct voice media stream data through feature extraction;
the voice media stream data is transmitted to the background service system through the MQ message queue middleware;
the recording service module adopts a subprocess cluster module to introduce a multi-process working mode, and each process is operated by a single thread; recording core service is divided into two types of main process and working process, a subprocess cluster is realized by a cluster mechanism of node.js, and the technical points are subprocess state monitoring, reliable interprocess communication mechanism and load scheduling mechanism;
the reliable interprocess communication mechanism adopts a cluster interprocess communication mechanism to transmit data among the subprocesses and assists in self-defining an RPC mechanism, and the interprocess communication mechanism comprises the following steps:
1) the parent process calls the pipe function to create a pipe and obtains two file descriptors pointing to the two ends of the pipe;
2) the parent process calls fork to create a child process, so that the child process also has two file descriptors pointing to the two ends of the same pipe;
3) the parent process closes its read end and the child process closes its write end, so that the parent process writes information into the pipe and the child process reads from the pipe.
2. The intelligent voice recognition based highly concurrent VOIP recording service system according to claim 1, wherein the custom RPC mechanism defines two operations, request/response and event notification;
request response:
the RPC framework layer encodes a processing result and transmits the processing result to the RPC caller for subsequent processing;
event notification:
the event notification is realized by calling an RPC notification interface by an event source to transmit event parameters, and the RPC framework layer encodes the event parameters and transmits the encoded event parameters to an event listener by an inter-process communication mechanism to be processed.
3. The high-concurrency VOIP recording service system based on intelligent voice recognition as claimed in claim 2, wherein the load scheduling mechanism realizes scheduling of a work process by using a minimum load mode, a hook mechanism of RPC is adopted for load calculation, and the current load of a worker is calculated according to a calculation strategy by analyzing the execution condition of distributing the load to the worker by a master;
the load scheduling mechanism of the sub-process cluster calculates the working load of each working process, the newly added load is always put into the working process with the minimum load, a recording session is established after a recording core service receives a request initiated by equipment, each session consists of two interactive states of a signaling channel and a media channel, the signaling channel receives a control instruction of the equipment side to the session, the media channel receives media stream data transmitted by the equipment side, and the recording session management module distributes the signaling session and the media session into different processes.
4. The high-concurrency VOIP recording service system based on intelligent voice recognition as claimed in claim 3, wherein the recording service module is configured with a media stream management module, after receiving the message that the recording session is successfully established, the device sends the media stream to the media session SOCKET established by the recording service, the media stream management module performs fast verification of the media packet and extraction of packet structure information, and after successfully extracting the media packet structure information, the media stream can be delivered to a recording engine to convert the voice media stream on the network into a voice file on a disk;
the media stream management module compresses and packetizes the recording stream on the basis of a G729 protocol, the recording service adopts a real-time transmission mode, a worker is called to process the media stream after a main process receives the media stream, the worker uses packetization processing when processing the media stream, an incoming/outgoing call occupies one channel to transmit the media stream, the incoming/outgoing call stream is packetized in the media stream module, and the incoming/outgoing call media stream is not directly merged when storing a file but is directly stored as an original byte code.
5. The system of claim 4, wherein the recording engine uses a jitter buffer to smooth the packet loss and disorder problem of the voice data packets, and based on the specification of the recording system, replaces the packet loss and silence with 0db voice without performing voice optimization.
6. The system of claim 5, wherein the Jitter buffer processor stores the state information of the current recording voice stream and stores a segment of the voice stream in the memory, when receiving the RTP packet, it first needs to extract the meta information of the RTP packet to determine which part of the voice stream in the RTP packet so as to place the RTP packet in the proper position of the current segment, after the current segment is finished, the Jitter buffer processor delivers the voice segment to the file memory to be stored on the disk, and then prepares to process the RTP packet of the next segment;
besides the jitter buffer, the recording engine also comprises a file memory for storing the audio of the left and right sound channels which are directly decoded and stored on the equipment and chip layers to obtain the recorded audio, and the file memory determines the path and the name of the recording file corresponding to the session according to the configuration and determines when the writing operation and the closing operation of the session recording file are performed.
7. The intelligent voice recognition-based highly concurrent VOIP recording service system according to claim 6, wherein the recording engine divides the storage space of the recording file with a 4-level directory; the sound recording files generated by each VoIP calling device are first stored separately and are then arranged by the date, hour and minute at which the recording session began, with the actual sound recording files stored in the level-4 minute directory; a quantity limiter is also implemented to keep any single minute-level directory from growing too large, so as to support the storage and management of tens of millions of sound recording files per day.
8. The intelligent voice recognition-based highly concurrent VOIP recording service system according to claim 7, wherein the background service system includes a device access system, a concurrent buffering system, a data management system, a service system, an application system;
the equipment access system uses an asynchronous non-blocking IO model and is responsible for handling interaction with the equipment and completing the processing and distribution of recorded streams; it is not required to wait until all business processes have been processed before returning; the equipment access system concentrates on processing the accessed streaming media and, after the processing is completed, notifies the upper layer application through a message queue to continue subsequent business processing, so that the access system can be released;
the concurrency buffer system uses MQ to isolate overlong link, overlong service processing and high concurrency problem, and completes data forwarding through a data format designed in a standard way;
the service system is responsible for services abstracted by the system and performs uniform abstraction on an upper layer;
the database uses two databases of Mysql and MongoDB according to the requirement, and the two databases are separately stored according to the service requirement.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910530307.5A CN110351445B (en) | 2019-06-19 | 2019-06-19 | High-concurrency VOIP recording service system based on intelligent voice recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110351445A CN110351445A (en) | 2019-10-18 |
CN110351445B true CN110351445B (en) | 2020-09-29 |
Family
ID=68182336
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910530307.5A Active CN110351445B (en) | 2019-06-19 | 2019-06-19 | High-concurrency VOIP recording service system based on intelligent voice recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110351445B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP02 | Change in the address of a patent holder | ||
Address after: No.8, 6 / F, building 1, 88 xingle North Road, Xindu street, Xindu District, Chengdu, Sichuan 610000 Patentee after: Chengdu kangshengsi Technology Co.,Ltd. Address before: 610000 building a, 319 Xingye Avenue, Xindu Industrial Zone, Xindu District, Chengdu City, Sichuan Province Patentee before: Chengdu kangshengsi Technology Co.,Ltd. |