CN110351445B - High-concurrency VOIP recording service system based on intelligent voice recognition


Info

Publication number: CN110351445B
Authority: CN (China)
Prior art keywords: recording, voice, media stream, media, session
Legal status: Active
Application number: CN201910530307.5A
Other languages: Chinese (zh)
Other versions: CN110351445A
Inventor: 袁熹 (Yuan Xi)
Current assignee: Chengdu Kangshengsi Technology Co ltd
Original assignee: Chengdu Kangshengsi Technology Co ltd
Events: application CN201910530307.5A filed by Chengdu Kangshengsi Technology Co ltd (priority application); application published as CN110351445A; application granted and published as CN110351445B.

Classifications

    • G06F9/542 Event management; Broadcasting; Multicasting; Notifications
    • G06F9/546 Message passing systems or structures, e.g. queues
    • G06F9/547 Remote procedure calls [RPC]; Web services
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G11C7/16 Storage of analogue signals in digital stores using an arrangement comprising analogue/digital [A/D] converters, digital memories and digital/analogue [D/A] converters
    • H04M3/42221 Conversation recording systems
    • H04M7/006 Networks other than PSTN/ISDN providing telephone service, e.g. Voice over Internet Protocol (VoIP), including next generation networks with a packet-switched transport layer
    • G06F2209/547 Messaging middleware
    • G06F2209/548 Queue

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Telephonic Communication Services (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to a high-concurrency VOIP recording service system based on intelligent voice recognition. The system comprises: a recording service module, which directly decodes and stores the left- and right-channel audio at the device and chip layers to obtain recorded audio; a buffer register, which is configured at the software layer and time-synchronized, and whose capacity is increased on demand when audio concurrency exceeds the current capacity; a recording file management module, which feeds the recorded audio decoded and stored by the decoding storage module into a buffer queue; and a voice recognition engine, which encodes and decodes the recorded audio and, through feature extraction, assembles the voice media data packets into correct voice media stream data. The voice media stream data is transmitted to the background service system through MQ message queue middleware. The scheme can handle more than 200 concurrent sessions while also improving recording quality.

Description

High-concurrency VOIP recording service system based on intelligent voice recognition
Technical Field
The invention relates to the field of recording services, and in particular to a high-concurrency VOIP recording service system based on intelligent voice recognition.
Background
With the rapid development of information technology, the traditional PSTN telephone network can no longer meet communication requirements, especially since the emergence of VOIP. VOIP (Voice over Internet Protocol) digitizes the analog voice signal and transmits it in real time over an IP network in the form of data packets. Enterprises have adopted VOIP technology to gradually replace call center services based on PSTN lines, meeting the need for convenient, unified and inexpensive communication. However, with the demands and industry upgrades brought by mobile internet technology, traditional recording services struggle to meet customers' urgent requirements, such as high concurrency, fast recognition, and lower-cost operation that uses machine learning in place of call center staff.
The prior art has the following disadvantages. In current conventional solutions, an AI engine is generally responsible for decoding the audio input and extracting features before the recognition stage. The decoding process itself consumes system resources and takes considerable time, and when multiple voice inputs arrive simultaneously, resource consumption becomes very large. This leads to poor recording quality and choppy calls; such systems cannot perform voice recognition in large batches and reach at most 200 concurrent sessions.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a high-concurrency VOIP recording service system based on intelligent voice recognition that can handle more than 200 concurrent sessions while also improving recording quality.
The purpose of the invention is realized by the following technical scheme:
A high-concurrency VOIP recording service system based on intelligent voice recognition comprises:
a recording service module, which directly decodes and stores the left- and right-channel audio at the device and chip layers to obtain recorded audio;
a buffer register, which is configured at the software layer and time-synchronized, and whose capacity is increased on demand when audio concurrency exceeds the current capacity;
a recording file management module, which feeds the recorded audio decoded and stored by the decoding storage module into a buffer queue;
a voice recognition engine, which encodes and decodes the recorded audio and, through feature extraction, assembles the voice media data packets into correct voice media stream data;
wherein the voice media stream data is transmitted to the background service system through MQ message queue middleware.
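A minimal Python sketch of the on-demand buffer register, assuming it holds encoded audio frames in FIFO order and that "increasing the capacity as required" means doubling up to a ceiling. The class name `ElasticFrameBuffer`, the doubling policy and the `max_capacity` limit are hypothetical; the patent does not say how the capacity grows.

```python
from collections import deque
from typing import Optional


class ElasticFrameBuffer:
    """Hypothetical FIFO of audio frames whose logical capacity grows on demand."""

    def __init__(self, capacity: int = 64, max_capacity: int = 4096) -> None:
        self.capacity = capacity
        self.max_capacity = max_capacity
        self._frames: deque = deque()

    def put(self, frame: bytes) -> None:
        # When a write would exceed the current capacity, double the capacity
        # (up to a hard ceiling) instead of dropping the frame.
        if len(self._frames) >= self.capacity:
            if self.capacity >= self.max_capacity:
                raise OverflowError("buffer already at maximum capacity")
            self.capacity = min(self.capacity * 2, self.max_capacity)
        self._frames.append(frame)

    def get(self) -> Optional[bytes]:
        # Return the oldest frame, or None when the buffer is empty.
        return self._frames.popleft() if self._frames else None

    def __len__(self) -> int:
        return len(self._frames)
```

Here the capacity is only a logical limit used for accounting; in a real system it would correspond to preallocated memory.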
Furthermore, the recording service module uses a subprocess cluster module to introduce a multi-process working mode in which each process runs in a single thread. The recording core service is divided into two process types, a main (master) process and worker processes. The subprocess cluster is implemented with the cluster mechanism of Node.js; its main technical points are subprocess state monitoring, a reliable inter-process communication mechanism and a load scheduling mechanism.
Furthermore, the reliable inter-process communication mechanism uses the inter-process communication mechanism of the cluster to transmit data between subprocesses, supplemented by a custom RPC mechanism. The inter-process communication comprises the following steps:
1) the parent process calls the pipe function to create a pipe and obtains two file descriptors pointing to the two ends of the pipe;
2) the parent process calls fork to create a child process, which inherits the two file descriptors pointing to the two ends of the same pipe;
3) the parent process closes its read end and the child process closes its write end, so that the parent process writes information into the pipe and the child process reads it from the pipe.
Further, the custom RPC mechanism defines two operations: request/response and event notification.
Request/response:
a request is issued by an RPC caller; the RPC framework layer encodes the call parameters, which the inter-process communication mechanism then transmits to an RPC handler for processing. After the RPC handler finishes, the RPC framework layer encodes the processing result, which the inter-process communication mechanism transmits back to the RPC caller for subsequent processing.
Event notification:
an event source calls the RPC notification interface to pass event parameters; the RPC framework layer encodes the event parameters, which the inter-process communication mechanism then transmits to an event listener for processing.
Furthermore, the load scheduling mechanism schedules the worker processes in a least-load manner. Load calculation uses an RPC hook mechanism: it analyzes how the load the master has assigned to each worker is being executed and computes the worker's current load according to a calculation strategy;
the load scheduling mechanism of the subprocess cluster computes the workload of each worker process and always assigns newly added load to the worker process with the smallest load. A recording session is established after the recording core service receives a request initiated by a device. Each session consists of two interacting parts, a signaling channel and a media channel: the signaling channel receives the device's control instructions for the session, and the media channel receives the media stream data sent by the device. The recording session management module distributes the signaling session and the media session to different processes.
Furthermore, the recording service module is configured with a media stream management module. After receiving the message that the recording session has been successfully established, the device sends the media stream to the media session socket created by the recording service. The media stream management module rapidly verifies each media packet and extracts its packet structure information; once this succeeds, the media stream can be handed to the recording engine, which converts the voice media stream on the network into a voice file on disk;
the media stream management module compresses and packetizes the recording stream based on the G.729 protocol. The recording service uses real-time transmission: after the main process receives a media stream, it calls a worker to process it, and the worker processes the stream in packets. An incoming or outgoing call occupies one channel for media stream transmission, the incoming/outgoing call stream is packetized in the media stream module, and when the file is stored, the incoming and outgoing media streams are not merged but saved directly as raw bytes.
Further, the recording engine uses a jitter buffer to smooth out packet loss and out-of-order voice data packets. In line with the specifications of the recording system, lost packets and silence are filled with 0 dB sound (silence), and no voice optimization processing is performed.
Further, the jitter buffer processor holds the state information of the current recorded voice stream and keeps one segment of the voice stream in memory. On receiving an RTP packet, the jitter buffer processor first extracts the packet's meta-information to determine which part of the voice stream it carries, so that it can be placed at the correct position in the current segment. When the current segment is complete, the jitter buffer processor hands the voice segment to the file store to be written to disk and then prepares to process the RTP packets of the next segment.
Besides the jitter buffer, the recording engine also includes a file store, which stores the recorded audio obtained by directly decoding and storing the left- and right-channel audio at the device and chip layers. The file store determines the path and name of each session's recording file according to the configuration, and decides when the session recording file is written and closed.
Further, the recording engine divides the storage space for recording files using a 4-level directory. Recording files generated by each VoIP calling device are first stored separately, then grouped by the date, hour and minute at which the recording session began, with the actual recording files stored in the level-4 minute directory. A quantity limiter is also implemented to keep any single minute directory from growing too large, so that storage and management of tens of millions of recording files per day is supported.
Further, the background service system comprises a device access system, a concurrency buffer system, a data management system, a service system and an application system;
the device access system uses an asynchronous non-blocking IO model to handle interaction with devices and to process and distribute the recording streams. It does not wait for business processing to return: it centrally processes the incoming streaming media and, once done, notifies the upper-layer application through a message queue to continue with subsequent business processing, so that the access system can be released;
the concurrency buffer system uses MQ to isolate problems such as overly long call chains, long-running business processing and high concurrency, and forwards data using a standardized data format;
the service system is responsible for the services abstracted by the system and provides a unified abstraction to the upper layer;
for the database layer, two databases, MySQL and MongoDB, are used as needed, with data stored separately according to business requirements.
The beneficial effects of the invention are as follows. Compared with traditional recording services, the scheme directly decodes and stores the left- and right-channel audio input at the device and chip layers, and designs buffers and performs time synchronization at the software layer (a step that must be completed together with the hardware). When thousands of concurrent audio streams occur, only the buffer capacity needs to be increased on demand, so voice recognition can be processed more efficiently and in real time.
Drawings
FIG. 1 is a schematic overall flow diagram of the present invention;
FIG. 2 is a schematic diagram of the composition of a platform layer according to the present invention;
FIG. 3 is a flow chart of the RPC mechanism of the present invention;
FIG. 4 is a schematic diagram of the voice recognition service module according to the present invention.
Detailed Description
The technical solution of the present invention is further described in detail with reference to the following specific examples, but the scope of the present invention is not limited to the following.
The main technical challenge of the invention is how to simultaneously record, store, and perform speech recognition and transcription for highly concurrent speech input. In current conventional solutions, the AI engine is generally responsible for decoding the audio input and extracting features before the recognition stage, and the decoding process itself consumes system resources and is very costly in time.
As shown in FIG. 1, to meet the requirements of real-time, high-concurrency scenarios, the system directly decodes and stores the left- and right-channel audio input at the device and chip layers, and designs buffers and performs time synchronization at the software layer; these steps must be completed together with the hardware. When thousands of concurrent audio streams occur, speech recognition can be processed more efficiently and in real time simply by increasing the buffer capacity on demand. The system consists of a device layer, a network layer, a platform layer and an application layer, and the improvements of this scheme are mainly embodied in the platform layer.
As shown in FIG. 2, the platform layer consists mainly of three components: a recording service, a voice recognition service and a service management system.
Recording service module
The internal architecture of the recording service module is divided into 5 main modules, including a subprocess cluster, media stream management, recording session management, a recording engine and an encoding and decoding adapter.
The recording core service works in multi-process mode, with each process running in a single thread.
The subprocess cluster module introduces the multi-process working mode and divides the recording core service into a main process and worker processes. It is implemented with the cluster mechanism of Node.js, and its main technical points are subprocess state monitoring, a reliable inter-process communication mechanism and a load scheduling mechanism.
The reliable inter-process communication mechanism uses the cluster's inter-process communication (that is, the operating system's pipe mechanism) to transmit data between subprocesses, supplemented by a custom RPC mechanism, which improves the convenience and reliability of the recording service.
Each process has its own user address space, so global variables in one process are invisible to another. Data exchanged between processes must therefore pass through the kernel: the kernel allocates a buffer, process 1 copies data from user space into the kernel buffer, and process 2 reads the data from that buffer. This kernel-provided mechanism is called inter-process communication (IPC).
A pipe is one of the most basic IPC mechanisms and is created by the pipe function. It works between related processes (those sharing a common ancestor) and is shared through fork. When pipe is called, the kernel allocates a buffer (the pipe) for communication, with a read end and a write end, and returns two file descriptors to the user program through the filedes parameter: filedes[0] refers to the read end and filedes[1] to the write end (easy to remember, since 0 is standard input and 1 is standard output). To the user program, the pipe looks like an open file that can be read with read(filedes[0]) or written with write(filedes[1]); reading and writing this file actually reads and writes the kernel buffer. pipe returns 0 on success and -1 on failure.
Process communication with the pipe mechanism involves the following steps:
1) the parent process calls pipe to create a pipe and obtains two file descriptors pointing to the two ends of the pipe;
2) the parent process calls fork to create a child process, which inherits the same two file descriptors pointing to the two ends of the same pipe;
3) the parent process closes its read end and the child process closes its write end, so that the parent writes information into the pipe and the child reads it from the pipe.
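The three pipe steps can be illustrated with Python's `os.pipe` and `os.fork`, which wrap the POSIX calls described above. This is a sketch for POSIX systems only; the function name `send_to_child` and the second "reply" pipe (added only so the parent can confirm what the child read) are illustrative additions, not part of the patent.

```python
import os


def send_to_child(message: bytes) -> bytes:
    """Send `message` from a parent to a forked child over a pipe (POSIX only)."""
    # Step 1: pipe() returns (read end, write end), i.e. filedes[0] and filedes[1].
    read_fd, write_fd = os.pipe()
    # Extra pipe, not part of the three steps: lets the child report what it read.
    reply_r, reply_w = os.pipe()

    # Step 2: fork(); the child inherits copies of all four descriptors.
    pid = os.fork()
    if pid == 0:  # child process
        status = 1
        try:
            # Step 3 (child side): close the write end, read until EOF.
            os.close(write_fd)
            os.close(reply_r)
            chunks = []
            while chunk := os.read(read_fd, 4096):
                chunks.append(chunk)
            os.close(read_fd)
            os.write(reply_w, b"".join(chunks))
            os.close(reply_w)
            status = 0
        finally:
            os._exit(status)  # never return into the parent's code path

    # Step 3 (parent side): close the read end, write, then close the write end
    # so that the child sees EOF.
    os.close(read_fd)
    os.close(reply_w)
    os.write(write_fd, message)
    os.close(write_fd)

    received = b""
    while chunk := os.read(reply_r, 4096):
        received += chunk
    os.close(reply_r)
    os.waitpid(pid, 0)  # reap the child
    return received
```

For example, `send_to_child(b"hello")` returns `b"hello"` once the child has read the message from the pipe.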
As shown in FIG. 3, the recording service system further improves convenience and reliability by defining a custom RPC protocol that provides request/response and event mechanisms.
The "event dispatcher", "communication endpoint A" and "communication endpoint B" in FIG. 3 belong to the communication framework layer of the RPC. The RPC protocol of the recording service defines two operations: request/response and event notification.
Request/response:
a request is issued by the RPC caller; the RPC framework layer encodes the call parameters, which the inter-process communication mechanism transmits to the RPC handler for processing. After the RPC handler finishes, the RPC framework layer encodes the processing result, which the inter-process communication mechanism transmits back to the RPC caller for subsequent processing.
Event notification:
an event source calls the RPC notification interface to pass event parameters; the RPC framework layer encodes them, and the inter-process communication mechanism transmits them to the event listener for processing.
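The request/response and event-notification operations can be sketched as a small framework layer that encodes every message as JSON. This is a hypothetical illustration: the class `RpcEndpoint`, the JSON message fields, and the method and event names used in the test (`create_media_session`, `media_session_end`, modeled on the media-session exchange in this description) are assumptions, and a direct in-process call stands in for the pipe transport so the example runs without forking.

```python
import json
from itertools import count
from typing import Any, Callable, Dict, List, Optional


class RpcEndpoint:
    """Hypothetical RPC framework layer: encodes messages and dispatches them."""

    def __init__(self) -> None:
        self.peer: Optional["RpcEndpoint"] = None
        self._handlers: Dict[str, Callable[..., Any]] = {}
        self._listeners: Dict[str, List[Callable[..., None]]] = {}
        self._pending: Dict[int, Callable[[Any], None]] = {}
        self._ids = count(1)

    def register(self, method: str, handler: Callable[..., Any]) -> None:
        self._handlers[method] = handler  # RPC handler for incoming requests

    def subscribe(self, event: str, listener: Callable[..., None]) -> None:
        self._listeners.setdefault(event, []).append(listener)

    def call(self, method: str, params: dict, on_result: Callable[[Any], None]) -> None:
        # Request: remember the callback under a fresh id, then send.
        req_id = next(self._ids)
        self._pending[req_id] = on_result
        self._send({"type": "request", "id": req_id, "method": method, "params": params})

    def notify(self, event: str, params: dict) -> None:
        # Event notification: fire-and-forget, no response expected.
        self._send({"type": "event", "event": event, "params": params})

    def _send(self, message: dict) -> None:
        # Encode to bytes as they would be written to the pipe.
        if self.peer is None:
            raise RuntimeError("endpoint not connected")
        self.peer._receive(json.dumps(message).encode("utf-8"))

    def _receive(self, data: bytes) -> None:
        msg = json.loads(data.decode("utf-8"))
        if msg["type"] == "request":
            result = self._handlers[msg["method"]](**msg["params"])
            self._send({"type": "response", "id": msg["id"], "result": result})
        elif msg["type"] == "response":
            self._pending.pop(msg["id"])(msg["result"])
        elif msg["type"] == "event":
            for listener in self._listeners.get(msg["event"], []):
                listener(**msg["params"])


def connect(a: RpcEndpoint, b: RpcEndpoint) -> None:
    a.peer, b.peer = b, a
```

With a real pipe, `_send` would write a length-prefixed frame and `_receive` would run when a full frame arrives; the request-id bookkeeping stays the same.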
Subprocess state monitoring is implemented by listening for the cluster module's process exit event; when an exit event is received, the subprocess is restarted immediately.
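Subprocess state monitoring can be sketched on POSIX with `os.fork` and `os.wait`, standing in for the Node.js cluster exit event. The function `supervise` and its `max_restarts` bound are hypothetical; the bound exists only so the sketch terminates, whereas a real supervisor would keep restarting indefinitely.

```python
import os
from typing import Callable, Set


def supervise(worker: Callable[[], int], workers: int, max_restarts: int) -> int:
    """Fork `workers` children; restart exited ones until `max_restarts` is reached.

    Returns the number of restarts performed. POSIX only.
    """

    def spawn() -> int:
        pid = os.fork()
        if pid == 0:  # child: run the worker, then exit with its status
            code = 1
            try:
                code = worker()
            finally:
                os._exit(code if isinstance(code, int) else 1)
        return pid

    live: Set[int] = {spawn() for _ in range(workers)}
    restarts = 0
    while live:
        pid, _status = os.wait()  # blocks until some child exits (the "exit event")
        if pid not in live:
            continue  # not one of ours
        live.discard(pid)
        if restarts < max_restarts:
            live.add(spawn())  # restart immediately
            restarts += 1
    return restarts
```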
The load scheduling mechanism of the subprocess cluster schedules worker processes in a least-load manner, and load calculation uses an RPC hook mechanism: by analyzing how the load the master has assigned to each worker (worker process) is being executed, it computes the worker's current workload according to a calculation strategy.
The load scheduling mechanism computes the workload of each worker process and always assigns newly added load to the worker process with the smallest load. A recording session is established by the recording core service after it receives a request initiated by a device, and each session consists of two interacting parts, a signaling channel and a media channel. The signaling channel receives the device's control instructions for the session, and the media channel receives the media stream data sent by the device. The recording session management module distributes the signaling session and the media session to different processes, which satisfies both of their main requirements: centralized control for the signaling session and high throughput for the media session.
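A minimal sketch of least-load dispatch, assuming the load of a worker is its count of assigned but unfinished tasks (the patent leaves the calculation strategy open). `LeastLoadScheduler`, its `on_assign`/`on_complete` hooks, and the `place_session` helper, which keeps signaling in the master and sends the media session to a worker, are hypothetical names and policies.

```python
from typing import Dict, Iterable


class LeastLoadScheduler:
    """Hypothetical least-load dispatcher over worker process ids."""

    def __init__(self, worker_ids: Iterable[str]) -> None:
        self.load: Dict[str, int] = {w: 0 for w in worker_ids}

    def pick(self) -> str:
        # Least-loaded worker; ties broken by id so the choice is deterministic.
        return min(self.load, key=lambda w: (self.load[w], w))

    # The two hooks below play the role of the RPC hooks: they are invoked
    # when the master assigns work and when a worker reports completion.
    def on_assign(self, worker_id: str) -> None:
        self.load[worker_id] += 1

    def on_complete(self, worker_id: str) -> None:
        self.load[worker_id] = max(0, self.load[worker_id] - 1)

    def dispatch(self) -> str:
        worker = self.pick()
        self.on_assign(worker)
        return worker


def place_session(scheduler: LeastLoadScheduler, session_id: str) -> dict:
    # Signaling stays centralized in the master; the throughput-heavy media
    # session goes to the least-loaded worker.
    return {"session": session_id, "signaling": "master", "media": scheduler.dispatch()}
```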
The main process of the recording service instructs a subprocess, through the RPC mechanism, to create a media session. After the media session has been created, the main process also uses the RPC mechanism to subscribe to the media session's end event in order to monitor its working state.
After receiving the message that the recording session has been successfully established, the device sends the media stream to the media session socket created by the recording server. The media stream management module rapidly verifies each media packet and extracts its packet structure information; once this succeeds, the media stream can be handed to the recording engine, which converts the voice media stream on the network into a voice file on disk.
The media stream management module compresses and packetizes the recording stream based on protocols such as G.729 (a standard audio codec). The recording service uses real-time transmission: after the main process receives a media stream, it calls a worker to process it. Instead of the traditional approach (converting the media stream into a playable voice file), the worker uses its own custom packetized processing. An incoming or outgoing call occupies one channel for media stream transmission, and the incoming/outgoing call stream is packetized in the media stream module. When the file is stored, the incoming and outgoing media streams are not merged but saved directly as raw bytes. Storing the incoming and outgoing bytes separately in this way greatly improves the processing efficiency and the number of concurrent sessions the recording service can handle. When a recording file is played back, the system first merges the two streams internally and then outputs them for playback.
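The playback-time merge of the separately stored incoming and outgoing streams might look like the following. It assumes both streams have already been decoded to 16-bit little-endian mono PCM at the same sampling rate (raw G.729 payloads would need decoding first) and that incoming audio goes to the left channel; the function `merge_channels` is hypothetical.

```python
import struct


def merge_channels(inbound: bytes, outbound: bytes) -> bytes:
    """Interleave two mono 16-bit LE PCM streams into one stereo stream.

    Left = incoming call audio, right = outgoing call audio. The shorter
    stream is padded with silence; a trailing odd byte is ignored.
    """
    n_in, n_out = len(inbound) // 2, len(outbound) // 2
    left = struct.unpack(f"<{n_in}h", inbound[: 2 * n_in])
    right = struct.unpack(f"<{n_out}h", outbound[: 2 * n_out])
    n = max(n_in, n_out)
    left += (0,) * (n - n_in)
    right += (0,) * (n - n_out)
    # L0 R0 L1 R1 ... is the standard interleaved stereo layout.
    interleaved = [sample for pair in zip(left, right) for sample in pair]
    return struct.pack(f"<{2 * n}h", *interleaved)
```

Deferring this merge to playback keeps the recording path to a plain append of raw bytes per direction.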
A recording processing path is set up in media stream management: received voice media packets are passed through it layer by layer, each layer verifies and checks the packets within its own scope and filters out illegal packets, and what remains at the end are valid media stream RTP packets.
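The quick verification and header extraction step can be sketched against the RTP fixed header defined in RFC 3550. The function `parse_rtp`, the `RtpPacket` tuple and the optional `allowed_types` filter are assumptions (payload type 18 is the static RTP payload type for G.729 in RFC 3551); the patent does not specify which checks each layer performs.

```python
import struct
from typing import NamedTuple, Optional, Set


class RtpPacket(NamedTuple):
    payload_type: int
    marker: bool
    sequence: int
    timestamp: int
    ssrc: int
    payload: bytes


def parse_rtp(data: bytes, allowed_types: Optional[Set[int]] = None) -> Optional[RtpPacket]:
    """Validate an RTP packet and extract its header; return None to filter it out."""
    if len(data) < 12:  # shorter than the fixed header
        return None
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    if b0 >> 6 != 2:  # RTP version must be 2
        return None
    payload_type = b1 & 0x7F
    if allowed_types is not None and payload_type not in allowed_types:
        return None
    offset = 12 + 4 * (b0 & 0x0F)  # skip the CSRC list (CC entries of 4 bytes)
    if b0 & 0x10:  # header extension: 2-byte profile, 2-byte length in 32-bit words
        if len(data) < offset + 4:
            return None
        (ext_words,) = struct.unpack("!H", data[offset + 2 : offset + 4])
        offset += 4 + 4 * ext_words
    end = len(data)
    if b0 & 0x20:  # padding: last byte counts the padding bytes, itself included
        pad_len = data[-1]
        if pad_len == 0:
            return None
        end -= pad_len
    if offset > end:  # header fields point past the packet: malformed
        return None
    return RtpPacket(payload_type, bool(b1 & 0x80), seq, ts, ssrc, data[offset:end])
```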
Media data transmitted over a network typically suffers from packet loss, reordering and similar problems. To save bandwidth, a silence packet is sometimes sent in place of a stretch of very low-level voice data. After receiving media data packets, the recording engine module must identify and repair these conditions before the voice media data packets can be assembled into correct voice media stream data. To keep slow disk IO from blocking the process during storage, the asynchronous file IO mechanism provided by the operating system is used.
The recording engine uses a jitter buffer to smooth out packet loss and out-of-order voice data packets. In line with the specifications of the recording system, lost packets and silence are filled with 0 dB sound (silence), and no voice optimization processing is performed.
The jitter buffer processor holds the state information of the current recorded voice stream and keeps one segment of the voice stream in memory. When an RTP packet arrives, its meta-information is extracted first to determine which part of the voice stream it carries, so that it can be placed at the correct position in the current segment. When the current segment is complete, the jitter buffer processor hands the voice segment to the file store to be written to disk and then prepares to process the RTP packets of the next segment.
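A simplified jitter buffer along these lines is sketched below. It assumes fixed-size frames, one frame per RTP packet, and linear PCM, so that all-zero bytes represent silence; `JitterBuffer`, `frames_per_segment` and the `completed` list (standing in for the file store) are hypothetical.

```python
from typing import List, Optional


class JitterBuffer:
    """Hypothetical jitter buffer that assembles fixed-size segments by RTP sequence number."""

    def __init__(self, first_seq: int, frames_per_segment: int, frame_size: int) -> None:
        self.base_seq = first_seq  # sequence number of slot 0 of the current segment
        self.n = frames_per_segment
        self.frame_size = frame_size
        self.slots: List[Optional[bytes]] = [None] * frames_per_segment
        self.completed: List[bytes] = []  # segments handed to the file store

    def push(self, seq: int, payload: bytes) -> None:
        # Distance from the segment start, modulo 2**16 so that the 16-bit
        # RTP sequence number can wrap around.
        offset = (seq - self.base_seq) & 0xFFFF
        if offset >= 0x8000:  # belongs to an already-flushed segment: too late, drop
            return
        while offset >= self.n:  # belongs to a later segment: close the current one
            self.flush()
            offset -= self.n
        self.slots[offset] = payload  # out-of-order packets land in their own slot

    def flush(self) -> None:
        # Fill lost frames with silence (zero-valued samples) and hand over the segment.
        silence = bytes(self.frame_size)
        self.completed.append(b"".join(s if s is not None else silence for s in self.slots))
        self.base_seq = (self.base_seq + self.n) & 0xFFFF
        self.slots = [None] * self.n
```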
In addition to the jitter buffer, another important component of the recording engine is the file store.
The file store determines the path and name of each session's recording file according to the configuration, and decides when the session recording file is written and closed.
The recording engine divides the storage space for recording files using a 4-level directory. Recording files generated by each VoIP calling device are stored separately (the device's recording directory is named after the device serial number), then grouped by the date, hour and minute at which the recording session began, and the actual recording files are stored in the level-4 minute directory. A quantity limiter is also implemented to keep any single minute directory from growing too large. This supports the storage and management of tens of millions of recording files.
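The 4-level directory layout with a quantity limiter could be computed as follows. The directory naming (`YYYYMMDD`, `HH`, `MM`), the per-directory `limit` and the overflow policy of numbered sibling directories such as `05_1` are assumptions for illustration; the patent only states that a limiter exists.

```python
from datetime import datetime
from pathlib import PurePosixPath
from typing import Dict


def recording_dir(root: str, device_serial: str, started: datetime,
                  counts: Dict[str, int], limit: int = 1000) -> PurePosixPath:
    """Pick the directory for a new recording file: <serial>/<YYYYMMDD>/<HH>/<MM>.

    `counts` tracks how many files each minute directory already holds. When a
    minute directory reaches `limit`, files spill into hypothetical overflow
    siblings "MM_1", "MM_2", ... so the tree stays four levels deep.
    """
    base = PurePosixPath(root, device_serial,
                         started.strftime("%Y%m%d"), started.strftime("%H"))
    minute = started.strftime("%M")
    index = 0
    while True:
        name = minute if index == 0 else f"{minute}_{index}"
        key = str(base / name)
        if counts.get(key, 0) < limit:  # quantity limiter
            counts[key] = counts.get(key, 0) + 1
            return base / name
        index += 1
```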
The specific working process is as follows. After receiving a recording request, recording session management uses the recording engine to directly decode the left- and right-channel audio at the device and chip layers and store it in the file store;
meanwhile, the buffer register is configured at the software layer and time-synchronized, and its capacity is increased on demand when audio concurrency exceeds the current capacity;
the recording file management module feeds the recorded audio decoded and stored by the decoding storage module into a buffer queue;
the voice recognition engine encodes and decodes the recorded audio and, through feature extraction, assembles the voice media data packets into correct voice media stream data;
and the voice media stream data is transmitted to the background service system through MQ message queue middleware.
The voice recognition engine is an artificial-intelligence-based voice recognition module. The voice recognition service module, shown in FIG. 4, packages and schedules AI capabilities: it can integrate third-party AI services as well as the company's own AI capabilities, fully meeting the scenario requirements of the recording service.
The module must ultimately support Chinese and English speech recognition:
for Chinese, only Mandarin is supported; input audio may be in formats such as PCM and WAV, and Mandarin recognition supports sampling rates of 8 kHz/16 kHz at a sampling depth of 16 bits;
speech recognition covers both real-time recognition and recognition of recorded files;
speech recognition outputs text, with a Mandarin recognition accuracy of 95%;
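As an illustration of the stated input constraints, the following sketch (an assumption, not part of the described system) rejects WAV audio that does not use an 8 kHz or 16 kHz sampling rate with 16-bit samples before it is passed to recognition. It uses only the standard-library `wave` module; the function name `check_wav_for_asr` is hypothetical. Headerless PCM carries no such metadata, so for PCM input the rate and depth would have to be supplied by the caller.

```python
import io
import wave
from typing import Tuple

# Mandarin recognition parameters stated in the text.
SUPPORTED_RATES_HZ = {8000, 16000}
SAMPLE_WIDTH_BYTES = 2  # 16-bit sampling depth


def check_wav_for_asr(data: bytes) -> Tuple[int, int]:
    """Return (sample_rate, channels) for a supported WAV, else raise ValueError.

    Data that is not a valid WAV file makes wave.open raise wave.Error.
    """
    with wave.open(io.BytesIO(data), "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
    if rate not in SUPPORTED_RATES_HZ:
        raise ValueError(f"unsupported sample rate: {rate} Hz")
    if width != SAMPLE_WIDTH_BYTES:
        raise ValueError(f"unsupported sample depth: {8 * width} bits")
    return rate, channels
```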
Background service system
The background service system meets genuine high-concurrency requirements mainly by optimizing the message middleware and the underlying database:
congestion caused by high-concurrency scenarios is relieved through asynchronous processing;
the legacy protocol remains supported, so all currently integrated gateway devices can be connected smoothly to the new recording service;
extensibility is good: protocol parsing for other vendors' devices and adaptation to general Internet of Things protocols can be supported later;
In the system architecture, the main load on the system comes from the access of a large number of devices. The access system therefore uses an asynchronous non-blocking IO model, which improves thread execution efficiency and increases the system's concurrent processing capability. To keep the system from being overwhelmed by high concurrency and to further increase the processing capacity of the access system, the system as a whole is divided into the following layers:
| System | Technology stack | Description |
|---|---|---|
| Device access system | Netty + ZooKeeper | Distributed communication architecture based on asynchronous non-blocking IO; supports dynamic capacity expansion |
| Concurrent buffer system | RabbitMQ | Message-queue cluster that reduces the risk of being overwhelmed by high concurrency |
| Data management | MySQL + MongoDB + Redis | Classified storage across multiple databases; fewer concurrent database connections and improved data security |
| Business system | Spring Boot + MyBatis-Plus + Shiro + Druid | SaaS multi-tenant system with unified device management and standardized API design |
| Application system | Node.js + React | Standardized front-end applications that can be deployed independently |
The device access system handles interaction with the devices and processes and distributes the recording streams without waiting until all business processing has finished before returning. It processes the incoming streaming media centrally and, once processing is complete, notifies the upper-layer applications through the message queue so that they can continue the subsequent business processing, which lets the access system release the processed streaming media;
the concurrent buffer system uses the MQ to isolate problems such as overly long call chains, long-running business processing and high concurrency, and forwards data in a standardized data format;
the business system deals only with the services abstracted by the system and provides a unified abstraction to the upper layer.
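The hand-off pattern of the access layer (accept device traffic with non-blocking IO, publish it to a queue, return immediately, and let the business layer consume it later) can be sketched with Python's `asyncio`. This is only an analogy to the Netty + RabbitMQ stack listed in the technology table: an in-process `asyncio.Queue` stands in for the MQ, and the function names and the newline-delimited framing are hypothetical.

```python
import asyncio
from typing import List


async def handle_device(reader: asyncio.StreamReader,
                        writer: asyncio.StreamWriter,
                        bus: asyncio.Queue) -> None:
    # Access layer: read one newline-framed chunk (hypothetical framing),
    # publish it, and acknowledge without waiting for business processing.
    line = await reader.readline()
    await bus.put(line.rstrip(b"\n"))
    writer.write(b"ACK\n")
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def business_worker(bus: asyncio.Queue, results: List[str]) -> None:
    # Upper layer: consumes published messages at its own pace.
    while True:
        msg = await bus.get()
        results.append(msg.decode())
        bus.task_done()


async def demo(devices: int = 3) -> List[str]:
    bus: asyncio.Queue = asyncio.Queue()  # stand-in for the MQ middleware
    results: List[str] = []
    server = await asyncio.start_server(
        lambda r, w: handle_device(r, w, bus), "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    worker = asyncio.create_task(business_worker(bus, results))

    for n in range(devices):  # simulated devices, one connection each
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(f"device-{n}:chunk\n".encode())
        await writer.drain()
        await reader.readline()  # the access layer's ACK
        writer.close()
        await writer.wait_closed()

    await bus.join()  # every published message has been processed
    worker.cancel()
    server.close()
    await server.wait_closed()
    return results
```

Because the handler acknowledges as soon as the message is queued, a slow business worker delays only the queue, not the device connections.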
The database layer uses both MySQL and MongoDB, storing data separately in each according to business requirements. MongoDB is used where the system needs fast responses and storage for large data volumes, taking full advantage of its ease of setup, high capacity and fast response, which improves the system's concurrent processing capacity.
The foregoing describes preferred embodiments of the invention. It is to be understood that the invention is not limited to the precise form disclosed herein, and that various other combinations, modifications and environments may be used within the scope of the concept disclosed herein, whether as described above or as apparent to those skilled in the relevant art. Modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A high-concurrency VoIP recording service system based on intelligent speech recognition, characterized in that the system comprises:
a recording service module, configured to decode and store the left- and right-channel audio directly at the device and chip layers to obtain recorded audio;
a buffer register, configured at the software layer and time-synchronized, whose capacity is increased on demand when audio concurrency exceeds its current capacity;
a recording file management module, configured to feed the recorded audio decoded and stored by the decoding storage module into a buffer queue;
a voice recognition engine, configured to encode and decode the recorded audio and, through feature extraction, assemble the voice media data packets into correct voice media stream data;
wherein the voice media stream data is transmitted to the background service system through the MQ message-queue middleware;
the recording service module uses a subprocess cluster module to introduce a multi-process working mode in which each process runs a single thread; the recording core service is divided into two kinds of process, a master process and worker processes; the subprocess cluster is implemented with the cluster mechanism of Node.js, and its key technical points are subprocess state monitoring, a reliable inter-process communication mechanism and a load scheduling mechanism;
the reliable inter-process communication mechanism uses the cluster inter-process communication mechanism to transfer data between subprocesses, supplemented by a custom RPC mechanism; the inter-process communication comprises the following steps:
1) the parent process calls pipe() to create a pipe and obtains two file descriptors referring to the two ends of the pipe;
2) the parent process calls fork() to create a child process, which inherits both file descriptors, referring to the two ends of the same pipe;
3) the parent process closes its read end and the child process closes its write end, so that the parent writes data into the pipe and the child reads it from the pipe.
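The three pipe steps in claim 1 map directly onto the POSIX `pipe()` and `fork()` calls, which Python exposes as `os.pipe` and `os.fork`. The sketch below (POSIX only, intended for small messages) follows those steps; the function name is hypothetical, and the second pipe, used only so the caller can see what the child received, is a demonstration addition that is not part of the claim. Node.js `cluster` workers communicate over an IPC channel that Node sets up itself, so this shows the underlying OS mechanism rather than the Node.js API.

```python
import os


def parent_to_child_pipe(message: bytes) -> bytes:
    """Send `message` from a parent to a forked child over a pipe (POSIX only).

    Returns what the child read. The second pipe exists only so the caller
    can observe that result; it is not part of the three claimed steps.
    """
    # Step 1: pipe() returns two descriptors, one for each end of the pipe.
    read_fd, write_fd = os.pipe()
    echo_r, echo_w = os.pipe()  # demonstration-only return channel

    # Step 2: fork(); the child inherits both descriptors of the same pipe.
    pid = os.fork()
    if pid == 0:
        # Step 3, child side: close the write end, then read until EOF.
        os.close(write_fd)
        os.close(echo_r)
        received = b""
        while chunk := os.read(read_fd, 4096):
            received += chunk
        os.write(echo_w, received)
        os._exit(0)  # exit the child without running the parent's cleanup

    # Step 3, parent side: close the read end, write, then close to signal EOF.
    os.close(read_fd)
    os.close(echo_w)
    os.write(write_fd, message)
    os.close(write_fd)

    echoed = b""
    while chunk := os.read(echo_r, 4096):
        echoed += chunk
    os.close(echo_r)
    os.waitpid(pid, 0)
    return echoed
```

Closing the unused ends matters: the child only sees EOF once every write descriptor for the pipe, including the parent's, has been closed.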
2. The high-concurrency VoIP recording service system based on intelligent speech recognition according to claim 1, wherein the custom RPC mechanism defines two operations: request/response and event notification;
request/response:
the RPC framework layer encodes the processing result and returns it to the RPC caller for subsequent processing;
event notification:
an event source calls the RPC notification interface to pass event parameters, and the RPC framework layer encodes the event parameters and delivers them through the inter-process communication mechanism to the event listener for processing.
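A minimal sketch of the two RPC operations in claim 2, assuming JSON messages over an arbitrary string channel. The class `RpcEndpoint` and its message fields are hypothetical; correlating responses with requests by an id is a common design choice, not one the claim specifies.

```python
import itertools
import json
from typing import Any, Callable, Dict, List


class RpcEndpoint:
    """One side of a minimal JSON RPC layer over an IPC channel.

    `send` is any function that delivers a string to the peer process; the
    message fields ("type", "id", "method", "event", ...) are assumptions.
    """

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send
        self._ids = itertools.count(1)
        self._pending: Dict[int, Callable[[Any], None]] = {}
        self._methods: Dict[str, Callable[..., Any]] = {}
        self._listeners: Dict[str, List[Callable[..., None]]] = {}

    def register(self, method: str, fn: Callable[..., Any]) -> None:
        self._methods[method] = fn

    def on(self, event: str, fn: Callable[..., None]) -> None:
        self._listeners.setdefault(event, []).append(fn)

    def call(self, method: str, params: Dict[str, Any],
             callback: Callable[[Any], None]) -> None:
        # Request: remember the callback under a fresh id until the response arrives.
        msg_id = next(self._ids)
        self._pending[msg_id] = callback
        self._send(json.dumps({"type": "request", "id": msg_id,
                               "method": method, "params": params}))

    def notify(self, event: str, params: Dict[str, Any]) -> None:
        # Event notification: one-way, no response is expected.
        self._send(json.dumps({"type": "event", "event": event, "params": params}))

    def receive(self, raw: str) -> None:
        msg = json.loads(raw)
        if msg["type"] == "request":
            result = self._methods[msg["method"]](**msg["params"])
            # Encode the processing result and return it to the caller.
            self._send(json.dumps({"type": "response", "id": msg["id"],
                                   "result": result}))
        elif msg["type"] == "response":
            self._pending.pop(msg["id"])(msg["result"])
        elif msg["type"] == "event":
            for fn in self._listeners.get(msg["event"], []):
                fn(**msg["params"])
```

Wiring two endpoints back to back, for example `master = RpcEndpoint(lambda s: worker.receive(s))` and `worker = RpcEndpoint(lambda s: master.receive(s))`, lets the master call a method registered on the worker and get the result through its callback, while `notify` delivers events to listeners registered with `on`.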
3. The high-concurrency VoIP recording service system based on intelligent speech recognition according to claim 2, wherein the load scheduling mechanism schedules worker processes on a minimum-load basis; load is calculated through an RPC hook mechanism, which analyzes how the master has distributed load to each worker and computes that worker's current load according to a calculation strategy;
the load scheduling mechanism of the subprocess cluster calculates the workload of each worker process, and new load is always assigned to the worker process with the smallest load. After the recording core service receives a request initiated by a device, it establishes a recording session. Each session consists of two interacting channels, a signaling channel and a media channel: the signaling channel receives the device side's control instructions for the session, and the media channel receives the media stream data transmitted by the device side. The recording session management module assigns the signaling session and the media session to different processes.
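Minimum-load dispatch, with load tracked through dispatch and completion hooks, and the placement of a session's signaling and media parts on different processes can be sketched as below. The class name, the weights (signaling = 1, media = 5), the hook names and the fallback to a single worker are assumptions for illustration only.

```python
from typing import Dict, List, Optional, Tuple


class LeastLoadScheduler:
    """Assigns work to the worker process with the smallest current load."""

    # Hypothetical relative cost of each kind of session part.
    WEIGHTS = {"signaling": 1, "media": 5}

    def __init__(self, workers: List[str]) -> None:
        self.load: Dict[str, int] = {w: 0 for w in workers}

    def _least_loaded(self, exclude: Optional[str] = None) -> str:
        candidates = [w for w in self.load if w != exclude] or list(self.load)
        # min() returns the first worker among equally loaded ones.
        return min(candidates, key=lambda w: self.load[w])

    # Hooks: in the described system the RPC layer would invoke these when
    # the master dispatches work to a worker and when the worker finishes it.
    def on_dispatch(self, worker: str, kind: str) -> None:
        self.load[worker] += self.WEIGHTS[kind]

    def on_complete(self, worker: str, kind: str) -> None:
        self.load[worker] -= self.WEIGHTS[kind]

    def open_session(self) -> Tuple[str, str]:
        """Place one recording session's signaling and media parts."""
        signaling = self._least_loaded()
        self.on_dispatch(signaling, "signaling")
        # Put the media part on a different process when more than one exists.
        media = self._least_loaded(exclude=signaling)
        self.on_dispatch(media, "media")
        return signaling, media
```

Weighting media more heavily than signaling reflects that the media channel carries continuous audio while the signaling channel carries occasional control messages; the actual calculation strategy is left open by the claim.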
4. The high-concurrency VoIP recording service system based on intelligent speech recognition according to claim 3, wherein the recording service module is provided with a media stream management module; after receiving the message that the recording session has been established successfully, the device sends the media stream to the media-session SOCKET established by the recording service; the media stream management module performs fast verification of each media packet and extracts its packet structure information, and once that information has been extracted successfully, the media stream can be handed to the recording engine, which converts the voice media stream on the network into a voice file on disk;
the media stream management module compresses and packetizes the recording stream based on the G.729 protocol. The recording service uses real-time transmission: after the master process receives a media stream, it calls a worker to process it, and the worker processes the media stream in packets. Incoming and outgoing calls each occupy one channel for transmitting their media stream; the incoming and outgoing call streams are packetized in the media stream module, and when stored to file they are not merged but saved directly as raw bytes.
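The fast packet check and structure extraction in claim 4 correspond to parsing the RTP fixed header defined in RFC 3550 (version, padding, extension, CSRC count, marker, payload type, sequence number, timestamp, SSRC). The sketch below does that and appends each call leg's payload to its own raw buffer without merging, as claim 4 describes. It assumes static payload type 18 (G.729, per RFC 3551), ignores reordering (handled by the jitter buffer of claims 5 and 6), and its class and function names are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import Dict

G729_PAYLOAD_TYPE = 18  # static RTP payload type for G.729 (RFC 3551)


@dataclass
class RtpPacket:
    seq: int
    timestamp: int
    ssrc: int
    payload_type: int
    payload: bytes


def parse_rtp(data: bytes) -> RtpPacket:
    """Validate an RTP packet and extract its header fields (RFC 3550, 5.1)."""
    if len(data) < 12:
        raise ValueError("shorter than the 12-byte fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    offset = 12 + 4 * (b0 & 0x0F)              # skip the CSRC list
    if b0 & 0x10:                              # X bit: header extension follows
        if len(data) < offset + 4:
            raise ValueError("truncated header extension")
        (ext_words,) = struct.unpack("!H", data[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words
    end = len(data)
    if b0 & 0x20:                              # P bit: last byte = padding length
        end -= data[-1]
    if offset > end:
        raise ValueError("header or padding longer than the packet")
    return RtpPacket(seq, ts, ssrc, b1 & 0x7F, data[offset:end])


class CallRecorder:
    """Keeps the incoming and outgoing legs as separate raw payload streams."""

    def __init__(self) -> None:
        self.legs: Dict[str, bytearray] = {"in": bytearray(), "out": bytearray()}

    def on_packet(self, direction: str, data: bytes) -> None:
        pkt = parse_rtp(data)
        if pkt.payload_type != G729_PAYLOAD_TYPE:
            raise ValueError(f"unexpected payload type {pkt.payload_type}")
        # Append the coded payload as-is; the two legs are never mixed.
        self.legs[direction] += pkt.payload
```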
5. The system according to claim 4, wherein the recording engine uses a jitter buffer to smooth out packet loss and out-of-order arrival of voice data packets and, in keeping with the recording system's specification, replaces lost packets and silent periods with 0 dB audio without performing any voice optimization.
6. The system according to claim 5, wherein the jitter buffer processor stores state information for the voice stream currently being recorded and holds one segment of the voice stream in memory. On receiving an RTP packet, it first extracts the packet's meta-information to determine which part of the voice stream the packet carries, so as to place the packet at the correct position in the current segment; once the current segment is complete, the jitter buffer processor hands the voice segment to the file memory to be stored on disk, and then prepares to process the RTP packets of the next segment;
in addition to the jitter buffer, the recording engine comprises a file memory for storing the recorded audio obtained by decoding and storing the left- and right-channel audio directly at the device and chip layers; the file memory determines, according to the configuration, the path and name of the recording file for each session, and determines when the session's recording file is written and closed.
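A jitter buffer of the kind described in claims 5 and 6 can be sketched as a fixed-length segment indexed by RTP sequence number: packets are placed by their offset from the segment start, gaps are filled with zero-valued samples (silence), and the finished segment is handed to a writer such as the file memory. Assumptions: each packet carries one fixed-size frame of decoded 16-bit linear PCM (so zero bytes are silent audio), a segment is closed when a packet belonging to the next segment arrives or when `flush` is called at the end of the call, and late packets for an already written segment are dropped. The class name and parameters are hypothetical.

```python
from typing import Callable, List, Optional


class JitterBuffer:
    """Reorders one segment of RTP frames and fills gaps with silence."""

    def __init__(self, first_seq: int, frames_per_segment: int,
                 frame_bytes: int, deliver: Callable[[bytes], None]) -> None:
        self.base = first_seq              # sequence number of slot 0
        self.n = frames_per_segment
        self.frame_bytes = frame_bytes
        self.deliver = deliver             # e.g. the file memory's write routine
        self.slots: List[Optional[bytes]] = [None] * frames_per_segment

    def push(self, seq: int, frame: bytes) -> None:
        while True:
            # Offset from the segment start; RTP sequence numbers are 16-bit
            # and wrap around, hence the modulo.
            pos = (seq - self.base) % 65536
            if pos >= 32768:
                return                     # late packet for a written segment: drop
            if pos < self.n:
                if self.slots[pos] is None:  # keep the first copy of duplicates
                    self.slots[pos] = frame
                return
            self.flush()                   # packet belongs to a later segment

    def flush(self) -> None:
        """Hand the current segment to `deliver` (gaps become silence) and advance."""
        silence = bytes(self.frame_bytes)
        self.deliver(b"".join(f if f is not None else silence for f in self.slots))
        self.base = (self.base + self.n) % 65536
        self.slots = [None] * self.n
```

Treating offsets of 32768 or more as "in the past" is the usual half-range rule for comparing wrapping 16-bit sequence numbers. Segment length trades latency and memory against tolerance for reordering.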
7. The high-concurrency VoIP recording service system based on intelligent speech recognition according to claim 6, wherein the recording engine divides the storage space for recording files into a 4-level directory hierarchy: recording files generated by each VoIP calling device are first stored separately, then organized by the date, hour and minute at which the recording session began, and the actual recording files are stored in the 4th-level (minute) directory; a quantity limiter is also implemented so that no single minute-level directory grows too large, so that tens of millions of recording files per day can be stored and managed.
8. The high-concurrency VoIP recording service system based on intelligent speech recognition according to claim 7, wherein the background service system comprises a device access system, a concurrent buffer system, a data management system, a business system and an application system;
the device access system uses an asynchronous non-blocking IO model to handle interaction with the devices and to process and distribute the recording streams, without waiting for all business processes to finish before returning; it processes the incoming streaming media centrally and, once processing is complete, notifies the upper-layer applications through a message queue to continue subsequent business processing, so that the access system can be released;
the concurrent buffer system uses the MQ to isolate the problems of overly long call chains, long-running business processing and high concurrency, and forwards data in a standardized data format;
the business system is responsible for the services abstracted by the system and provides a unified abstraction to the upper layer;
the database layer uses both MySQL and MongoDB, storing data separately in each according to business requirements.
CN201910530307.5A 2019-06-19 2019-06-19 High-concurrency VOIP recording service system based on intelligent voice recognition Active CN110351445B

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201910530307.5A (CN110351445B) | 2019-06-19 | 2019-06-19 | High-concurrency VOIP recording service system based on intelligent voice recognition

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201910530307.5A (CN110351445B) | 2019-06-19 | 2019-06-19 | High-concurrency VOIP recording service system based on intelligent voice recognition

Publications (2)

Publication Number | Publication Date
CN110351445A | 2019-10-18
CN110351445B | 2020-09-29

Family

ID=68182336

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201910530307.5A (CN110351445B, Active) | High-concurrency VOIP recording service system based on intelligent voice recognition | 2019-06-19 | 2019-06-19

Country Status (1)

Country | Link
CN (1) | CN110351445B

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111583941B (en) * 2020-05-07 2024-01-16 珠海格力电器股份有限公司 Household appliance recording method and device, storage medium and household appliance
CN111752717B (en) * 2020-07-08 2021-08-31 广州爱浦路网络技术有限公司 SMF intelligent expansion method and device and SMF session establishment communication method
CN112817659B (en) * 2021-02-02 2023-06-02 金陵科技学院 Speech loading pre-judging method of speech gateway
CN113380220B (en) * 2021-06-10 2024-05-14 深圳市同行者科技有限公司 Speech synthesis coding method and device
CN113630512B (en) * 2021-08-04 2023-10-13 宁波菊风系统软件有限公司 Rich media call mobile terminal system and application method thereof

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6498791B2 (en) * 1998-04-03 2002-12-24 Vertical Networks, Inc. Systems and methods for multiple mode voice and data communications using intelligently bridged TDM and packet buses and methods for performing telephony and data functions using the same
JP3346395B2 (en) * 2000-10-27 2002-11-18 日本ビクター株式会社 Optical recording medium and audio decoding device
US8755515B1 (en) * 2008-09-29 2014-06-17 Wai Wu Parallel signal processing system and method
US8688445B2 (en) * 2008-12-10 2014-04-01 Adobe Systems Incorporated Multi-core processing for parallel speech-to-text processing
CN101859565A (en) * 2010-06-11 2010-10-13 深圳创维-Rgb电子有限公司 System and method for realizing voice recognition on television
CN102932562B (en) * 2012-10-29 2016-01-20 携程计算机技术(上海)有限公司 A kind of IP-based call center way of recording and system
CN103905670A (en) * 2012-12-28 2014-07-02 鸿富锦精密工业(深圳)有限公司 VoIP recording method and device
CN107767873A (en) * 2017-10-20 2018-03-06 广东电网有限责任公司惠州供电局 A kind of fast and accurately offline speech recognition equipment and method
CN109150885B (en) * 2018-08-29 2021-05-11 承启通(福建)科技有限公司 Call data processing method and device

Also Published As

Publication number Publication date
CN110351445A (en) 2019-10-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: No. 8, 6/F, Building 1, 88 Xingle North Road, Xindu Street, Xindu District, Chengdu, Sichuan 610000

Patentee after: Chengdu Kangshengsi Technology Co., Ltd.

Address before: 610000 Building A, 319 Xingye Avenue, Xindu Industrial Zone, Xindu District, Chengdu City, Sichuan Province

Patentee before: Chengdu Kangshengsi Technology Co., Ltd.