CN116761030A - Multi-camera-position synchronous audio and video recording and playing system based on image recognition algorithm - Google Patents


Info

Publication number
CN116761030A
CN116761030A (application CN202311008485.4A)
Authority
CN
China
Prior art keywords
module
data
network
audio
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311008485.4A
Other languages
Chinese (zh)
Other versions
CN116761030B (en)
Inventor
秦明华
苗升伍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Hanwei Education Technology Co ltd
Original Assignee
Nanjing Hanwei Education Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Hanwei Education Technology Co ltd filed Critical Nanjing Hanwei Education Technology Co ltd
Priority to CN202311008485.4A
Publication of CN116761030A
Application granted
Publication of CN116761030B
Legal status: Active
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/43076 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of the same content streams on multiple devices, e.g. when family members are watching the same movie on different devices
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/433 Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H04N21/4334 Recording operations
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/434 Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
    • H04N21/4341 Demultiplexing of audio and video streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45 Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/466 Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/4662 Learning process for intelligent management, e.g. learning user preferences for recommending movies characterized by learning algorithms
    • H04N21/4665 Learning process for intelligent management, e.g. learning user preferences for recommending movies characterized by learning algorithms involving classification methods, e.g. Decision trees
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45 Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/466 Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/4662 Learning process for intelligent management, e.g. learning user preferences for recommending movies characterized by learning algorithms
    • H04N21/4666 Learning process for intelligent management, e.g. learning user preferences for recommending movies characterized by learning algorithms using neural networks, e.g. processing the feedback provided by the user
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • H04N21/8547 Content authoring involving timestamps for synchronizing content
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Security & Cryptography (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The invention discloses a multi-camera-position synchronous audio and video recording and broadcasting system based on an image recognition algorithm. It relates to the technical field of data information processing and mainly addresses the slow data processing and processing lag of multi-camera-position synchronous audio and video recording and broadcasting. The system comprises a monitoring center, an annular recorder, an NTP server, network communication equipment and a maintenance management platform. The main controller of the monitoring center searches for an optimal synchronization path with an optimization algorithm, the switch detects data code streams in real time with a target detection model, and the resolution unit separates the audio and video data code streams with an audio/video splitting model. As a result, the transmission efficiency of the synchronous recording and broadcasting network is greatly improved, abnormal data are corrected faster, and the data information processing capability is enhanced.

Description

Multi-camera-position synchronous audio and video recording and playing system based on image recognition algorithm
Technical Field
The invention relates to the technical field of data information processing, and in particular to a multi-camera-position synchronous audio and video recording and playing system based on an image recognition algorithm.
Background
Multi-camera synchronous recording generally refers to capturing sound and video with multiple cameras or microphones during a session in order to obtain high-quality footage. Cameras or microphones at different positions capture sound and video that are later edited and integrated in post-production. The key to multi-camera synchronous recording is to set the position and angle of each camera or microphone properly and to use suitable recording equipment; this may require specialized audio and video software to operate and manage multiple recording devices for optimal results. Multi-site recording and broadcasting is a common and widely used technique in television production. It resembles live television, except that the venue's television signal is sent to a video recorder instead of being broadcast live by a television station. This working mode is mainly used to record programs such as dramas, concerts, dances and songs that do not require instant live broadcasting. Such programs are typically long, highly continuous and strongly stage-oriented, are generally performed in theatres, and therefore usually have a live audience. During multi-camera-position synchronous audio and video recording and broadcasting, data information processing lags behind, making it difficult to keep up with the required computation; in addition, during on-site recording, the master control position is placed where it is convenient to connect to each camera position and to run cables, and away from the sound equipment to avoid intercom interference.
Various problems easily arise in the information processing of multi-camera-position synchronous recording and broadcasting data. One is multi-camera-position synchronization itself: in the recording and broadcasting field, it means that multiple cameras capture, record or transmit the same scene at the same time, so the video streams of all camera positions must be synchronized to ensure that events occurring at the same moment are recorded or transmitted together. Another is image coding: recorded audio and video data generally need to be encoded, and for multi-camera-position synchronization data the image of each camera position must be encoded separately so that it can be processed correctly during synchronization. A third is image processing: each camera position's images require corresponding processing, including video compression, clipping, volume adjustment and color space conversion. In the current data information processing flow, processing is delayed and its effect is poor, so it is difficult to meet present requirements.
Disclosure of Invention
Aiming at the defects of the prior art, the invention discloses a multi-camera-position synchronous audio and video recording and broadcasting system based on an image recognition algorithm. The main controller of the monitoring center searches for an optimal synchronization path with an optimization algorithm, the switch detects data code streams in real time with a target detection model, and the resolution unit separates the audio and video data code streams with an audio/video splitting model, so that the transmission efficiency of the synchronous recording and broadcasting network and the correction speed for abnormal data are greatly improved. The improved data information processing capability thus further improves the synchronous audio and video recording and broadcasting capability.
In order to achieve the above technical effects, the invention adopts the following technical scheme:
A multi-camera-position synchronous audio and video recording and broadcasting system based on an image recognition algorithm comprises a monitoring center, network communication equipment, a maintenance management platform, an annular recorder and an NTP server;
the monitoring center is used for monitoring and managing all access devices. It comprises a main controller, a correction center, a data center and a management center: the main controller adjusts the working state of each device; the data center stores the data and logs generated by each device and comprises an audio/video storage area and a device working log storage area; the correction center corrects the desynchronization of multi-camera-position recording and broadcasting; and the management center monitors the working state of each device;
the annular recorder is used for omnidirectional shooting and recording of audio and video. It comprises a time acquisition module, an audio/video coding module, a protocol encryption module and an editing module: the time acquisition module acquires the standard time of each camera position; the audio/video coding module encodes audio and video into a data code stream; the protocol encryption module verifies the authenticity of the audio and video and prevents eavesdropping during transmission; and the editing module converts the transmitted data code stream back into audio and video and plays it;
the NTP server provides high-accuracy computer time synchronization and time correction;
the network communication equipment realizes data transmission and communication connections. It comprises a router, a network bridge and a switch: the router partitions the network through a routing algorithm; the network bridge uses a data link layer protocol based on MAC address identification and forwarding to extend the network distance and communication means; and the switch uses a target detection model to detect the data code streams and the people and objects in the video in real time, regenerates the audio and video, and forwards them to the designated port;
the maintenance management platform realizes remote management and maintenance of the equipment. It comprises a classification module, a maintenance module and a configuration module. The classification module classifies the audio and video data code streams and comprises a resolution unit, a denoising unit, a filling unit and a storage unit: the resolution unit separates the audio and video data code streams with an audio/video splitting model; the denoising unit removes noise from the code streams with an anomaly detection algorithm; the filling unit supplements missing items in the code streams with an interpolation algorithm; and the storage unit stores the filled data in a distributed database. The output of the resolution unit is connected to the input of the denoising unit, the output of the denoising unit to the input of the filling unit, and the output of the filling unit to the input of the storage unit. The maintenance module detects and displays the working state of each camera position and comprises an intelligent matching unit, a data compression unit and a data caching unit: the intelligent matching unit performs matching training and adaptation of image quality and network rate through a self-training model; the data compression unit compresses data with a hybrid compression algorithm; and the data caching unit caches recorded broadcast network data. The configuration module performs remote configuration and correction of data parameters through a parameter correction model.
As a further description of the above technical solution, the target detection model comprises a feature extraction module, a learning module and an identification module. The feature extraction module is a REB+ network, which connects 8 convolution kernels of 3×3 and 5×5 in the form of a residual network and uses 10 depth-separable convolutions to expand the kernels to the corresponding scales. The learning module is a SPAN network comprising 11×11, 7×7, 5×5 and 3×3 average pooling, all with stride 1; the SPAN network stacks channels through a concatenation operation. The identification module consists of 4 detection heads of different scales.
As a further description of the above technical solution, the audio/video splitting model comprises an encoder, a splitter and a decoder. The encoder converts the mixed data code stream into a feature-space representation; the splitter computes a data code stream mask for each source from the extracted features to obtain each source's corresponding features; and the decoder decodes each source's features to obtain its data code stream. The encoder overlaps 4 times through two-dimensional spatially separable convolutions to obtain a fine feature matrix; the splitter is a temporal convolution network, so the size of the effective window grows with the number of layers; and the decoder uses two identical one-dimensional convolutions to distinguish the channels of different speakers.
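The encoder-splitter-decoder principle above can be illustrated with a deliberately tiny sketch. This is not the patent's model: the "encoder" and "decoder" are identity maps and the masks are hand-set, purely to show how element-wise masking of a shared feature representation separates sources.

```python
def separate(mixture, masks):
    """Toy mask-based source separation: the 'encoder' is the identity,
    each source is recovered by element-wise masking of the mixture's
    'feature' representation, and the 'decoder' is the identity too."""
    return [[m * x for m, x in zip(mask, mixture)] for mask in masks]

# Two interleaved sources: each mask selects alternating samples.
mixture = [3, 7, 2, 9]
mask_a = [1, 0, 1, 0]   # source A assumed active on even samples
mask_b = [0, 1, 0, 1]   # source B assumed active on odd samples
a, b = separate(mixture, [mask_a, mask_b])
print(a)  # [3, 0, 2, 0]
print(b)  # [0, 7, 0, 9]
```

Because the masks sum to one at every position, the recovered sources sum back to the mixture, which is the invariant real mask-based splitters also aim for (with learned, soft masks).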
As a further description of the above technical solution, the parameter correction model is a GoogleNetV3-Unet+ model, whose backbone network adopts a GoogleNetV3 network and whose task network adopts a Unet network. The GoogleNetV3 network comprises a feature layer, a learning layer and an identification layer: the feature layer comprises a 3×3 convolution layer and a combined convolution layer composed of 1×7 and 7×1 convolutions; the learning layer is an Inception module composed of 16 convolution blocks of 1×3, 3×1 and 1×1, and includes a BN layer, a maximum pooling layer and an average pooling layer; the identification layer includes a depth residual shrinkage module. The Unet network is an encoder-decoder structure. The GoogleNetV3-Unet+ model first transplants the combination of the Inception module and the depth residual shrinkage module into the Unet encoder, then extracts multi-scale features with the 1×3, 3×1 and 7×1 convolution blocks, and finally suppresses redundant features channel by channel through the shrinkage function of the depth residual shrinkage module.
As a further description of the above technical solution, the training process of the self-training model is as follows: first train a model on labeled data, then predict the unlabeled data, add the high-confidence predictions to the training set, and continue training until the model meets the requirements. The data caching unit uses memcache and redis to cache recorded broadcast network data.
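The pseudo-labeling loop just described can be sketched as follows. Everything concrete here is an illustrative assumption, not from the patent: a 1-D nearest-class-mean "model", a margin-based confidence score, and a fixed threshold stand in for whatever classifier and confidence measure the system actually uses.

```python
def self_train(labeled, unlabeled, threshold=0.8, rounds=3):
    """Minimal self-training sketch (needs at least two classes):
    fit per-class means on labeled 1-D points, promote high-confidence
    unlabeled points into the training set, and refit."""
    labeled = list(labeled)
    pool = list(unlabeled)
    means = {}
    for _ in range(rounds):
        groups = {}
        for x, y in labeled:                      # "training": per-class mean
            groups.setdefault(y, []).append(x)
        means = {y: sum(v) / len(v) for y, v in groups.items()}
        promoted = []
        for x in pool:
            # Confidence: relative margin between the two nearest class means.
            d = sorted((abs(x - m), y) for y, m in means.items())
            conf = 1.0 - d[0][0] / (d[0][0] + d[1][0] + 1e-9)
            if conf >= threshold:
                promoted.append((x, d[0][1]))     # pseudo-label the point
        if not promoted:                          # model "meets requirements"
            break
        labeled += promoted
        taken = {x for x, _ in promoted}
        pool = [x for x in pool if x not in taken]
    return means

means = self_train(labeled=[(0.0, "a"), (1.0, "b")], unlabeled=[0.1, 0.9, 0.5])
print(means)  # the ambiguous point 0.5 is never promoted
```

The ambiguous sample stays unlabeled, which is exactly the point of the confidence gate: only predictions the current model is sure about feed back into training.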
As a further description of the above technical solution, the hybrid compression algorithm model comprises a dividing module, a mapping module and a replacing module. The dividing module divides the input data into equal-sized data blocks according to content characteristics; the mapping module maps characters with high occurrence frequency to 4-bit codes and characters with low occurrence frequency to 16-bit codes; and finally the replacing module replaces repeated character strings through a hash table. The output of the dividing module is connected to the input of the mapping module, and the output of the mapping module to the input of the replacing module.
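The divide/map/replace pipeline can be sketched as below. This is a rough illustration under stated assumptions, not the patented algorithm: the block size, the 16-entry short-code table (the most a 4-bit code can index), and the token output format are all choices made for the example.

```python
from collections import Counter

def hybrid_compress(data, block_size=4):
    """Sketch of the three-stage scheme: fixed-size blocking, short codes
    for frequent symbols (a 4-bit index into a 16-entry table) with rare
    symbols left as wide literals, and a hash table that replaces repeated
    blocks with back-references."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    freq = Counter(data)
    # The up-to-16 most frequent symbols get a 4-bit code index.
    short = {s: i for i, (s, _) in enumerate(freq.most_common(16))}
    seen, out = {}, []
    for b in blocks:
        if b in seen:                         # repeated block: back-reference
            out.append(("ref", seen[b]))
        else:
            seen[b] = len(seen)
            out.append(("lit", [short.get(c, c) for c in b]))
    return out

tokens = hybrid_compress("abababababababab")
print(tokens)  # one literal block, then back-references to it
```

On highly repetitive input almost all blocks collapse to small reference tokens, which is where the hash-table stage earns its keep.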
As a further description of the above technical solution, the main controller comprises an FPGA+DSP processing module. The DSP processing module is an ATMega328-type acquisition chip integrating 14 GPIO interfaces, 6 PWM interfaces, a 12-bit ADC interface, a UART serial port, 1 SPI interface and 1 I2C interface. The FPGA processing module is an ARTIX-7 series XC7A100T-2FGG484I chip; it searches for the optimal synchronization path with an optimization algorithm that selects paths according to the probability formula:

$$p_{ij}^{k}(t)=\frac{[\tau_{ij}(t)]^{\alpha}\,[\eta_{ij}(t)]^{\beta}}{\sum_{s\in allowed_{k}}[\tau_{is}(t)]^{\alpha}\,[\eta_{is}(t)]^{\beta}},\quad j\in allowed_{k} \qquad (1)$$

In formula (1), $p_{ij}^{k}(t)$ is the probability that the k-th piece of recorded broadcast data takes path (i, j) at time t, $\tau_{ij}(t)$ is the pheromone of path (i, j) at time t, $\eta_{ij}(t)$ is the heuristic factor of path (i, j) at time t, $allowed_{k}$ is the set of paths that data k is allowed to visit next, s is an element of that set, $\alpha$ and $\beta$ weight the pheromone and the heuristic factor, k is a natural number, i is a row coordinate, and j is a column coordinate.

The recorded broadcast data releases pheromone on the paths it traverses; the pheromone update formula is:

$$\tau_{ij}(t+1)=(1-\rho)\,\tau_{ij}(t)+\Delta\tau_{ij},\qquad \Delta\tau_{ij}=\frac{n_{c}}{\tau_{\max}}\,Q \qquad (2)$$

In formula (2), $\rho$ is the volatility, $\tau_{ij}(t)$ is the pheromone concentration of path (i, j), $n_{c}$ is the number of consecutive convergence times, $\tau_{\max}$ is the maximum pheromone concentration, t is time, Q is the data margin, and $\Delta\tau_{ij}$ is the change of the pheromone concentration.

The path is optimized by updating the pheromone; the optimization algorithm model is:

$$v_{i}(t+1)=A\,v_{i}(t)+B\,\big(p_{i}-x_{i}(t)\big)+C\,\big(g-x_{i}(t)\big) \qquad (3)$$

$$x_{i}(t+1)=x_{i}(t)+v_{i}(t+1) \qquad (4)$$

In formulas (3) to (4), $v_{i}(t)$ is the individual's own velocity, $p_{i}$ is the best position the individual has experienced, $g$ is the best position the population has experienced, A, B and C are correction factors, t is the previous moment, $x_{i}(t)$ is the position at the previous moment, and $v_{i}(t+1)$ is the updated velocity vector.
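The path-selection rule of formula (1) is straightforward to compute. The sketch below assumes illustrative pheromone and heuristic values on three candidate edges (none of these numbers, nor the α/β defaults, come from the patent):

```python
def transition_probs(tau, eta, allowed, alpha=1.0, beta=2.0):
    """Ant-colony transition rule as in formula (1): the weight of edge j
    out of the current node is tau_j^alpha * eta_j^beta, normalised over
    the set of allowed edges."""
    weights = {j: (tau[j] ** alpha) * (eta[j] ** beta) for j in allowed}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

# Hypothetical values: edge 1 has more pheromone, edge 2 a better heuristic.
tau = {0: 1.0, 1: 2.0, 2: 1.0}
eta = {0: 1.0, 1: 1.0, 2: 2.0}
p = transition_probs(tau, eta, allowed=[0, 1, 2])
print(p)  # with beta > alpha, the heuristic-favoured edge 2 wins
```

Raising β relative to α biases the search toward the heuristic (e.g. shorter or less-loaded links), while raising α biases it toward reinforced paths; the probabilities always sum to 1 over the allowed set.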
In summary, by adopting the above technical scheme, the invention has the following beneficial effects:
The invention discloses a multi-camera-position synchronous audio and video recording and broadcasting system based on an image recognition algorithm, in which the main controller of the monitoring center searches for an optimal synchronization path with an optimization algorithm and the switch detects data code streams in real time with a target detection model, so that the multi-camera-position synchronous recording and broadcasting capability is greatly improved through data information processing; the resolution unit separates the audio and video data code streams with an audio/video splitting model, which greatly improves the transmission efficiency of the synchronous recording and broadcasting network and the correction speed for abnormal data.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings described below are only some embodiments of the invention, and a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of the overall architecture of the present invention;
FIG. 2 is a schematic diagram of an audio/video shunt model;
FIG. 3 is a schematic diagram of a target detection model structure;
FIG. 4 is a schematic diagram of a hybrid compression algorithm model;
fig. 5 is a schematic structural diagram of a maintenance management platform.
Detailed Description
The following description of the embodiments of the present invention is made clearly and fully with reference to the accompanying drawings; it is evident that the described embodiments are only some, not all, of the embodiments of the invention. It should be understood that the description is only illustrative and is not intended to limit the scope of the invention. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the present invention.
Referring to FIGS. 1-5, a multi-camera-position synchronous audio and video recording and broadcasting system based on an image recognition algorithm comprises a monitoring center, network communication equipment, a maintenance management platform, an annular recorder and an NTP server;
the monitoring center is used for monitoring and managing all access devices. It comprises a main controller, a correction center, a data center and a management center: the main controller adjusts the working state of each device; the data center stores the data and logs generated by each device and comprises an audio/video storage area and a device working log storage area; the correction center corrects the desynchronization of multi-camera-position recording and broadcasting; and the management center monitors the working state of each device;
the annular recorder is used for omnidirectional shooting and recording of audio and video. It comprises a time acquisition module, an audio/video coding module, a protocol encryption module and an editing module: the time acquisition module acquires the standard time of each camera position; the audio/video coding module encodes audio and video into a data code stream; the protocol encryption module verifies the authenticity of the audio and video and prevents eavesdropping during transmission; and the editing module converts the transmitted data code stream back into audio and video and plays it;
the NTP server provides high-accuracy computer time synchronization and time correction;
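The clock correction an NTP exchange delivers can be illustrated with the standard four-timestamp formulas (RFC 5905 style). The timestamp values below are invented for the example; the patent does not detail the synchronization arithmetic.

```python
def ntp_offset(t0, t1, t2, t3):
    """Clock offset and round-trip delay from the four NTP timestamps:
    t0 = client send, t1 = server receive,
    t2 = server send,  t3 = client receive (all in seconds)."""
    offset = ((t1 - t0) + (t2 - t3)) / 2.0   # estimated client-clock error
    delay = (t3 - t0) - (t2 - t1)            # network round-trip time
    return offset, delay

# Example: client clock 5 s behind the server, ~40 ms each way on the wire.
offset, delay = ntp_offset(t0=100.00, t1=105.04, t2=105.05, t3=100.09)
print(offset, delay)  # offset ~ 5.0 s, delay ~ 0.08 s
```

Each camera position would apply its own offset to its local clock, which is what lets timestamps from different machines be compared during synchronous recording.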
the network communication equipment realizes data transmission and communication connections. It comprises a router, a network bridge and a switch: the router partitions the network through a routing algorithm; the network bridge uses a data link layer protocol based on MAC address identification and forwarding to extend the network distance and communication means; and the switch uses a target detection model to detect the data code streams and the people and objects in the video in real time, regenerates the audio and video, and forwards them to the designated port;
the maintenance management platform realizes remote management and maintenance of the equipment. It comprises a classification module, a maintenance module and a configuration module. The classification module classifies the audio and video data code streams and comprises a resolution unit, a denoising unit, a filling unit and a storage unit: the resolution unit separates the audio and video data code streams with an audio/video splitting model; the denoising unit removes noise from the code streams with an anomaly detection algorithm; the filling unit supplements missing items in the code streams with an interpolation algorithm; and the storage unit stores the filled data in a distributed database. The output of the resolution unit is connected to the input of the denoising unit, the output of the denoising unit to the input of the filling unit, and the output of the filling unit to the input of the storage unit. The maintenance module detects and displays the working state of each camera position and comprises an intelligent matching unit, a data compression unit and a data caching unit: the intelligent matching unit performs matching training and adaptation of image quality and network rate through a self-training model; the data compression unit compresses data with a hybrid compression algorithm; and the data caching unit caches recorded broadcast network data. The configuration module performs remote configuration and correction of data parameters through a parameter correction model.
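The filling unit's interpolation step can be sketched as a minimal linear-interpolation gap filler. The function name, the use of `None` for missing samples, and the choice of linear interpolation are illustrative assumptions; the patent does not specify which interpolation variant is used.

```python
def fill_missing(stream):
    """Replace None entries with values interpolated linearly between the
    nearest known neighbours; leading/trailing gaps are held constant."""
    known = [i for i, v in enumerate(stream) if v is not None]
    out = list(stream)
    for i, v in enumerate(stream):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = stream[right]           # no left neighbour: hold right
        elif right is None:
            out[i] = stream[left]            # no right neighbour: hold left
        else:
            frac = (i - left) / (right - left)
            out[i] = stream[left] + frac * (stream[right] - stream[left])
    return out

print(fill_missing([1.0, None, None, 4.0]))  # approximately [1.0, 2.0, 3.0, 4.0]
```

Running this before storage means the distributed database only ever sees complete streams, which keeps downstream consumers from having to handle gaps themselves.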
The monitoring center is respectively connected with the annular recorder, the NTP server, the network communication equipment and the maintenance management platform, the annular recorder is connected with the NTP server, the NTP server is connected with the network communication equipment, and the network communication equipment is connected with the maintenance management platform.
In the above embodiment, the synchronization of multi-machine-bit audio and video recording and playing can be greatly improved through data information processing.
In a further embodiment, the target detection model comprises a feature extraction module, a learning module and an identification module; the feature extraction module is a REB+ network, the REB+ network connects 8 convolution kernels of 3×3 and 5×5 in the form of a residual network and expands the convolution kernels to the corresponding scales with 10 depth-separable convolutions; the learning module is a SPAN network, the SPAN network comprises 11×11, 7×7, 5×5 and 3×3 average pooling with a step size of 1 and performs channel stacking through a concatenation function; the identification module consists of 4 detection heads of different scales.
In a further embodiment, in the process of handling synchronous audio and video recording and playing information, the recorded data must be processed and calculated in order to improve data processing capability. In a specific application, synchronous audio and video recording and playing information refers to the same content being shot, recorded and played at different times or in different places. Typically, such content is produced as files in the form of video, audio, subtitles and the like, and is distributed or shared periodically so that viewers can enjoy the same content at different times and places. How to process this data information is the problem to be solved.
In a further embodiment, the working principle of the target detection model is as follows: firstly, fine feature extraction is carried out on the data code streams by the 3×3 and 5×5 convolution kernels of the REB+ network; the extracted features are then passed into the SPAN network for iterative learning and channel stacking; finally, the audio and video data code streams are identified by the detection heads of different scales, as shown in Table 1.
TABLE 1 Audio and video data code stream identification table

Input      Target detection model   Result
00111000   10                       10
11011001   01                       01
As can be seen from Table 1, 1 represents a video code stream and 0 represents an audio code stream; through training and recognition by the target detection model, the audio and video code streams are split more rapidly, facilitating subsequent processing.
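The forwarding step that follows classification can be sketched as a simple demultiplexer; the function name `route_streams` and the per-stream labels are illustrative assumptions, since the patent only states that recognized streams are forwarded to designated ports.

```python
def route_streams(streams, labels):
    """Forward each classified code stream to the matching output port:
    label '1' -> video port, label '0' -> audio port, following the
    labeling convention of Table 1. The target detection model itself is
    replaced here by the externally supplied labels."""
    ports = {"video": [], "audio": []}
    for stream, label in zip(streams, labels):
        ports["video" if label == "1" else "audio"].append(stream)
    return ports

print(route_streams(["00111000", "11011001"], ["1", "0"]))
# -> {'video': ['00111000'], 'audio': ['11011001']}
```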
In the data information processing process, files in the forms of video, audio, subtitles and the like are converted through data streams, so that the data information conversion capability and the application capability are improved.
In a further embodiment, the audio and video splitting model comprises an encoder, a separator and a decoder; the encoder converts the mixed data code stream into a feature-space representation, the separator calculates a data code stream mask for each source from the extracted features to obtain the corresponding features of each source, and the decoder decodes the corresponding features of each source to obtain the data code stream of each source; the encoder obtains a fine feature matrix by stacking two-dimensional spatially separable convolutions 4 times, the separator is a temporal convolutional network so that the effective window size grows with the number of layers, and the decoder uses two identical one-dimensional convolutions to distinguish the channels of different speakers.
In the further data information processing process, the input data information is encoded, separated and decoded, so that the computing and application capabilities for the data information are improved. The working principle of the audio and video splitting model is as follows: the mixed data code stream is converted into a feature-space representation by the encoder through two-dimensional spatially separable convolution; the separator, a temporal convolutional network, then calculates the data code stream mask of each source from the extracted features to obtain the corresponding features of each source; finally, the decoder decodes the corresponding features of each source with two identical one-dimensional convolutions to obtain the data code stream of each source.
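The mask-and-decode idea at the heart of this pipeline can be shown in miniature; the hand-written masks below stand in for the separator's learned temporal convolutional network, and all names and values are illustrative rather than taken from the patent.

```python
def separate(mixture, masks):
    """Mask-based source separation in the spirit of the
    encoder-separator-decoder pipeline: each source is recovered by
    weighting the mixture's feature frames with that source's mask.
    A minimal sketch with fixed masks; the real separator learns them
    with a temporal convolutional network."""
    return [[m * w for m, w in zip(mixture, mask)] for mask in masks]

mix = [0.5, 1.0, 0.2, 0.8]     # mixed feature frames
masks = [[1, 0, 1, 0],         # mask for source 1
         [0, 1, 0, 1]]         # mask for source 2
print(separate(mix, masks))
# -> [[0.5, 0.0, 0.2, 0.0], [0.0, 1.0, 0.0, 0.8]]
```

The two recovered lists are then handed to a decoder stage; in the patent's model that stage is the pair of identical one-dimensional convolutions.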
In a further embodiment, the parameter correction model is a GoogleNetV3-Unet+ model; the backbone network of the GoogleNetV3-Unet+ model adopts a GoogleNetV3 network and the task network adopts a Unet network; the GoogleNetV3 network comprises a feature layer, a learning layer and an identification layer, the feature layer comprises a 3×3 convolution layer and a combined convolution layer consisting of 1×7 and 7×1 convolutions, the learning layer is an Inception module consisting of 16 convolution blocks of 1×3, 3×1 and 1×1, the Inception module comprises a BN layer, a maximum pooling layer and an average pooling layer, and the identification layer is a deep residual shrinkage module; the Unet network adopts an encoder-decoder structure; the GoogleNetV3-Unet+ model firstly transplants a combination of the Inception module and the deep residual shrinkage module in GoogleNetV3 into the encoder of the Unet network, then extracts features by adopting Inception modules and deep residual shrinkage modules of different depths according to the position in the Unet encoder, and finally connects the extracted features to the corresponding positions of the decoder in sequence for feature fusion; the deep residual shrinkage module adopts a DRSN-CW network, and the DRSN-CW network comprises a deep residual network, an attention mechanism and a soft threshold function.
The principle of the parameter correction model is as follows: when parameters are detected, suitable detection algorithms and models must be selected according to the network characteristics and the type of anomaly so as to accurately detect and judge abnormal conditions; data analysis enables deep mining of the abnormal data to find its characteristics and rules; fault diagnosis determines the specific position and cause of the anomaly, providing a basis for subsequent repair; when a repair scheme is prepared, a suitable scheme is selected according to the specific cause and position of the anomaly and a corresponding repair flow is formulated; system monitoring then provides comprehensive monitoring and management of the network, so that abnormal conditions are found and resolved in time and the stability and reliability of the network are improved.
In a further embodiment, the training process of the self-training model is as follows: firstly, a model is trained with labeled data; the model then predicts unlabeled data, data with high confidence is added to the training set, and training continues until the model meets the requirements. The data caching unit adopts memcache and redis to cache recorded broadcast network data.
The working principle of the self-training model is as follows: the device data is preprocessed and cleaned to remove useless data, improving data quality, and key feature information is extracted from the data. The feature extraction result is input into an anomaly detection module; abnormal conditions in data communication are identified through analysis and comparison of the feature information, the abnormal data is presented in visual form to facilitate analysis and decision-making, and audio and video data code streams that are not yet time-synchronized are cached.
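The pseudo-labeling loop described above can be sketched as follows; the 1-D threshold classifier and the distance-based confidence score are toy stand-ins, since the patent does not specify the underlying model, and every name in the example is an assumption.

```python
def self_train(labeled, unlabeled, confidence=0.9, rounds=5):
    """Self-training sketch: fit a 1-D threshold classifier on labeled
    (value, label) pairs, pseudo-label the unlabeled pool, and absorb
    high-confidence predictions into the training set, repeating until
    nothing new is confident enough."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        # "Fit": place the threshold halfway between the two class means.
        m0 = sum(x for x, y in labeled if y == 0) / max(1, sum(y == 0 for _, y in labeled))
        m1 = sum(x for x, y in labeled if y == 1) / max(1, sum(y == 1 for _, y in labeled))
        thr, span = (m0 + m1) / 2, abs(m1 - m0) or 1.0
        confident, rest = [], []
        for x in pool:
            score = min(1.0, abs(x - thr) / (span / 2))   # distance-based confidence
            (confident if score >= confidence else rest).append((x, int(x > thr)))
        if not confident:          # nothing new to learn from -> stop
            break
        labeled.extend(confident)  # absorb high-confidence pseudo-labels
        pool = [x for x, _ in rest]
    return labeled, pool

labeled, pool = self_train([(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)], [0.5, 9.5, 5.0])
print(pool)  # -> [5.0]  (the ambiguous mid-point sample stays unlabeled)
```

The clear-cut samples 0.5 and 9.5 are absorbed with pseudo-labels, while the sample at the decision boundary never reaches the confidence bar and remains in the pool.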
In a further embodiment, the hybrid compression algorithm model comprises a dividing module, a mapping module and a replacing module; the dividing module divides the input data into data blocks of the same size according to content characteristics, the mapping module maps characters with a high occurrence frequency to 4-bit codes and characters with a low occurrence frequency to 16-bit codes, and finally the replacing module replaces repeated character strings by means of a hash table; the output end of the dividing module is connected with the input end of the mapping module, and the output end of the mapping module is connected with the input end of the replacing module.
The working principle of the hybrid compression algorithm model is as follows: the characteristics of the target file are analyzed, including its file format, data structure, compression performance, file size, etc. This information can help to select the optimal compression algorithm and formulate an algorithm combination strategy. And selecting a proper compression algorithm according to the characteristics and compression requirements of the target file, and carrying out combined application according to different file types and characteristics. And applying the selected compression algorithm to the target file according to a preset combination strategy, and sequentially performing compression operation for a plurality of times until the required compression ratio or compression quality is achieved. In the compression process, a proper compression mode needs to be selected and optimized according to a data structure and a compression algorithm. In the decompression process, the selected compression algorithms are decompressed one by one according to the reverse order until the original target file data are obtained. In the decompression process, a corresponding decompression method is needed to be adopted, and decoding and data recovery are carried out.
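The mapping stage of the hybrid scheme can be sketched as follows, assuming (beyond what the text states) that at most 15 characters receive 4-bit codes so that the 16-bit escape codes remain prefix-free; block division and the hash-table replacement of repeated strings are omitted for brevity.

```python
from collections import Counter

def hybrid_compress(data):
    """Sketch of the mapping stage of the hybrid compression scheme:
    up to 15 high-frequency characters get 4-bit codes (0-14); every
    other character gets a 16-bit escape code ('1111' + 12-bit code
    point), which keeps the whole code prefix-free."""
    frequent = [c for c, _ in Counter(data).most_common(15)]

    def code(c):
        if c in frequent:
            return format(frequent.index(c), "04b")   # 4-bit code
        return "1111" + format(ord(c), "012b")        # 16-bit escape

    bits = "".join(code(c) for c in data)
    return frequent, bits

frequent, bits = hybrid_compress("aaabbbab")
print(len(bits))  # -> 32 (8 characters x 4 bits, vs 64 bits of raw ASCII)
```

A decoder would read 4 bits at a time and, on seeing `1111`, consume 12 more bits as a raw code point, which is why reserving code 15 as the escape marker matters.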
In a further embodiment, the main controller comprises an FPGA+DSP processing module; the DSP processing module is an ATMega328 acquisition chip integrating a 14-path GPIO interface, a 6-path PWM interface, a 12-bit ADC interface, a UART serial port, a 1-path SPI interface and a 1-path I2C interface; the FPGA processing module is an Artix-7 series XC7A100T-2FGG484I chip, the FPGA processing module adopts an optimal optimizing algorithm to find the optimal synchronization path, and the optimal optimizing algorithm finds a path according to a probability formula:
$$P_{ij}^{k}(t)=\begin{cases}\dfrac{\left[\tau_{ij}(t)\right]^{\alpha}\left[\eta_{ij}(t)\right]^{\beta}}{\sum_{s\in allowed_{k}}\left[\tau_{is}(t)\right]^{\alpha}\left[\eta_{is}(t)\right]^{\beta}}, & j\in allowed_{k}\\ 0, & \text{otherwise}\end{cases}\tag{1}$$

in formula (1), $P_{ij}^{k}(t)$ is the probability that the k-th item of recorded and played data takes path (i, j) at time t, $\tau_{ij}(t)$ is the pheromone of path (i, j) at time t, $\eta_{ij}(t)$ is the heuristic factor of path (i, j) at time t, $allowed_{k}$ is the set of paths that recorded and played data item k is allowed to visit next, s is an element of that set, $\alpha\geq 0$ and $\beta\geq 0$ are the pheromone and heuristic weighting exponents, k is a natural number, i is a row coordinate, and j is a column coordinate;
the recorded broadcast data releases pheromone on a passing path, and the pheromone updating formula is as follows:
$$\tau_{ij}(t+1)=\min\left\{(1-\rho)\,\tau_{ij}(t)+\Delta\tau_{ij}(t),\ \tau_{\max}\right\},\qquad \Delta\tau_{ij}(t)=\frac{Q}{N}\tag{2}$$

in formula (2), $\rho$ is the degree of volatility, $\tau_{ij}(t)$ is the pheromone concentration of path (i, j), $N$ is the number of consecutive convergence times, $\tau_{\max}$ is the maximum pheromone concentration, t is time, $Q$ is the data margin, and $\Delta\tau_{ij}(t)$ is the variation of the pheromone concentration;
optimizing the path by updating the pheromone, wherein the optimization algorithm model is as follows:
$$v^{t+1}=A\,v^{t}+B\left(p_{best}-x^{t}\right)+C\left(g_{best}-x^{t}\right)\tag{3}$$

$$x^{t+1}=x^{t}+v^{t+1}\tag{4}$$

in formulas (3) to (4), $v^{t}$ is the individual's own speed, $p_{best}$ is the best position the individual has experienced, $g_{best}$ is the best position the population has experienced, A, B and C are correction factors, $x^{t}$ is the position at the last moment, and $v^{t+1}$ is the updated velocity vector;
the principle of the optimal optimizing algorithm is as follows: in the process of constructing a solution, the optimal optimizing algorithm utilizes a strategy of combining deterministic selection and random selection, dynamically adjusts probability of deterministic selection in the searching process, initializes random positions and speeds of individuals in a population, acquires a solution with good strengthening performance according to a positive feedback principle, evaluates fitness of the individuals in the population, finds a historical optimal position, finds the optimal position of the population, finally updates the position and the speed, and when the evolution reaches a certain algebra, the evolution direction is basically determined, and then the information quantity on a path is dynamically adjusted, so that the optimal optimizing algorithm can accelerate algorithm convergence speed, and a local optimal solution is jumped out, as shown in a table 2.
Table 2 Best optimizing path table

Node   Time    Result
A      100ms   5
B      50ms    3
C      20ms    1
D      60ms    4
E      30ms    2
As can be seen from Table 2, the selection priority of each node is determined by comparing the time consumed, further improving recording and playing conversion and transmission efficiency and standardizing the synchronization rules of the multi-machine-bit audio and video recording and playing system. This research improves the synchronous interaction capability and data information processing capability of the system, achieving a better technical effect.
While specific embodiments of the present invention have been described above, it will be understood by those skilled in the art that these specific embodiments are by way of example only, and that various omissions, substitutions, and changes in the form and details of the methods and systems described above may be made by those skilled in the art without departing from the spirit and scope of the invention. For example, it is within the scope of the present invention to combine the above-described method steps to perform substantially the same function in substantially the same way to achieve substantially the same result. Accordingly, the scope of the invention is limited only by the following claims.

Claims (7)

1. A multi-machine-bit synchronous audio and video recording and broadcasting system based on an image recognition algorithm is characterized in that: the system comprises a monitoring center, network communication equipment, a maintenance management platform, an annular recorder and an NTP server;
the monitoring center is used for monitoring and managing all the access devices; the monitoring center comprises a main controller, a correction center, a data center and a management center, wherein the main controller is used for adjusting the working state of each device, the data center is used for storing data and logs generated by each device, the data center comprises a sound and video storage area and a device working log storage area, the correction center is used for adjusting the asynchronization of multi-machine bit recording and broadcasting, and the management center is used for monitoring the working state of each device;
the annular recorder is used for realizing omnibearing shooting and recording of the audio and video; the annular recorder comprises a time acquisition module, an audio and video coding module, a protocol encryption module and an editing module, wherein the time acquisition module is used for acquiring the standard time of each machine position, the audio and video coding module is used for coding audio and video into a data code stream, the protocol encryption module is used for verifying the authenticity of the audio and video and preventing eavesdropping during transmission, and the editing module is used for converting the transmitted data code stream back into audio and video and playing it;
the NTP server is used for computer time synchronization and time correction with high accuracy;
the network communication equipment is used for realizing data transmission and communication connection; the network communication equipment comprises a router, a network bridge and a switch, wherein the router divides a network through a routing algorithm, the network bridge adopts a data link layer protocol based on MAC address identification and forwarding to expand network distance and communication means, and the switch adopts a target detection model to detect data code streams and people and objects in videos in real time so as to regenerate audio and video and forwards the audio and video to a designated port;
the maintenance management platform is used for realizing remote management and maintenance of the equipment; the maintenance management platform comprises a classification module, a maintenance module and a configuration module, wherein the classification module is used for classifying audio and video data code streams, the classification module comprises a resolution unit, a denoising unit, a filling unit and a storage unit, the resolution unit is used for realizing separation of the audio and video data code streams by adopting an audio and video splitting model, the denoising unit is used for removing noise in the video code streams through an anomaly detection algorithm, the filling unit supplements missing items in the audio and video data code streams through an interpolation algorithm, the storage unit is used for storing the filled data by adopting a distributed database, the output end of the resolution unit is connected with the input end of the denoising unit, the output end of the denoising unit is connected with the input end of the filling unit, and the output end of the filling unit is connected with the input end of the storage unit; the maintenance module is used for detecting and displaying the working state of each machine position and comprises an intelligent matching unit, a data compression unit and a data caching unit, wherein the intelligent matching unit is used for carrying out corresponding matching training and adaptation on image quality and network rate, the data compression unit is used for compressing data by adopting a mixed compression algorithm, and the data caching unit is used for caching recorded broadcast network data and identifying communication anomalies through a self-training model; the configuration module is used for carrying out remote configuration and parameter correction on the equipment by adopting a parameter correction model.
2. The multi-bit synchronous audio and video recording and playing system based on the image recognition algorithm as set forth in claim 1, wherein: the target detection model comprises a feature extraction module, a learning module and an identification module, the feature extraction module is a REB+ network, the REB+ network connects 8 convolution kernels of 3×3 and 5×5 in the form of a residual network and expands the convolution kernels to the corresponding scales with 10 depth-separable convolutions, the learning module is a SPAN network, the SPAN network comprises 11×11, 7×7, 5×5 and 3×3 average pooling with a step size of 1, the SPAN network performs channel stacking through a concatenation function, and the identification module consists of 4 detection heads of different scales.
3. The multi-bit synchronous audio and video recording and playing system based on the image recognition algorithm as set forth in claim 1, wherein: the audio and video splitting model comprises an encoder, a separator and a decoder, the encoder converts the mixed data code stream into a feature-space representation, the separator calculates a data code stream mask for each source from the extracted features to obtain the corresponding features of each source, the decoder decodes the corresponding features of each source to obtain the data code stream of each source, the encoder obtains a fine feature matrix by stacking two-dimensional spatially separable convolutions 4 times, the separator is a temporal convolutional network so that the effective window size grows with the number of layers, and the decoder uses two identical one-dimensional convolutions to distinguish the channels of different speakers.
4. The multi-bit synchronous audio and video recording and playing system based on the image recognition algorithm as set forth in claim 1, wherein: the parameter correction model is a GoogleNetV3-Unet+ model, the backbone network of the GoogleNetV3-Unet+ model adopts a GoogleNetV3 network and the task network adopts a Unet network, the GoogleNetV3 network comprises a feature layer, a learning layer and an identification layer, the feature layer comprises a 3×3 convolution layer and a combined convolution layer consisting of 1×7 and 7×1 convolutions, the learning layer is an Inception module consisting of 16 convolution blocks of 1×3, 3×1 and 1×1, the Inception module comprises a BN layer, a maximum pooling layer and an average pooling layer, the identification layer is a deep residual shrinkage module, the Unet network adopts an encoder-decoder structure, the GoogleNetV3-Unet+ model firstly transplants a combination of the Inception module and the deep residual shrinkage module in GoogleNetV3 into the encoder of the Unet network, then extracts features by adopting Inception modules and deep residual shrinkage modules of different depths according to the position in the Unet encoder, and finally connects the extracted features to the corresponding positions of the decoder in sequence for feature fusion, the deep residual shrinkage module adopts a DRSN-CW network, and the DRSN-CW network comprises a deep residual network, an attention mechanism and a soft threshold function.
5. The multi-bit synchronous audio and video recording and playing system based on the image recognition algorithm as set forth in claim 1, wherein: the training process of the self-training model is as follows: firstly, training with label data to obtain a model, then predicting unlabeled data, adding data with high confidence to a training set, and continuing training until the model meets the requirements; the data caching unit adopts memcache and redis to cache recorded broadcast network data.
6. The multi-bit synchronous audio and video recording and playing system based on the image recognition algorithm as set forth in claim 1, wherein: the hybrid compression algorithm model comprises a dividing module, a mapping module and a replacing module, the dividing module divides the input data into data blocks of the same size according to content characteristics, the mapping module maps characters with a high occurrence frequency to 4-bit codes and characters with a low occurrence frequency to 16-bit codes, finally the replacing module replaces repeated character strings by means of a hash table, the output end of the dividing module is connected with the input end of the mapping module, and the output end of the mapping module is connected with the input end of the replacing module.
7. The multi-bit synchronous audio and video recording and playing system based on the image recognition algorithm as set forth in claim 1, wherein: the main controller comprises an FPGA+DSP processing module, the DSP processing module is an ATMega328 acquisition chip integrating a 14-path GPIO interface, a 6-path PWM interface, a 12-bit ADC interface, a UART serial port, a 1-path SPI interface and a 1-path I2C interface, the FPGA processing module is an Artix-7 series XC7A100T-2FGG484I chip, the FPGA processing module adopts an optimal optimizing algorithm to search an optimal synchronous path, the optimal optimizing algorithm searches a path according to a probability formula, and the probability formula is:
$$P_{ij}^{k}(t)=\begin{cases}\dfrac{\left[\tau_{ij}(t)\right]^{\alpha}\left[\eta_{ij}(t)\right]^{\beta}}{\sum_{s\in allowed_{k}}\left[\tau_{is}(t)\right]^{\alpha}\left[\eta_{is}(t)\right]^{\beta}}, & j\in allowed_{k}\\ 0, & \text{otherwise}\end{cases}\tag{1}$$

in formula (1), $P_{ij}^{k}(t)$ is the probability that the k-th item of recorded and played data takes path (i, j) at time t, $\tau_{ij}(t)$ is the pheromone of path (i, j) at time t, $\eta_{ij}(t)$ is the heuristic factor of path (i, j) at time t, $allowed_{k}$ is the set of paths that recorded and played data item k is allowed to visit next, s is an element of that set, $\alpha\geq 0$ and $\beta\geq 0$ are the pheromone and heuristic weighting exponents, k is a natural number, i is a row coordinate, and j is a column coordinate;
the recorded broadcast data releases pheromone on a passing path, and the pheromone updating formula is as follows:
$$\tau_{ij}(t+1)=\min\left\{(1-\rho)\,\tau_{ij}(t)+\Delta\tau_{ij}(t),\ \tau_{\max}\right\},\qquad \Delta\tau_{ij}(t)=\frac{Q}{N}\tag{2}$$

in formula (2), $\rho$ is the degree of volatility, $\tau_{ij}(t)$ is the pheromone concentration of path (i, j), $N$ is the number of consecutive convergence times, $\tau_{\max}$ is the maximum pheromone concentration, t is time, $Q$ is the data margin, and $\Delta\tau_{ij}(t)$ is the variation of the pheromone concentration;
optimizing the path by updating the pheromone, wherein the optimization algorithm model is as follows:
$$v^{t+1}=A\,v^{t}+B\left(p_{best}-x^{t}\right)+C\left(g_{best}-x^{t}\right)\tag{3}$$

$$x^{t+1}=x^{t}+v^{t+1}\tag{4}$$

in formulas (3) to (4), $v^{t}$ is the individual's own speed, $p_{best}$ is the best position the individual has experienced, $g_{best}$ is the best position the population has experienced, A, B and C are correction factors, $x^{t}$ is the position at the last moment, and $v^{t+1}$ is the updated velocity vector.
CN202311008485.4A 2023-08-11 2023-08-11 Multi-machine-bit synchronous audio and video recording and playing system based on image recognition algorithm Active CN116761030B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311008485.4A CN116761030B (en) 2023-08-11 2023-08-11 Multi-machine-bit synchronous audio and video recording and playing system based on image recognition algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311008485.4A CN116761030B (en) 2023-08-11 2023-08-11 Multi-machine-bit synchronous audio and video recording and playing system based on image recognition algorithm

Publications (2)

Publication Number Publication Date
CN116761030A true CN116761030A (en) 2023-09-15
CN116761030B CN116761030B (en) 2023-10-27

Family

ID=87961217

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311008485.4A Active CN116761030B (en) 2023-08-11 2023-08-11 Multi-machine-bit synchronous audio and video recording and playing system based on image recognition algorithm

Country Status (1)

Country Link
CN (1) CN116761030B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09298719A (en) * 1996-05-07 1997-11-18 Hitachi Denshi Ltd Method and device for multiplexing plural digital signals
JP2003219350A (en) * 2002-01-21 2003-07-31 Nec Corp Video server control system and composite source recording method
CN101102510A (en) * 2006-07-07 2008-01-09 乐金电子(昆山)电脑有限公司 Audio and video synchronization method for portable image terminal
CN101202876A (en) * 2006-12-15 2008-06-18 天津三星电子有限公司 Method for implementing synchronization of audio and picture by using audio frequency and video frequency composite channel in DVR
KR20130140438A (en) * 2012-06-14 2013-12-24 임주은 Blackbox for a vehicle for providing a signal detection function of the brake or the accelerator
CN104125022A (en) * 2013-11-27 2014-10-29 腾讯科技(成都)有限公司 Audio transmission delay measuring method and system
CN112468822A (en) * 2020-11-06 2021-03-09 上海钦文信息科技有限公司 Multimedia recording and broadcasting course interaction method based on video SEI message
CN113822147A (en) * 2021-08-04 2021-12-21 北京交通大学 Deep compression method for semantic task of cooperative machine


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
* Wang Jianlin (王建林): "A Brief Analysis of the Application of Timecode in Audio and Video Production", Cable TV Technology, no. 21, pages 44-47

Also Published As

Publication number Publication date
CN116761030B (en) 2023-10-27

Similar Documents

Publication Publication Date Title
TWI826321B (en) A method for enhancing quality of media
US9210436B2 (en) Distributed video coding/decoding method, distributed video coding/decoding apparatus, and transcoding apparatus
US7859561B2 (en) Method and system for video conference
CN1235408C (en) Generating and matching hashes of multimedia content
WO2009056038A1 (en) A method and device for describing and capturing video object
US10939161B2 (en) System and method for low-latency communication over unreliable networks
CN1719909A (en) Method for measuring audio-video frequency content change
CN107533850A (en) Audio content recognition methods and device
US20120151291A1 (en) Receiving apparatus and processing method for receiving apparatus
JP2006501746A (en) Method and system for improving transmission efficiency using multiple description layer coding
CN103024517A (en) Method for synchronously playing streaming media audios and videos based on parallel processing
AU2012265335A1 (en) Audio decoding method and device
Xia et al. WiserVR: Semantic communication enabled wireless virtual reality delivery
US10225043B2 (en) Information processing apparatus, information processing method, and program
CN116761030B (en) Multi-machine-bit synchronous audio and video recording and playing system based on image recognition algorithm
KR20170067546A (en) System and method for audio signal and a video signal synchronization
US10075196B2 (en) Information processing apparatus, information processing method, and program
CN116824480A (en) Monitoring video analysis method and system based on deep stream
CN103533353A (en) Approximate video encoding system
CN113068059B (en) Video live broadcasting method, device, equipment and storage medium
US20220030233A1 (en) Interpolation filtering method and apparatus for intra-frame prediction, medium, and electronic device
KR102430177B1 (en) System for rapid management of large scale moving pictures and method thereof
He et al. MTRFN: Multiscale temporal receptive field network for compressed video action recognition at edge servers
TWI661421B (en) System and method with audio watermark
Liang et al. VISTA: Video Transmission over A Semantic Communication Approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant