CN109040645B

CN109040645B - Audio and video file transcription method and device, storage medium and server

Info

Publication number: CN109040645B
Application number: CN201810873501.9A
Authority: CN
Inventors: 刘广伟; 乔磊
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2018-08-02
Filing date: 2018-08-02
Publication date: 2022-06-17
Anticipated expiration: 2038-08-02
Also published as: CN109040645A

Abstract

The invention relates to the field of data processing, in particular to an audio and video file transcription method, an audio and video file transcription device, a storage medium and a server, wherein the method comprises the following steps: receiving a first network packet generated by a video call center, and writing the first network packet into a first temporary file according to a preset sequence; analyzing the first temporary file, acquiring a first network packet in the first temporary file, screening a second network packet which can be analyzed into video or audio from the first network packet according to a preset condition comprising an IP port and an extension rule, and determining the first temporary file corresponding to the second network packet as a second temporary file; and analyzing the second temporary file, and converting the second temporary file into a persistent file in a corresponding media format according to a preset media format. The method and the device can optimize each link for transcribing the audio and video stream into the persistent file, reduce the instantaneous load of the server and improve the transcription efficiency of the audio and video stream.

Description

Audio and video file transcription method and device, storage medium and server

Technical Field

The invention relates to the field of data processing, in particular to an audio and video file transcription method and device, a storage medium and a server.

Background

With the development of multimedia and internet technologies, more and more services that originally had to be completed on site can now be completed online. For example, when a user handles a service that requires identity verification, the user formerly had to visit the site so that audio and video could be recorded and the identity checked through an on-site terminal. Such services can now be completed online, and the recorded audio and video must be stored as user data for later use. At present, when an online service needs to record audio and video, the received audio/video stream is sent to a transcription server through an optical splitter, and the stream is parsed and converted into a readable audio/video format on the transcription server in real time.

Disclosure of Invention

To overcome the above technical problems, and in particular the problem that the prior art cannot efficiently screen and transcribe received audio/video streams, the following technical solutions are provided:

In a first aspect, the present invention provides an audio/video file transcription method, including:

receiving a first network packet generated by a video call center, and writing the first network packet into a first temporary file according to a preset sequence;

analyzing the first temporary file, acquiring a first network packet in the first temporary file, screening a second network packet which can be analyzed into video or audio from the first network packet according to a preset condition comprising an IP port and an extension rule, and determining the first temporary file corresponding to the second network packet as a second temporary file;

and analyzing the second temporary file, and converting the second temporary file into a persistent file in a corresponding media format according to a preset media format.

Further, before parsing the first temporary file, the method further includes:

when the number of the first temporary files is larger than a first peak value, distributing the subsequent processing steps to servers in a first geographical range for execution;

and when the number of the first temporary files is larger than the second peak value, distributing the subsequent processing steps to the servers in the second geographical range for execution.

Further, after the converting the second temporary file into the persistent file in the corresponding media format according to the preset media format, the method further includes:

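counting the accuracy of the persistent file, and modifying the preset conditions including the IP port and the extension rule according to the accuracy.

Further, after converting the second temporary file into the persistent file in the corresponding media format according to the preset media format, the method further includes:

removing the second temporary file.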

In a second aspect, the present invention provides an audio/video file transcription apparatus, including:

a receiving module: used for receiving a first network packet generated by the video call center and writing the first network packet into a first temporary file according to a preset sequence;

a parsing module: used for parsing the first temporary file, acquiring the first network packet in the first temporary file, screening a second network packet which can be parsed into video or audio from the first network packet according to a preset condition comprising an IP port and an extension rule, and determining the first temporary file corresponding to the second network packet as a second temporary file;

a transcription module: used for analyzing the second temporary file and converting the second temporary file into a persistent file in a corresponding media format according to a preset media format.

Further, the apparatus further comprises:

a distribution module: used for, before the parsing module parses the first temporary file, distributing the subsequent processing steps to servers in a first geographical range for execution when the number of the first temporary files is larger than a first peak value, and distributing the subsequent processing steps to servers in a second geographical range for execution when the number of the first temporary files is larger than a second peak value.

Further, the apparatus further comprises:

a statistics module: used for counting the accuracy of the persistent file after the transcription module converts the second temporary file into the persistent file in the corresponding media format according to the preset media format, and modifying the preset condition comprising the IP port and the extension rule according to the accuracy.

Further, the apparatus further comprises:

a removal module: used for removing the second temporary file after the transcription module converts the second temporary file into the persistent file in the corresponding media format according to the preset media format.

In a third aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the audio/video file transcription method described above.

In a fourth aspect, the present invention also provides a server comprising one or more processors, a memory, one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs are configured to perform the above-mentioned audio-video file transcription method.

Compared with the prior art, the invention has the following beneficial effects:

The invention provides an audio and video file transcription method in which the received first network packets generated by a video call center are processed in a block-wise, asynchronous manner: the first network packets are written into a first temporary file, the first network packets are then separated and screened to select the audio and video network packets generated by service use, and those packets are transcribed into a persistent file. Each step can be implemented with multi-task, multi-thread concurrency, which prevents a processing bottleneck in any single step from lowering the overall transcription efficiency. The parsing of the first temporary file and the transcription of the second temporary file can be performed in a distributed and asynchronous manner and are no longer bound to the real-time arrival of the network stream that carries the first network packets, so the server load is reduced and the transcription efficiency is improved.

Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Drawings

The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

Fig. 1 is a schematic flow chart of an embodiment of the audio/video file transcription method of the present invention;

Fig. 2 is a schematic flow chart of another embodiment of the audio/video file transcription method of the present invention;

Fig. 3 is a schematic structural diagram of an embodiment of the audio/video file transcription apparatus of the present invention;

Fig. 4 is a schematic structural diagram of a server according to an embodiment of the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative only and should not be construed as limiting the invention.

As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, or operations, but do not preclude the presence or addition of one or more other features, integers, steps, operations, or groups thereof.

It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

It will be appreciated by those skilled in the art that the terms "application," "computer program" and similar terms used herein refer to the same concepts known to those skilled in the art that refer to computer software electronically-adapted to be organized into a series of computer instructions and associated data sources. Unless otherwise specified, such nomenclature is not itself limited by the programming language class, level, or operating system or platform upon which it depends. Of course, such concepts are not limited to any type of terminal.

The embodiment of the invention provides an audio and video file transcription method, as shown in fig. 1, the method comprises the following steps:

S10: receiving a first network packet generated by a video call center, and writing the first network packet into a first temporary file according to a preset sequence.

The video call center provides video services that allow business departments to handle services remotely by video, such as face-to-face contract signing and identity verification. First network packets are generated while the video call center is running. A first network packet may carry voice or video data recorded by the video call center, or it may be produced by the operation of the video call center and be unrelated to any audio/video service. To avoid missing information and to retain as much useful information as possible, in this embodiment a first process of the audio/video file transcription system based on the video call center receives all first network packets generated by the video call center, stores them, and writes them into a first temporary file according to a preset sequence, for example in the chronological order in which the packets were generated.
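Step S10 can be illustrated with a minimal sketch. The record layout (an 8-byte timestamp and a 4-byte length in front of each payload), the `CapturedPacket` type and the `.tmp1` suffix are all assumptions made for illustration; the embodiment only requires that the packets be written to a temporary file in a preset order such as generation time.

```python
import struct
import tempfile
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class CapturedPacket:
    """Hypothetical record for one first network packet."""
    timestamp: float  # generation time in seconds
    payload: bytes    # raw packet bytes


# Hypothetical on-disk header: 8-byte timestamp followed by 4-byte payload length.
_HEADER = struct.Struct("!dI")


def write_first_temp_file(packets: Iterable[CapturedPacket]) -> str:
    """Write the packets to a new temporary file in chronological order."""
    ordered = sorted(packets, key=lambda p: p.timestamp)
    with tempfile.NamedTemporaryFile(suffix=".tmp1", delete=False) as fh:
        for pkt in ordered:
            fh.write(_HEADER.pack(pkt.timestamp, len(pkt.payload)))
            fh.write(pkt.payload)
        return fh.name


def read_first_temp_file(path: str) -> Iterator[CapturedPacket]:
    """Yield the packets back in the order in which they were written."""
    with open(path, "rb") as fh:
        while header := fh.read(_HEADER.size):
            timestamp, length = _HEADER.unpack(header)
            yield CapturedPacket(timestamp, fh.read(length))
```

Length-prefixed records are used here only so that a later parsing step can recover packet boundaries; a capture format such as pcap would serve the same purpose.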

S20: analyzing the first temporary file, acquiring a first network packet in the first temporary file, screening a second network packet which can be analyzed into video or audio from the first network packet according to a preset condition comprising an IP port and an extension rule, and determining the first temporary file corresponding to the second network packet as a second temporary file.

After the first network packets generated by the video call center have been written into the first temporary file, the packets produced by service handling need to be screened out. In this embodiment, a second process of the audio/video file transcription system based on the video call center parses the first temporary file to obtain the first network packets in it, unpacks them, and then screens them according to a preset condition comprising an IP port and an extension rule, so that second network packets which can be parsed into video or audio are selected from the first network packets.

The video call center provides access interfaces for cameras and telephones. For video service handling, a user connects to the video call center with, for example, a mobile phone and uses its camera to capture video images for identity verification; the mobile phone connects to the video call center through a specific IP port and sends the video images, and the video call center generates first network packets. For telephone service handling, a user places a call with a mobile phone; the phone accesses the video call center and sends voice data, and the video call center generates first network packets. Accordingly, second network packets which can be parsed into video are screened from the first network packets by the preset condition comprising the IP port, and second network packets which can be parsed into audio are screened from the first network packets by the preset condition comprising the extension rule. The first temporary file corresponding to the second network packets is then determined as the second temporary file.

In one implementation of this embodiment, a copy of the first temporary file is parsed, so that when the first temporary file is determined to be a second temporary file, the first network packets obtained by parsing do not need to be repackaged. In another implementation of this embodiment, the first temporary file is parsed, the first network packets are unpacked and screened, and the first network packets that meet the preset condition are repackaged into the second temporary file.
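The screening in step S20 can be sketched as follows. The text does not define the form of the extension rule, so this sketch assumes it is a set of extension-number prefixes; `PacketMeta`, `ScreeningRules`, the port number and the prefixes are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class PacketMeta:
    """Hypothetical metadata obtained when a first network packet is unpacked."""
    dst_port: int
    extension: Optional[str] = None  # telephone extension, if the packet carries one


@dataclass(frozen=True)
class ScreeningRules:
    """Preset condition: IP ports for video, an extension rule for audio."""
    video_ports: FrozenSet[int]
    extension_prefixes: Tuple[str, ...]  # assumed form of the extension rule


def classify(pkt: PacketMeta, rules: ScreeningRules) -> Optional[str]:
    """Return "video", "audio", or None when the packet is not a second network packet."""
    if pkt.dst_port in rules.video_ports:
        return "video"
    if pkt.extension is not None and pkt.extension.startswith(rules.extension_prefixes):
        return "audio"
    return None


def screen(packets: Iterable[PacketMeta], rules: ScreeningRules) -> List[Tuple[str, PacketMeta]]:
    """Keep only the packets that can be parsed into video or audio."""
    kept = []
    for pkt in packets:
        kind = classify(pkt, rules)
        if kind is not None:
            kept.append((kind, pkt))
    return kept
```

A first temporary file whose packets survive `screen` would be treated as a second temporary file; a file with no surviving packets would be skipped.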

S30: analyzing the second temporary file, and converting the second temporary file into a persistent file in a corresponding media format according to a preset media format.

After the second temporary file containing the network packets generated by the video service or the voice service has been determined, in this embodiment a third process of the audio/video file transcription system based on the video call center analyzes the second temporary file to determine whether its media content is voice or video. The file is then converted according to the preset media format: if the second temporary file was generated by the video service and contains video, it is converted into a persistent video file encoded in H.265 or H.264; if it was generated by the voice service and contains voice, it is converted into a persistent audio file encoded in AAC. The persistent file can be stored permanently in a storage medium and used later as part of an evidence chain.
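The format selection in step S30 can be sketched as a lookup from media kind to preset. Only the codec names (H.264/H.265 and AAC) come from the embodiment; the container suffixes, the `MediaPreset` and `TranscriptionPlan` types and the function name are assumptions, and the sketch stops at planning the conversion rather than invoking an encoder.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Dict, Mapping


@dataclass(frozen=True)
class MediaPreset:
    codec: str   # target encoding named in the embodiment
    suffix: str  # container suffix; an assumption, not stated in the text


# Hypothetical preset table; H.265 could be substituted for H.264.
PRESETS: Dict[str, MediaPreset] = {
    "video": MediaPreset(codec="H.264", suffix=".mp4"),
    "audio": MediaPreset(codec="AAC", suffix=".m4a"),
}


@dataclass(frozen=True)
class TranscriptionPlan:
    source: str  # second temporary file
    target: str  # persistent file to produce
    codec: str


def plan_transcription(second_temp_file: str, media_kind: str,
                       presets: Mapping[str, MediaPreset] = PRESETS) -> TranscriptionPlan:
    """Choose the persistent file name and codec for one second temporary file."""
    try:
        preset = presets[media_kind]
    except KeyError:
        raise ValueError(f"no preset media format for {media_kind!r}") from None
    target = PurePosixPath(second_temp_file).with_suffix(preset.suffix)
    return TranscriptionPlan(second_temp_file, str(target), preset.codec)
```

Keeping the presets in a table means the preset media format can be changed without touching the conversion logic.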

This embodiment provides an audio and video file transcription method in which the received first network packets generated by the video call center are processed in a block-wise, asynchronous manner: the first network packets are written into a first temporary file, the first network packets are then separated and screened to select the audio and video network packets generated during service use, and those packets are transcribed into a persistent file. Each step can be implemented with multi-task, multi-thread concurrency, which prevents a processing bottleneck in any single step from lowering the overall transcription efficiency. The parsing of the first temporary file and the transcription of the second temporary file can be performed in a distributed and asynchronous manner and are not bound to the real-time arrival of the network stream that carries the first network packets, so the server load is reduced and the transcription efficiency is improved.
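The decoupling of the three steps can be sketched with one worker thread per stage and queues between them. This is an illustrative arrangement, not the patented implementation: the embodiment describes separate processes exchanging temporary files, whereas the sketch passes file names through in-memory queues, and the `parse` and `transcribe` callables are hypothetical stand-ins for steps S20 and S30.

```python
import queue
import threading
from typing import Callable, Iterable, List, Optional

_STOP = object()  # sentinel that tells a stage to shut down


def run_stage(work: Callable[[str], Optional[str]],
              inbox: "queue.Queue", outbox: "queue.Queue") -> None:
    """Consume items from inbox, apply work, and forward non-None results."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            return
        result = work(item)
        if result is not None:
            outbox.put(result)


def run_pipeline(first_temp_files: Iterable[str],
                 parse: Callable[[str], Optional[str]],
                 transcribe: Callable[[str], Optional[str]]) -> List[str]:
    """Feed first temporary files through parsing and transcription stages."""
    to_parse: "queue.Queue" = queue.Queue()
    to_transcribe: "queue.Queue" = queue.Queue()
    done: "queue.Queue" = queue.Queue()
    workers = [
        threading.Thread(target=run_stage, args=(parse, to_parse, to_transcribe)),
        threading.Thread(target=run_stage, args=(transcribe, to_transcribe, done)),
    ]
    for worker in workers:
        worker.start()
    for name in first_temp_files:  # the receiving step never waits on later stages
        to_parse.put(name)
    to_parse.put(_STOP)
    for worker in workers:
        worker.join()
    results = []
    while (item := done.get()) is not _STOP:
        results.append(item)
    return results
```

Because the receiving loop only enqueues work, a slow transcription stage delays the persistent files but does not hold up reception, which is the property the paragraph above attributes to asynchronous processing.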

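In an embodiment of the present invention, before parsing the first temporary file, the method further includes: when the number of the first temporary files is larger than a first peak value, distributing the subsequent processing steps to servers in a first geographical range for execution; and when the number of the first temporary files is larger than a second peak value, distributing the subsequent processing steps to servers in a second geographical range for execution.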

In this embodiment, the audio/video file transcription system of the video call center is operated and managed in a distributed manner; that is, the system runs on servers distributed across a plurality of regions, and preferably a central server manages the distributed servers in a unified way. Before the first temporary file is parsed, it is determined whether the number of first temporary files has reached a preset peak value. If the number of first temporary files in a region is greater than the first peak value, the servers in that region cannot process them in time, and the subsequent processing steps are distributed to servers within a first geographic range for execution. For example, after the first network packets received in the Guangzhou region are written into first temporary files and the number of first temporary files in the Guangzhou region is found to be greater than the first peak value, the subsequent processing steps are distributed to servers in the Guangdong region for execution. This reduces the load on the servers in the Guangzhou region, and the multi-region cooperation improves the transcription efficiency of the audio and video files.

Further, when the number of first temporary files is greater than the second peak value, the subsequent processing steps are distributed to servers within a second geographic range for execution. For example, after the first network packets received in the Guangzhou region are written into first temporary files and the number of first temporary files in the Guangzhou region is found to be greater than the second peak value, the subsequent processing steps are distributed to servers in the South China region for execution. This reduces the load on the servers in the Guangzhou and Guangdong regions, and the multi-region cooperation again improves the transcription efficiency of the audio and video files.
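The two-threshold distribution rule can be sketched as a small function. The sketch assumes the second peak value is larger than the first and that the second geographic range contains the first, which matches the Guangzhou, Guangdong and South China example but is not stated explicitly; the function name and return labels are hypothetical.

```python
def choose_scope(pending_first_temp_files: int, first_peak: int, second_peak: int) -> str:
    """Pick where the parsing and transcription steps should run.

    Returns "local" (e.g. Guangzhou), "first_range" (e.g. Guangdong)
    or "second_range" (e.g. South China).
    """
    if first_peak >= second_peak:
        # Assumption of this sketch: the second peak value is the higher threshold.
        raise ValueError("first_peak must be smaller than second_peak")
    if pending_first_temp_files > second_peak:
        return "second_range"
    if pending_first_temp_files > first_peak:
        return "first_range"
    return "local"
```

Checking the higher threshold first ensures that a backlog above both peaks is sent to the wider range rather than stopping at the first match.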

As shown in fig. 2, an embodiment of the present invention, after converting the second temporary file into a persistent file in a corresponding media format according to a preset media format, further includes:

S40: counting the accuracy of the persistent file, and modifying the preset conditions including the IP port and the extension rule according to the accuracy.

After the second temporary file is converted into the persistent file, the accuracy of the persistent file is counted. Specifically, it is counted whether each video file among the persistent files corresponds to a video service and whether each audio file among the persistent files corresponds to a voice service. For example, if a certain IP port is no longer used by the video service because the service has changed, a second temporary file selected by the preset condition for that IP port is still converted into a video persistent file, but the resulting file cannot be opened and used; such a video persistent file is an erroneous persistent file. Similarly, if an extension is no longer used by the telephone service because the service has changed, a second temporary file selected by the preset condition of the extension rule is still converted into a voice persistent file, but the resulting file is unrelated to the voice service that needs to be stored; such a voice persistent file is also an erroneous persistent file. The accuracy of the persistent files is therefore counted, and the preset condition comprising the IP port and the extension rule is modified according to that accuracy, so that the preset condition better matches the audio/video files of the services that need to be stored. This avoids omitting audio/video files of services that need to be stored while reducing the number of files from services that do not.
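The feedback loop of step S40 can be sketched as per-rule accuracy counting followed by a rule update. The text does not say how the preset condition is modified, so dropping rules whose accuracy falls below a threshold is an assumption of this sketch, as are the `FileOutcome` type and the `"port:..."` and `"ext:..."` rule keys.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class FileOutcome:
    """Hypothetical audit record for one persistent file."""
    rule_key: str  # rule that admitted the file, e.g. "port:5004" or "ext:8012"
    valid: bool    # True if the file matches a service that needs to be stored


def accuracy_by_rule(outcomes: Iterable[FileOutcome]) -> Dict[str, float]:
    """Fraction of valid persistent files produced under each rule."""
    totals: Dict[str, int] = defaultdict(int)
    valid: Dict[str, int] = defaultdict(int)
    for outcome in outcomes:
        totals[outcome.rule_key] += 1
        valid[outcome.rule_key] += int(outcome.valid)
    return {key: valid[key] / totals[key] for key in totals}


def prune_rules(active_rules: Set[str], accuracy: Dict[str, float],
                threshold: float) -> Set[str]:
    """Keep rules with no data yet or with accuracy at or above the threshold."""
    return {rule for rule in active_rules if accuracy.get(rule, 1.0) >= threshold}
```

Rules without any audited files are kept so that a newly added port or extension is not discarded before it has produced output.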

In an embodiment of the present invention, after converting the second temporary file into a persistent file in a corresponding media format according to a preset media format, the method further includes:

S50: removing the second temporary file.

After the second temporary file is converted into the persistent file in the corresponding media format, the persistent file can be stored permanently in the storage medium. To save storage resources on the server, the second temporary file that has been converted into the persistent file is then removed. Because the second temporary file was screened from the original first temporary file, the first temporary file is removed at this point as well, which further saves storage resources on the server.
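The cleanup in step S50 can be sketched as follows. Checking that the persistent file exists and is non-empty before deleting anything is a safety assumption added for the sketch and is not described in the embodiment; the function name is hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def cleanup_after_transcription(persistent_file: Path,
                                temp_files: Iterable[Path]) -> List[Path]:
    """Remove temporary files once the persistent file is in place.

    Returns the temporary files that were actually deleted.
    """
    # Assumed guard: never delete the sources of a missing or empty result.
    if not persistent_file.is_file() or persistent_file.stat().st_size == 0:
        return []
    removed = []
    for tmp in temp_files:
        try:
            tmp.unlink()
        except FileNotFoundError:
            continue  # already gone, nothing to do
        removed.append(tmp)
    return removed
```

Passing both the second temporary file and the first temporary file it was screened from covers the two removals described above.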

As shown in fig. 3, in another embodiment, the present invention provides an audio-video file transcription apparatus, including:

the receiving module 10: used for receiving a first network packet generated by the video call center and writing the first network packet into a first temporary file according to a preset sequence;

the parsing module 20: used for parsing the first temporary file, acquiring the first network packet in the first temporary file, screening a second network packet which can be parsed into video or audio from the first network packet according to a preset condition comprising an IP port and an extension rule, and determining the first temporary file corresponding to the second network packet as a second temporary file;

the transcription module 30: used for analyzing the second temporary file and converting the second temporary file into a persistent file in a corresponding media format according to a preset media format.

In this embodiment, the audio/video file transcription apparatus includes a receiving module 10, a parsing module 20 and a transcription module 30, which can perform their functions asynchronously. After the receiving module 10 writes the first network packets generated by the video call center into a first temporary file, the parsing module 20 can fetch the first temporary file at a certain interval, parse it, screen from the first network packets the second network packets that can be parsed into video or audio according to a preset condition comprising an IP port and an extension rule, and determine the first temporary file corresponding to the second network packets as a second temporary file. Similarly, after the parsing module 20 determines the second temporary file, the transcription module 30 fetches and analyzes the second temporary file at a predetermined time interval and converts it into a persistent file in the corresponding media format according to the preset media format. The received first network packets therefore do not need to be parsed and transcribed into persistent files in real time; the receiving module 10, the parsing module 20 and the transcription module 30 each complete their own functions asynchronously and do not affect one another.

Specifically, the receiving module 10 receives all first network packets generated by the video call center, stores them, and writes them into a first temporary file according to a preset sequence. The parsing module 20 parses the first temporary file to obtain the first network packets in it and screens them according to a preset condition comprising an IP port and an extension rule: second network packets that can be parsed into video are screened from the first network packets by the preset condition comprising the IP port, and second network packets that can be parsed into audio are screened by the preset condition comprising the extension rule. The first temporary file corresponding to the second network packets is then determined as a second temporary file. After the parsing module 20 determines the second temporary file containing the network packets generated by the video service or the voice service, the transcription module 30 analyzes the second temporary file to determine whether its media content is voice or video, and converts it according to the preset media format: a second temporary file generated by the video service is converted into a persistent video file encoded in H.265 or H.264, and a second temporary file generated by the voice service is converted into a persistent audio file encoded in AAC. The persistent file can be stored permanently in a storage medium and used later as part of an evidence chain.

In an embodiment of the invention, the apparatus further comprises:

a distribution module: used for, before the parsing module 20 parses the first temporary file, distributing the subsequent processing steps to servers in a first geographical range for execution when the number of the first temporary files is larger than a first peak value, and distributing the subsequent processing steps to servers in a second geographical range for execution when the number of the first temporary files is larger than a second peak value.

In an embodiment of the invention, the apparatus further comprises:

the statistics module 40: used for counting the accuracy of the persistent file after the transcription module 30 converts the second temporary file into the persistent file in the corresponding media format according to the preset media format, and modifying the preset condition comprising the IP port and the extension rule according to the accuracy.

In an embodiment of the invention, the apparatus further comprises:

a removal module: used for removing the second temporary file after the transcription module 30 converts the second temporary file into the persistent file in the corresponding media format according to the preset media format.

In another embodiment, the present invention provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the audio and video file transcription method described in the above embodiments. The computer-readable storage medium includes, but is not limited to, any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs and magneto-optical disks), ROMs (Read-Only Memories), RAMs (Random Access Memories), EPROMs (Erasable Programmable Read-Only Memories), EEPROMs (Electrically Erasable Programmable Read-Only Memories), flash memories, magnetic cards, or optical cards. That is, a storage device includes any medium that stores or transmits information in a form readable by a device (e.g., a computer or a cellular phone), and may be a read-only memory, a magnetic or optical disk, or the like.

The computer-readable storage medium provided by this embodiment of the invention makes it possible to receive a first network packet generated by a video call center and write the first network packet into a first temporary file according to a preset sequence; to parse the first temporary file, acquire the first network packet in the first temporary file, screen a second network packet which can be parsed into video or audio from the first network packet according to a preset condition comprising an IP port and an extension rule, and determine the first temporary file corresponding to the second network packet as a second temporary file; and to analyze the second temporary file and convert it into a persistent file in a corresponding media format according to a preset media format. The received first network packets generated by the video call center are processed in a block-wise, asynchronous manner: they are written into a first temporary file, then separated and screened to select the audio and video network packets generated during service use, and those packets are transcribed into a persistent file. Each step can be implemented with multi-task, multi-thread concurrency, which prevents a processing bottleneck in any single step from lowering the overall transcription efficiency. The parsing of the first temporary file and the transcription of the second temporary file can be performed in a distributed and asynchronous manner and are not bound to the real-time arrival of the network stream that carries the first network packets, so the server load is reduced and the transcription efficiency is improved.

The computer-readable storage medium provided by the embodiment of the present invention can implement the above-mentioned embodiment of the audio/video file transcription method, and for specific function implementation, reference is made to the description in the embodiment of the method, which is not described herein again.

In addition, in another embodiment, the present invention further provides a server. As shown in fig. 4, the server includes a processor 403, a memory 405, an input unit 407, a display unit 409, and the like. Those skilled in the art will appreciate that the structural elements shown in fig. 4 do not limit all servers, which may include more or fewer components than those shown, or a different combination of components. The memory 405 may be used to store the computer program 401 and the functional modules, and the processor 403 executes the computer program 401 stored in the memory 405, thereby executing the various functional applications and data processing of the device. The memory 405 may be an internal memory or an external memory, or include both internal and external memories. The internal memory may comprise read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, or random access memory. The external memory may include a hard disk, a floppy disk, a ZIP disk, a USB flash drive, a magnetic tape, etc. The disclosed memory includes, but is not limited to, these types of memory. The memory 405 disclosed herein is provided by way of example only and not by way of limitation.

The input unit 407 is configured to receive signal input and user input. The input unit 407 may include a touch panel and other input devices. The touch panel can collect touch operations performed by a user on or near it (for example, operations performed with a finger, a stylus, or any other suitable object or accessory) and drive a corresponding connection device according to a preset program; the other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., play control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like. The display unit 409 may be used to display information input by the user or provided to the user, as well as the various menus of the computer device. The display unit 409 may take the form of a liquid crystal display, an organic light-emitting diode display, or the like. The processor 403 is the control center of the computer device; it connects the various parts of the entire computer device using various interfaces and lines, and performs various functions and processes data by running or executing the software programs and/or modules stored in the memory 405 and calling the data stored in the memory. In one embodiment, the server includes one or more processors 403, one or more memories 405, and one or more computer programs 401, wherein the one or more computer programs 401 are stored in the memory 405 and configured to be executed by the one or more processors 403, and the one or more computer programs 401 are configured to perform the audio and video file transcription method described in the above embodiments. The one or more processors 403 shown in fig. 4 can execute and implement the functions of the receiving module 10, the parsing module 20, the transcription module 30, and the statistics module 40 shown in fig. 3.

The server provided by the embodiment of the invention can: receive a first network packet generated by a video call center, and write the first network packet into a first temporary file according to a preset sequence; parse the first temporary file, acquire the first network packet in the first temporary file, screen out, from the first network packet and according to a preset condition comprising an IP port and an extension rule, a second network packet that can be parsed into video or audio, and determine the first temporary file corresponding to the second network packet as a second temporary file; and parse the second temporary file, and convert the second temporary file into a persistent file in a corresponding media format according to a preset media format. The received first network packet generated by the video call center is processed in a blocking and asynchronous mode and written into the first temporary file; the first network packet is then separated and screened to pick out the audio and video network packets generated during service use, and these are transcribed into a persistent file. Each step can be implemented with multi-task, multi-thread concurrency, so that a processing bottleneck in any single step does not reduce the overall audio and video transcription efficiency. The parsing of the first temporary file and the transcription of the second temporary file can be carried out in a distributed and asynchronous manner, so that they are not bound by the timing constraints of receiving the network stream of the first network packet; this reduces the server load and improves the transcription efficiency.

The server provided by the embodiment of the present invention can implement the above-mentioned embodiment of the audio and video file transcription method; for the specific function implementation, reference is made to the description in the method embodiment, which is not repeated here.
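The screening step, in which second network packets are selected according to a preset condition comprising an IP port and an extension rule, can be sketched as a simple predicate. This is an illustration under stated assumptions only: the `Packet` fields, the address range, the port range, and the modeling of the "extension rule" as a prefix match on a hypothetical extension-number field are all invented for this example and are not specified in this section.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    extension: str  # hypothetical field, e.g. an agent extension number
    payload: bytes


# Assumed preset condition; the values are illustrative only.
ALLOWED_NET = ip_network("10.0.0.0/24")
ALLOWED_PORTS = range(10000, 20000)
EXTENSION_PREFIX = "8"


def is_media_packet(pkt: Packet) -> bool:
    """Return True if the packet satisfies the assumed IP, port and extension condition."""
    return (
        ip_address(pkt.src_ip) in ALLOWED_NET
        and pkt.src_port in ALLOWED_PORTS
        and pkt.extension.startswith(EXTENSION_PREFIX)
    )


def screen(packets):
    """Keep only the packets treated as audio/video ('second network packets')."""
    return [p for p in packets if is_media_packet(p)]
```

Keeping the condition in one predicate means the preset values can be changed without touching the receiving or transcription stages.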

The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.

Claims

1. An audio and video file transcription method, characterized by comprising the following steps:

processing a received first network packet generated by a video call center in a blocking and asynchronous mode, and writing the first network packet into a first temporary file according to a preset sequence;

analyzing the second temporary file, and converting the second temporary file into a persistent file of a corresponding media format according to a preset media format;

wherein each step is implemented based on multi-task and multi-thread concurrency.

2. The method of claim 1, wherein prior to parsing the first temporary file, further comprising:

3. The method of claim 1, wherein after converting the second temporary file into a persistent file in a corresponding media format according to a preset media format, the method further comprises:

4. The method of claim 1, wherein after converting the second temporary file into a persistent file in a corresponding media format according to a preset media format, the method further comprises:

removing the second temporary file.

5. An audio-video file transcription apparatus, comprising:

a receiving module, configured to write a first network packet generated by a video call center into a first temporary file in a blocking and asynchronous mode;

a transcription module, configured to parse the second temporary file and convert the second temporary file into a persistent file in a corresponding media format according to a preset media format;

6. The apparatus of claim 5, further comprising:

a distribution module, configured to, before the parsing module parses the first temporary file, distribute the subsequent processing steps to servers in a first geographical range for execution when the number of first temporary files is greater than a first peak value; and distribute the subsequent processing steps to servers in a second geographical range for execution when the number of first temporary files is greater than a second peak value.

7. The apparatus of claim 5, further comprising:

8. The apparatus of claim 5, further comprising:

9. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, implements the audio-video file transcription method according to any one of claims 1 to 4.

10. A server, comprising:

one or more processors;

a memory;

one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, the one or more computer programs being configured to perform the audio-video file transcription method as claimed in any one of claims 1 to 4.