The invention content is as follows:
the invention aims to solve the problems encountered in large-scale video encoding, such as low encoding efficiency, a small number of supported channels, high power consumption, and the inability to fully utilize each encoding device as user traffic changes. To remedy the defects of traditional methods, the invention, based on a study of various encoding methods and hardware interfaces, provides a multi-process encoding method based on events and a shared-memory mechanism. By adding a virtual layer between the upper-layer application and the bottom-layer hardware, it isolates different upper-layer users from one another and improves the stability of the encoding system. It also provides a scheduling method that raises concurrency between processes as much as possible through shared memory and a mutual-exclusion lock, so that multiple users can access the bottom-layer encoding device simultaneously and efficiently, while the bottom-layer encoding device uses a hardware mechanism to encode multiple video channels at the same time, fully exploiting the performance of each encoding device. The method comprises the following steps:
in one aspect, the present invention provides a multi-process coding method based on an event and shared memory mechanism, wherein the method includes:
setting a virtual layer between an upper application and bottom hardware;
dividing the life cycle of each encoding channel into four steps: reading the data stream to be encoded, updating the encoding parameters, encoding, and closing; the upper-layer application sequentially calls the interfaces corresponding to these four steps, and the virtual layer manages the bottom-layer hardware encoding resources according to the sequential calls, thereby realizing multi-process encoding;
and each encoding entity is required to call the interfaces corresponding to the four steps of reading the data stream to be encoded, updating the encoding parameters, encoding, and closing in sequence.
Preferably, the step of reading the data stream to be encoded comprises opening one encoding channel and initializing and filling the encoding parameters;
the step of updating the encoding parameters comprises the user process of the upper-layer application changing the current encoding parameters during encoding in order to change the encoding mode;
the encoding step comprises the user process of the upper-layer application repeatedly calling the corresponding interface to encode according to the set encoding parameters;
and the closing step comprises closing the encoding channel, whereupon the virtual layer releases the corresponding hardware encoding resources.
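As a sketch of this four-step life cycle, the following minimal Python class shows the required calling order (open, update, encode in a loop, close). All names and parameter fields here are illustrative assumptions, not the actual virtual-layer API, and a placeholder transform stands in for the hardware codec:

```python
# Minimal sketch of the four-step per-channel life cycle. The class
# name, parameter fields, and the "BS:" placeholder are illustrative
# assumptions; the real virtual-layer interfaces are not specified here.

class EncoderChannel:
    def __init__(self):
        self.opened = False
        self.params = {}

    def open(self, **init_params):
        """Open one encoding channel and fill the initial parameters
        (video type, profile, frame size, CBR/VBR, frame rate, ...).
        These are set once and not modified during encoding."""
        self.params = dict(init_params)
        self.opened = True

    def update(self, **mutable_params):
        """Update the mutable parameters (pixel format, width/height,
        force-I-frame flag, bit rate). May be called many times; must
        be called at least once after open()."""
        assert self.opened
        self.params.update(mutable_params)

    def encode(self, raw_frame):
        """Encode one frame of raw data (YUV/RGB) into a bitstream.
        A trivial byte transform stands in for the hardware codec."""
        assert self.opened
        return b"BS:" + bytes(raw_frame)

    def close(self):
        """Close the channel so the virtual layer can release the
        corresponding hardware encoding resource."""
        self.opened = False


# The required calling order: open -> update -> encode (looped) -> close.
ch = EncoderChannel()
ch.open(codec="h.264", width=1920, height=1080, rate_mode="CBR")
ch.update(bitrate=4_000_000)
bs = ch.encode(b"\x00" * 16)   # one frame per call; looped in practice
ch.close()
```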
Preferably, the interfaces corresponding to the steps of reading the data stream to be encoded, updating the encoding parameters, and closing are called mutually exclusively among the different encoding entities.
Preferably, the virtual layer allows multiple encoding entities to perform the encoding step simultaneously, and each time a new encoding entity performs the step of reading the data stream to be encoded, the virtual layer creates a new thread;
and the threads are synchronized by means of events: the producer process triggers an event when it generates data, waking the consumer process; if the consumer process cannot acquire the event, it enters a dormant state.
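The event-based synchronization just described can be sketched as follows; `threading.Event` stands in for the underlying event mechanism, which the text does not name:

```python
# Sketch of event-based producer/consumer synchronization: the consumer
# blocks (sleeps) until the producer triggers the data-ready event.
import threading

data_ready = threading.Event()
results = []

def consumer():
    data_ready.wait()            # sleep until the event is triggered
    results.append("consumed")

t = threading.Thread(target=consumer)
t.start()
results.append("produced")       # producer generates the data first,
data_ready.set()                 # then triggers the event to wake the consumer
t.join()
```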
Preferably, the virtual layer allows multiple encoding entities to perform the encoding step concurrently;
setting up a thread pool and passing the concurrently executed encoding tasks to it;
and when an idle thread exists in the thread pool, assigning an encoding task to that idle thread for execution.
Preferably, in the thread pool, the encoding task is inserted into a blocking queue, and the thread in the thread pool acquires the encoding task from the blocking queue.
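The blocking-queue scheme above can be sketched with Python's `queue.Queue` (a blocking queue) and a fixed set of worker threads; the task payloads and the `None` shutdown sentinel are illustrative choices:

```python
# Sketch of a thread pool backed by a blocking queue: worker threads
# block on the queue and pick up encoding tasks as they are inserted.
import queue
import threading

task_queue = queue.Queue()
results = []
lock = threading.Lock()

def worker():
    while True:
        task = task_queue.get()      # blocks until a task is available
        if task is None:             # sentinel: shut this worker down
            break
        with lock:
            results.append(f"encoded frame {task}")
        task_queue.task_done()

pool = [threading.Thread(target=worker) for _ in range(3)]
for t in pool:
    t.start()

for frame_id in range(6):            # submit six encoding tasks
    task_queue.put(frame_id)
task_queue.join()                    # wait until every task is processed

for _ in pool:                       # stop the now-idle workers
    task_queue.put(None)
for t in pool:
    t.join()
```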
Preferably, in the encoding step, shared memory is used to transfer the data stream to be encoded and the encoded bitstream among the plurality of processes;
wherein access to a given storage area between processes is mutually exclusive.
Preferably, a control lock is created to enforce the mutually exclusive calls;
when the bottom-layer hardware performs encoding, different processes must first acquire the access-area control lock before accessing the critical resources in the steps of reading the data stream to be encoded, updating the encoding parameters, and closing; the access-area control lock is released after the access operation is completed.
Preferably, the memory-sharing method further includes: when one process writes data into the shared memory area, the other processes sharing the memory area can see that data;
and mutually exclusive access to a given memory area among the plurality of processes includes: when one process writes data to the shared memory area, the other processes are prohibited from reading or writing that data before the write operation completes.
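This write-exclusion rule can be sketched with Python's `multiprocessing` primitives standing in for the actual shared memory and inter-process mutex (the fork start method is assumed, which is Unix-only):

```python
# Sketch of the shared-memory rule: a writer holds a lock for the whole
# write, so no other process observes a half-written frame.
import multiprocessing as mp

ctx = mp.get_context("fork")          # fork avoids re-importing this module

def writer(shm, lock, frame):
    with lock:                        # exclusive access while writing
        for i, b in enumerate(frame):
            shm[i] = b

lock = ctx.Lock()
shm = ctx.Array('B', 8)               # shared memory area, 8 bytes
p = ctx.Process(target=writer, args=(shm, lock, b"YUVFRAME"))
p.start()
p.join()
with lock:                            # readers take the same lock
    data = bytes(shm[:])
print(data)                           # the writer's data is now visible
```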
In another aspect, the present invention further provides a multi-process encoding system based on events and a shared-memory mechanism, wherein the system comprises an upper-layer application module, a virtual layer module, and a bottom-layer hardware module;
the upper-layer application module is used for interacting with the user and sequentially calling the interfaces of the encoding entity corresponding to the four steps of reading the data stream to be encoded, updating the encoding parameters, encoding, and closing;
the virtual layer module is used for managing the coding resources of the bottom layer hardware module according to the sequential calling, so as to realize multi-process coding;
the bottom-layer hardware module is used for performing the computation and the encoding.
Preferably, the virtual layer module creates a control lock;
when the bottom-layer hardware module performs encoding, different processes must first acquire the access-area control lock before accessing the critical resources in the steps of reading the data stream to be encoded, updating the encoding parameters, and closing; the access-area control lock is released after the access operation is completed.
Preferably, the system further comprises a storage module, the storage module comprising a shared access area;
when a process writes data into the shared memory area, other processes sharing the memory area can see the data;
when one process writes data to the shared memory area, other processes are prohibited from reading and writing the data before the process completes the write operation.
In still another aspect, the present invention further provides a multi-process encoding apparatus based on events and a shared-memory mechanism, wherein the apparatus comprises at least one processor and at least one readable and writable storage device;
the read-write storage device comprises a shared access area; the shared access area satisfies: when a process writes data into the shared memory area, other processes sharing the memory area can see the data; when one process writes data into a shared memory area, other processes are prohibited from reading and writing the data before the process completes the write operation;
the readable and writable storage device stores instruction code that the processor calls to perform any of the methods described above.
Compared with the prior art, the technical solution of the invention offers high performance, high throughput, high energy efficiency, and ease of maintenance, expansion, and optimization. It isolates different upper-layer users from one another, improves the stability of the encoding system, and allows multiple users to access the bottom-layer encoding device simultaneously and efficiently, while the bottom-layer encoding device uses a hardware mechanism to encode multiple video channels at once, fully exploiting the performance of each encoding device.
The specific implementation mode is as follows:
the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
It should be noted that the embodiments of the present invention may be referenced by or combined with one another; each embodiment serves only as a preferred example, and unless explicitly stated otherwise in this specification, the steps of the technical solutions in the following embodiments may be invoked by one another or executed together.
Example 1
In a specific embodiment, a video encoding object is taken as an example of an encoding entity to describe the implementation of the technical solution of the present invention in detail. It should be noted that although the data processed in this embodiment is video data, the technical solution of the present invention is not limited to video stream data and may be applied to the data processing of other multi-channel encoding targets.
As shown in fig. 1, the life cycle of each encoding channel is abstracted into the four steps of reading the data stream to be encoded, updating the encoding parameters, encoding, and closing the video channel, encapsulated as the four interfaces open101, update102, encode103, and close104.
The open101 interface opens one encoded-video channel and initializes and fills the encoding parameters, including the video type (h.264 or mpeg2), the profile (picture quality level), the width and height of each frame, the rate-control mode (constant bit rate CBR or variable bit rate VBR), the video frame rate, the number of slices contained in each frame, the GOP (group of pictures) size, and the like. These parameters are set only once, when encoding is initialized, and are not modified during encoding.
The update102 interface updates the encoding parameters: if the encoding mode needs to change during encoding, the parameters can be changed through update. The parameters that can be changed include the pixel format, the width and height of the encoded image, whether the current frame is encoded as an I frame, and the bit rate. This interface may be called many times; it is called at least once when the encoded video is initialized, after which the parameters may be set repeatedly.
The encode103 interface calls the bottom-layer device to perform the encoding, for example a kiri core AI chip with a rich API; such a device supports multiple graphics platforms, implements general-purpose functions, can perform preprocessing, encoding and decoding, and conversion between different encoding formats on digital video, and can serve as the bottom-layer encoding device for video encoding.
The close104 interface closes a video channel after its encoding is completed.
The calling flow of the four interfaces is shown in fig. 2. When encoding starts, the open interface 201 is called to open the data stream to be encoded; the update interface 202 is called at least once when the encoded video is initialized, and is called again whenever an encoding parameter changes. If no encoding parameter needs to change, the encode interface 203 is called directly to encode. When encoding is complete, the close interface 204 is called to close the video channel.
As shown in fig. 3, the overall framework of the system is deployed such that the bottom-layer encoding device 301 and the user process 304 communicate through a PCIe channel 302, with the virtual layer 303, which allows multiple videos to be encoded simultaneously, sitting between the PCIe channel 302 and the user process 304. The open and close interfaces are each called only once in the whole encoding process, and although the update interface may be called many times, it transfers only a few encoding parameters; the data volume of these three interfaces is therefore very limited, so they can share one PCIe channel 302, which is accessed mutually exclusively by means of a mutual-exclusion lock. The encode interface must transmit the YUV data of multiple videos simultaneously and return the encoded BS bitstream, so it uses one or more PCIe channels 302 exclusively.
Fig. 4 is a detailed flowchart of the whole system. The encoding process is divided into the four steps of reading the data stream to be encoded, updating the encoding parameters, encoding, and closing the video channel, encapsulated as the four interfaces open()401, update()402, encode()403, and close()404. The i-th client process calls the interfaces open401, update402, encode403, and close404 in sequence; each client process corresponds to one channel of video encoding, and its life cycle is open401, update402, encode403, close404. Likewise, the life cycle of each video in the bottom-layer encoding service process is open412, update413, encode414, close415, and these interfaces must also be called in sequence.
The method requires that the open412, update413, and close415 interfaces be called mutually exclusively among the different video channels: the bottom-layer implementation, such as the bottom-layer encoding driver, determines that the resources used by open412, update413, and close415 are critical resources, so mutually exclusive access among different processes must be guaranteed.
To satisfy this requirement, a mutual-exclusion lock is created for mutually exclusive access to the critical resources among processes. Different video channels must acquire the control-area access lock 411, and only the holder of the lock may access the critical resources in open412, update413, and close415. When encoding is complete and the close call returns, the control-area access lock 416 is released.
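The control-lock discipline described above can be sketched as follows; `threading.Lock` stands in for the inter-process control-area lock, and the function bodies are illustrative placeholders for the critical driver resources:

```python
# Sketch of the control-area access lock: every channel must hold the
# lock while calling open/close (and update), serializing access to the
# critical driver resources; the lock is released after each call.
import threading

control_lock = threading.Lock()
log = []

def open_channel(i):
    with control_lock:            # acquire control-area access lock
        log.append(("open", i))   # critical resource touched only here

def close_channel(i):
    with control_lock:            # the same lock guards close
        log.append(("close", i))

threads = []
for i in range(4):
    t = threading.Thread(target=lambda i=i: (open_channel(i),
                                             close_channel(i)))
    threads.append(t)
    t.start()
for t in threads:
    t.join()
# Each of the 4 channels opened and closed exactly once, serialized
# by the lock, so the log holds 8 entries with no torn updates.
```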
On the basis of guaranteeing that the same video channel calls the four interfaces in sequence and that different channels access the critical resources mutually exclusively, concurrency among the processes is raised as much as possible. The bottom-layer encoding implementation allows multiple videos to execute encode414 at the same time, and each time a new video calls open412, the bottom-layer encoding device creates a new thread dedicated to data interaction with that client during encoding. Referring to fig. 4, the encoding flow of the upper-layer user process is as follows:
Step s401: the i-th client process calls encode to transmit YUV data to the virtual-layer service thread.
Step s402: the YUV-write-completion event for the i-th video of the bottom-layer encoding device is triggered, waking the bottom-layer encoding device to encode the i-th video channel and process the YUV data.
Step s403: the client process enters a dormant state and waits for the encoding-completion event of the i-th video from the bottom-layer encoding device.
Step s404: the thread for the i-th video of the bottom-layer encoding device waits for the YUV-write-completion event of the i-th client process; if the event has not arrived, the thread blocks and enters a dormant state to reduce resource consumption, sleeping until the YUV-write-completion event of the i-th client process is received.
Step s405: the i-th video thread of the bottom-layer encoding device is woken and performs the encoding.
Step s406: after encoding completes, the i-th video encoding-completion event is triggered.
Step s407: the i-th video encoding-completion event wakes the dormant i-th client process, which returns the BS bitstream to the upper-layer application, completing the encoding of one frame of data.
Encoding among different video channels can be executed concurrently, so one thread can be launched for each channel of video encoding.
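The per-frame handshake in steps s401 to s407 can be sketched with two events, one per direction; `threading.Event` and the `shared` dict are stand-ins for the real event mechanism and shared data areas, and the `b"BS:"` transform is a placeholder for the hardware codec:

```python
# Sketch of the s401-s407 handshake: the client triggers "YUV written"
# and sleeps; the device thread wakes, encodes, triggers "encode done",
# and the client wakes holding the BS bitstream.
import threading

yuv_written = threading.Event()
encode_done = threading.Event()
shared = {}                            # stand-in for the shared data area

def device_thread():
    yuv_written.wait()                 # s404: sleep until YUV arrives
    shared["bs"] = b"BS:" + shared["yuv"]   # s405: encode the frame
    encode_done.set()                  # s406: trigger completion event

dev = threading.Thread(target=device_thread)
dev.start()

shared["yuv"] = b"frame-0"             # s401: client writes YUV data
yuv_written.set()                      # s402: trigger write-complete event
encode_done.wait()                     # s403: client sleeps until done
dev.join()
print(shared["bs"])                    # s407: client returns the BS stream
```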
The invention transfers large blocks of data between the bottom-layer encoding service main process and the client processes through shared memory, the fastest way to share data between processes, which must be used together with inter-process mutual exclusion and synchronization. The shared memory is partitioned according to the number of processes, each process owning one region (the per-channel exclusive BS/YUV data areas 406 and 407 in fig. 4), because when two processes access a given storage area and one is writing data into it, the other must not read or write that data until the write completes. A control area 405 is required to guarantee mutually exclusive access to each data area.
Example 2
This embodiment still uses video coding as an example to further illustrate the implementation principle of the present invention.
Embodiment mode 1: each encoding channel is abstracted. The life cycle of each channel is abstracted into the four steps of reading the data stream to be encoded, updating the encoding parameters, encoding, and closing the video channel, encapsulated as the four interfaces open(), update(), encode(), and close(). The upper-layer user process calls the four interfaces in sequence, and the virtual layer manages the bottom-layer encoding resources according to the life cycle of each channel.
The open interface opens one encoded-video channel and initializes and fills the encoding parameters, including the video type (h.264 or mpeg2), the profile (picture quality level), the width and height of each frame, the rate-control mode (constant bit rate CBR or variable bit rate VBR), the video frame rate, the number of slices contained in each frame, the GOP (group of pictures) size, and the like. These parameters are set only once, when encoding is initialized, and are not modified during encoding.
The update interface updates the encoding parameters: if the upper-layer user process wants to change the encoding mode during encoding, the current encoding parameters can be changed through update. The parameters that can be changed include the pixel format, the width and height of the encoded image, whether the current frame is encoded as an I frame, and the bit rate. This interface may be called many times; it is called at least once when the encoded video is initialized, after which the parameters may be set repeatedly.
The encoding is performed through the encode interface: according to the set encoding parameters, the upper-layer user process calls the interface in a loop to encode. Each time, the virtual layer receives one frame of raw data (YUV, RGB, and the like), encodes it into the corresponding BS bitstream through the bottom-layer encoding device, and returns the encoded video bitstream to the user process.
The close interface closes a video after its encoding is completed, and the virtual layer releases the corresponding hardware encoding resources.
When encoding starts, the open interface is called to open the data stream to be encoded; the update interface is called at least once when the encoded video is initialized, and is called again whenever an encoding parameter changes. If no encoding parameter needs to change, the encode interface is called directly to encode. When encoding is complete, the close interface is called to close the video.
Each video channel must be guaranteed to call the open, update, encode, and close interfaces in sequence; each user process corresponds to one channel of video encoding, and its life cycle is open, update, encode, close. Likewise, the life cycle of each video in the bottom-layer encoding service process is open, update, encode, close, and each video must likewise call these interfaces in sequence.
Embodiment mode 2: the open, update, and close interfaces must be called mutually exclusively among different video channels. Because these three phases take up a very small proportion of each encoding life cycle and involve little communication with the encoding hardware, the three phases of all encoding channels share one transmission channel in order to raise the utilization of the hardware channel and achieve high performance. The bottom-layer implementation, such as the bottom-layer encoding driver, determines that the resources used by open, update, and close are critical resources, so the channel resource must be accessed mutually exclusively in these three phases. To satisfy this requirement, the virtual layer creates a mutual-exclusion lock for mutually exclusive access to the critical resources among processes. The main process of the virtual layer creates encoding threads according to the number of user processes, one thread per video channel. When the bottom-layer encoding device performs encoding, each process must first acquire the control-area access lock; only the holder of the lock may access the critical resources in open, update, and close. After the access operation completes, the access-area control lock is released so that other processes can access the critical resources.
Embodiment mode 3: synchronization among the processes is realized by means of events. The producer process triggers an event when it generates data, and the consumer process is woken to process the data. If the consumer process cannot obtain the event, it blocks and enters a dormant state, reducing process resource consumption.
On the basis of guaranteeing that the same video channel calls the four interfaces in sequence and that different channels access the critical resources mutually exclusively, concurrency among the processes is raised as much as possible. The virtual layer allows multiple videos to execute encode at the same time; each time a new video calls open, the virtual layer creates a new thread dedicated to data interaction with that client during encoding. The flow is as follows:
and calling the encode by the ith client process, preparing YUV data, triggering an ith video YUV writing completion event of the bottom layer encoding equipment after the YUV data of the ith client process is prepared, waking up the ith video encoding equipment to process YUV, and then enabling the ith video encoding equipment to enter a dormant state and waiting for the ith video encoding completion event of the bottom layer encoding equipment.
And the thread of the ith video of the bottom layer coding device waits for a YUV write completion event of the ith client process, and if the event cannot be received, the thread of the ith video of the bottom layer coding device is blocked and then enters a dormant state, so that the resource consumption is reduced. And when the YUV write completion event of the ith client process is received, waking up the ith video thread of the bottom coding equipment and carrying out coding processing, and triggering the ith video coding write completion event after the coding is completed.
And awakening the ith client process in the dormancy by the ith video coding write completion event, and applying a return BS code stream to the upper layer by the ith client process to complete the coding of the frame data.
Embodiment mode 4: encoding among different video channels can be executed concurrently, so one thread can be launched for each channel of video encoding. When there are few encoding tasks to process, a few threads can simply be created to handle the corresponding tasks; but when there are a large number of encoding tasks, creating and destroying threads incurs substantial overhead. A thread pool greatly relieves this problem, so to reduce the overhead and simplify thread management, the invention manages the multiple encoding threads with a thread pool.
Instead of creating a new thread for each concurrently executed task, the concurrently executed encoding tasks are passed to a thread pool. As long as there is an idle thread in the pool, an encoding task is assigned to it for execution. Inside the thread pool, encoding tasks are inserted into a blocking queue (Blocking Queue), and the threads in the pool fetch tasks from this queue; when a new task is inserted into the queue, an idle thread takes it from the queue and executes it. A thread pool is useful for limiting the number of threads running at the same time in an application, because every newly started thread carries a performance overhead (each thread needs memory allocated for its stack, and so on).
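Standard thread-pool libraries implement exactly this blocking-queue scheme internally; as a sketch, Python's `concurrent.futures.ThreadPoolExecutor` provides it directly (`encode_frame` is a placeholder for the real per-frame encoding call):

```python
# The thread-pool scheme expressed with a standard-library pool: tasks
# are queued internally and picked up by idle worker threads, avoiding
# per-task thread creation and destruction overhead.
from concurrent.futures import ThreadPoolExecutor

def encode_frame(frame_id):
    # Placeholder for calling the bottom-layer device on one frame.
    return f"BS-{frame_id}"

with ThreadPoolExecutor(max_workers=4) as pool:
    # map preserves submission order in its results.
    results = list(pool.map(encode_frame, range(8)))
print(results)
```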
Embodiment mode 5: the invention transfers the bitstream to be encoded and the encoded bitstream between the bottom-layer encoding service main process and the client processes through shared memory. Memory sharing is the fastest way to share data between processes: access to the shared memory area is as fast as access to a process's private memory, completes without system calls or switching into the kernel, and avoids unnecessary copies of the data. When one process writes data into the shared memory area, all processes sharing that area can immediately see its content. Using shared memory requires attention to mutually exclusive access to a given storage area among the processes, so it must be combined with inter-process mutual exclusion and synchronization: when multiple processes access a given storage area and one is writing data into it, the others must not read or write that data until the write completes.
Example 3
In yet another embodiment, the present invention further provides a multi-process coding system based on events and a shared memory mechanism, which can execute the methods specifically described in embodiment 1 and embodiment 2. The system comprises: the system comprises an upper application module, a virtual layer module and a bottom hardware module;
the upper-layer application module is used for interacting with the user and sequentially calling the interfaces of the encoding entity corresponding to the four steps of reading the data stream to be encoded, updating the encoding parameters, encoding, and closing;
the virtual layer module is used for managing the coding resources of the bottom layer hardware module according to the sequential calling, so as to realize multi-process coding;
the bottom-layer hardware module is used for performing the computation and the encoding.
Preferably, the virtual layer module creates a control lock;
when the bottom-layer hardware module performs encoding, different processes must first acquire the access-area control lock before accessing the critical resources in the steps of reading the data stream to be encoded, updating the encoding parameters, and closing; the access-area control lock is released after the access operation is completed.
Preferably, the system further comprises a storage module, the storage module comprising a shared access area;
when a process writes data into the shared memory area, other processes sharing the memory area can see the content;
when one process writes data to the shared memory area, other processes are prohibited from reading and writing the data before the process completes the write operation.
Example 4
In another embodiment, the present invention further provides a multi-process encoding apparatus based on events and a shared memory mechanism, the apparatus comprising at least one processor and at least one readable and writable storage device;
the read-write storage device comprises a shared access area; the shared access area satisfies: when a process writes data into the shared memory area, other processes sharing the memory area can see the content; when one process writes data to a shared memory area, other processes are prohibited from reading and writing the data before the process completes the write operation;
the readable and writable storage device stores instruction codes which are called by the processor to execute the methods of embodiments 1 and 2.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program, which may be stored in a computer readable storage medium and executed by a computer to implement the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The present invention has been described with reference to the method, block diagram, single-line diagram, and simulation diagram of its embodiments. The above description is only one embodiment of the invention, which is not limited thereto; any changes or substitutions that can readily be conceived by those skilled in the art within the technical scope disclosed herein fall within the protection scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.