CN114245134A - Audio and video data generation method, device, equipment and computer readable medium - Google Patents
- Publication number
- CN114245134A (Application CN202010938191.1A)
- Authority
- CN
- China
- Prior art keywords
- image
- audio
- generate
- sequence
- video
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/182—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being a pixel
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
Abstract
Embodiments of the present disclosure disclose an audio and video data generation method, apparatus, device, and computer readable medium. One embodiment of the method comprises: acquiring audio and a preset image; transforming the preset image to generate a transformed image; generating a video based on the transformed image; and synthesizing the audio and the video to generate audio and video data. This embodiment avoids equipment failures and similar problems caused by building a real vehicle-mounted monitoring environment from multiple devices.
Description
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular to an audio and video data generation method, apparatus, device, and computer readable medium.
Background
With the development of wireless networks and video technology, acquiring vehicle-mounted monitoring audio and video data over a wireless network has become a common industrial application. Related acquisition methods require building a real vehicle environment: various hardware devices are used to collect a vehicle-mounted monitoring audio/video data stream, packets are then captured from that stream, and the required audio/video stream is extracted. Finally, the resulting audio/video stream is stored or used.
However, acquiring audio/video data in the above manner often runs into the following technical problems:
First, acquiring the audio/video data requires building a real vehicle-mounted monitoring environment from multiple devices, which leads to equipment failures and similar problems.
Second, images contained in the audio/video data stream must be detected before they can be acquired, so generating the audio/video data consumes a large amount of time.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose audio-video data generation methods, apparatuses, devices and computer readable media to solve one or more of the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide an audio and video data generation method, including: acquiring audio and a preset image; transforming the preset image to generate a transformed image; generating a video based on the transformed image; and synthesizing the audio and the video to generate audio and video data.
In a second aspect, some embodiments of the present disclosure provide an apparatus for generating audio-visual data, the apparatus comprising: an acquisition unit configured to acquire an audio and a preset image; a first generating unit configured to transform the preset image to generate a transformed image; a second generation unit configured to generate a video based on the converted image; and the synthesis unit is configured to synthesize the audio and the video to generate audio and video data.
In a third aspect, some embodiments of the present disclosure provide an apparatus comprising: one or more processors; a storage device having one or more programs stored thereon, which when executed by the one or more processors, cause the one or more processors to implement the method as described in the first aspect.
In a fourth aspect, some embodiments of the disclosure provide a computer readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method as described in the first aspect.
The above embodiments of the present disclosure have the following advantages. First, audio and a preset image are acquired. Because only an arbitrary piece of audio and a single preset image are needed, the inputs used to generate the audio/video data are easy to obtain. Next, the preset image is transformed to generate a transformed image; the transformation converts the preset image into the desired image, so the transformed image is ready for subsequent video generation. A video is then generated from the transformed image. Because the transformed image already carries the desired characteristics, the generated video inherits them and can be used directly, without being processed again against the characteristics of the desired video; this saves the steps of processing the whole video and removes the need for video-processing equipment. Finally, the audio and the video are synthesized to generate the audio/video data. Because the audio/video data can be generated quickly from one preset image and one piece of audio, there is no need to intercept the required data from the large volume of video streams produced by a real vehicle-mounted environment, and hence no need for a real vehicle-mounted monitoring environment or the various devices required to build one. The problems of equipment failure and the like caused by building a real vehicle-mounted monitoring environment from multiple devices are thereby avoided.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
Fig. 1 is a schematic view of an application scenario of an audio-video data generation method according to some embodiments of the present disclosure;
fig. 2 is a flow diagram of some embodiments of an audio-visual data generation method according to the present disclosure;
fig. 3 is a schematic block diagram of some embodiments of an audiovisual data generation arrangement in accordance with the present disclosure;
fig. 4 is a schematic structural diagram of an electronic device according to the audio-video data generation method of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings. The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that the modifiers "a", "an", and "the" in this disclosure are illustrative rather than limiting; those skilled in the art will understand that they mean "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 is a schematic diagram of an application scenario of an audio-video data generation method according to some embodiments of the present disclosure.
In the application scenario of fig. 1, first the computing device 101 may obtain audio 102 and a preset image 103. The computing device 101 may then transform the preset image 103 to generate a transformed image 104. Further, a video 105 is generated based on the converted image 104. Finally, the audio 102 and the video 105 are synthesized to generate audio/video data 106. Optionally, the computing device 101 may store and send the audio/video data 106 to the service terminal 107.
The computing device 101 may be hardware or software. When the computing device is hardware, it may be implemented as a distributed cluster composed of multiple servers or terminal devices, or may be implemented as a single server or a single terminal device. When the computing device is embodied as software, it may be installed in the hardware devices enumerated above. It may be implemented, for example, as multiple software or software modules to provide distributed services, or as a single software or software module. And is not particularly limited herein.
It should be understood that the number of computing devices in FIG. 1 is merely illustrative. There may be any number of computing devices, as implementation needs dictate.
With continued reference to fig. 2, a flow 200 of some embodiments of an audio-video data generation method according to the present disclosure is shown. The audio and video data generation method comprises the following steps:
Step 201, acquiring audio and a preset image.
In some embodiments, an execution subject of the audio-video data generation method (for example, the computing device 101 shown in fig. 1) may acquire the audio and the preset image through a wired or wireless connection. The audio may be audio of any content. The preset image may be a preset image of the vehicle's surroundings captured by the vehicle-mounted monitoring.
Step 202, transforming the preset image to generate a transformed image.
In some embodiments, based on the preset image obtained in step 201, the execution subject may transform the preset image in various ways to generate a transformed image.
As an example, the transformation may be a rotation operation on the preset image, yielding a rotation-transformed image. For example, if the rotation angle is 90 degrees, the rotation-transformed image is the preset image rotated clockwise by 90 degrees.
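The rotation example above can be sketched as follows. This is a minimal, dependency-free illustration using nested lists as a toy image; a real implementation would more likely use a library such as Pillow or NumPy, and the function name is an assumption, not part of the patent.

```python
def rotate_90_clockwise(image):
    """Rotate a 2-D image (a list of pixel rows) 90 degrees clockwise."""
    # Reversing the row order and transposing is equivalent to a clockwise rotation.
    return [list(row) for row in zip(*image[::-1])]

preset_image = [
    [1, 2],
    [3, 4],
]
transformed = rotate_90_clockwise(preset_image)
# The bottom-left pixel (3) moves to the top-left corner: [[3, 1], [4, 2]].
```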
In some optional implementations of some embodiments, transforming the preset image to generate a transformed image may include the following steps:
in a first step, a predetermined timestamp is determined.
As an example, the predetermined time stamp may be a predetermined time point. For example: "2020.02.20".
Second, performing a superposition transformation of the predetermined timestamp and the preset image to generate a transformed image. For example, the predetermined timestamp may be superimposed on the preset image to produce the transformed image.
In some optional implementations of some embodiments, superimposing the predetermined timestamp on the preset image to generate a transformed image may include the following steps:
First, determining the coordinate value of a predetermined transformation position in the preset image. The coordinate value may be a position coordinate value selected from the preset image in advance.
As an example, the coordinate value of the predetermined transformation position may be (10, 20).
Second, superimposing the predetermined timestamp onto the preset image using that position coordinate value to generate the transformed image. Specifically, the leftmost end of the rendered timestamp is placed at the predetermined transformation position in the preset image, producing the transformed image.
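The overlay step above can be sketched dependency-free as follows: the timestamp is treated as a small pixel block whose leftmost edge lands at the predetermined position of the preset image. In practice one would render a string such as "2020.02.20" with a library like Pillow's `ImageDraw.text`; the function and variable names here are illustrative assumptions.

```python
def superimpose(base, stamp, x, y):
    """Copy `stamp` (a 2-D pixel block) onto `base` with its top-left corner at (x, y)."""
    out = [row[:] for row in base]          # copy so the preset image is not mutated
    for dy, stamp_row in enumerate(stamp):
        for dx, pixel in enumerate(stamp_row):
            out[y + dy][x + dx] = pixel
    return out

preset_image = [[0] * 4 for _ in range(4)]  # toy 4x4 blank image
timestamp_block = [[9, 9]]                  # stands in for the rendered timestamp pixels
transformed_image = superimpose(preset_image, timestamp_block, 1, 2)
```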
Step 203, generating a video based on the transformed image.
In some embodiments, the execution subject of the audio-video data generation method may apply a secondary transformation to the transformed image to generate a plurality of secondarily transformed images, and then fuse these images to generate a video.
In some optional implementations of some embodiments, generating a video based on the transformed image may include the following steps:
first, a predicted image sequence is generated based on the transformed image.
As an example, the transformed image may be rotated clockwise in the plane, saving the image after each incremental rotation angle. The resulting plurality of rotated images serves as the predicted image sequence.
Second, fusing the transformed image with each predicted image in the predicted image sequence to generate a video.
As an example, the transformed image and the predicted images are merged in the order in which the predicted images were generated, producing a format in which the images can be played continuously, thereby obtaining a video.
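The two steps above can be sketched as follows: a prediction sequence is derived by repeated rotation, then the transformed image and its predictions are concatenated in generation order as the video's frame list. Encoding the frame list into an actual container (e.g., with OpenCV or ffmpeg) is omitted; the helper names are assumptions.

```python
def rotate_90_clockwise(image):
    """Rotate a 2-D image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def predicted_sequence(image, steps):
    """Save the image after each incremental rotation, in generation order."""
    frames, current = [], image
    for _ in range(steps):
        current = rotate_90_clockwise(current)
        frames.append(current)
    return frames

transformed = [[1, 2], [3, 4]]
# The transformed image followed by its predictions, in generation order,
# forms the playable frame list of the video.
video_frames = [transformed] + predicted_sequence(transformed, steps=4)
# Four quarter-turns return to the original orientation, so the last frame
# equals the transformed image.
```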
In some optional implementations of some embodiments, generating a predicted image sequence based on the transformed image may include the following steps:
First, segmenting the transformed image to generate a sequence of segmented images. Specifically, segmenting the transformed image may mean dividing it into a plurality of sub-images of the same size.
As an example, each sub-image may contain multiple pixel blocks. The segmented images form a segmented image sequence in segmentation order.
Second, encoding each segmented image in the segmented image sequence to generate an encoded image, thereby obtaining an encoded image sequence.
As an example, encoding each segmented image may be an ordered encoding of the individual pixel blocks in that image, finally yielding the encoded image sequence.
Third, generating a predicted image sequence based on the encoded image sequence.
As an example, pixel prediction may be performed on each pixel block of each encoded image, changing the pixel values to generate a corresponding set of predicted pixel points, from which a predicted image is then generated.
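The segmentation step above (dividing the transformed image into equal-sized sub-images, collected in segmentation order) can be sketched as follows. The block size and row-major ordering are illustrative assumptions; the patent does not fix either.

```python
def segment(image, block_h, block_w):
    """Split a 2-D image into a row-major sequence of block_h x block_w sub-images."""
    tiles = []
    for top in range(0, len(image), block_h):
        for left in range(0, len(image[0]), block_w):
            tiles.append([row[left:left + block_w]
                          for row in image[top:top + block_h]])
    return tiles

image = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
segmented_sequence = segment(image, 1, 2)   # four 1x2 sub-images, in segmentation order
```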
In some optional implementations of some embodiments, generating the predicted image sequence based on the encoded image sequence may include the following steps:
First, determining the number of pixel points in the encoded image and the pixel value corresponding to each pixel point. For example, the number of pixel points in the encoded image may be 100, and the pixel value of the first pixel point may be 55.
Second, performing a predictive transformation on the pixel value corresponding to each pixel point in each encoded image in the encoded image sequence, using the following formula, to generate a predicted pixel value sequence, thereby obtaining a set of predicted pixel value sequences (the formula itself is not reproduced in the source text):
where A denotes the predicted pixel value sequence; d denotes a predicted pixel value in that sequence; n denotes the number of pixel values in the predicted pixel value sequence; d_n denotes the n-th pixel value in the predicted pixel value sequence; p denotes the pixel value of a pixel point in the encoded image; l indexes the l-th pixel point in the encoded image; p_l denotes the pixel value of the l-th pixel point in the encoded image; M denotes the number of pixel points in the encoded image; p_min denotes the minimum pixel value among the pixel points in the encoded image; and a predictor variable (whose symbol is likewise not reproduced) denotes the predictor of the n-th pixel value in the predicted pixel value sequence. Specifically, the value range of this predictor variable may be 1 to 4.
Third, combining the predicted pixel values in each predicted pixel value sequence in the set to generate a predicted image, thereby obtaining the predicted image sequence.
As an example, a predicted image may be generated by arranging the predicted pixel values left to right and top to bottom, in the generation order of the corresponding predicted pixel points in each predicted pixel point set.
The above formula is an inventive point of the embodiments of the present disclosure and addresses the second technical problem mentioned in the background: images contained in the audio/video data stream must be detected before they can be acquired, so generating the audio/video data consumes a large amount of time.
First, since the maximum pixel value may be 255, the predictor variable of each pixel value in the predicted pixel value sequence may range from 1 to 4, and it varies with the range of the pixel value, so the range within which pixel values are generated is bounded. Because the generation range of each predicted pixel value is limited, the total difference between the generated predicted image and the encoded image stays within a controllable range, and the generated predicted image does not need to be detected, eliminating the image-detection step. Since each predicted image is produced by adding a predictor variable to the pixel value of each pixel point in the encoded image, and the predictor variable can be generated quickly within its controllable range, the desired predicted image can be obtained quickly. The large amount of time otherwise consumed in generating the audio/video data is thereby reduced.
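The bounded prediction described above can be sketched as follows: each encoded pixel value is perturbed by a small predictor variable drawn from the stated range (1 to 4), so every predicted pixel stays within a controllable distance of its source. The exact formula is not reproduced in the source text, so the update rule below is an illustrative stand-in, not the patented formula.

```python
import random

def predict_pixels(encoded_pixels, rng):
    """Add a predictor variable in [1, 4] to each pixel value, clamped to 255."""
    return [min(p + rng.randint(1, 4), 255) for p in encoded_pixels]

rng = random.Random(0)                      # seeded for reproducibility
encoded = [55, 100, 254]
predicted = predict_pixels(encoded, rng)
# Every predicted value exceeds its source by at most 4 (or saturates at 255),
# so the predicted frame stays within a controllable range of the encoded one.
```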
Step 204, synthesizing the audio and the video to generate audio and video data.
In some embodiments, the execution subject may fuse the audio and the video to generate audio and video data. Specifically, the audio can be fused into the video according to the frame rate to generate audio and video data.
In some optional implementation manners of some embodiments, the synthesizing, by the execution main body, the audio and the video to generate audio and video data may include the following steps:
First, determining the frame count and the frame rate of the video.
As an example, the frame count may be the total number of images in the video, and the frame rate may be the number of images played per second.
Second, segmenting the audio based on the frame count and frame rate of the video to generate the audio to be synthesized.
As an example, the frame number of the video may be divided by the frame rate to obtain the duration of the video, and then the audio may be divided into the audio to be synthesized with the same duration.
Third, combining the audio to be synthesized and the video to generate audio and video data.
As an example, audio to be synthesized with the same duration is merged into a video to obtain data that can be played simultaneously, thereby generating audio-video data. The audio and video data may be data meeting JT1078 (standard 1078 protocol) format requirements.
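The duration arithmetic in the steps above can be sketched as follows: the video duration is its frame count divided by its frame rate, and the audio is cut to a segment of equal duration before the two are merged. The JT1078 packaging itself is omitted, and the sample-rate figure is an illustrative assumption.

```python
def audio_segment_to_synthesize(audio_samples, sample_rate, frame_count, frame_rate):
    """Trim the audio to the video's duration (frame_count / frame_rate seconds)."""
    video_seconds = frame_count / frame_rate
    n_samples = int(video_seconds * sample_rate)
    return audio_samples[:n_samples]

# 250 video frames at 25 fps -> a 10-second video, so keep 10 s of audio.
audio = list(range(16000 * 12))             # 12 s of fake 16 kHz samples
to_synthesize = audio_segment_to_synthesize(audio, 16000, 250, 25)
```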
Optionally, the audio and video data is stored and sent to the service terminal.
In some embodiments, the execution subject may store the audio and video data and then send it to the service terminal. The audio and video data may be stored in various formats.
By way of example, the format of the audio-video data storage may be a text format, a binary format, a string format, and the like.
The above embodiments of the present disclosure have the following advantages. First, audio and a preset image are acquired. Because only an arbitrary piece of audio and a single preset image are needed, the inputs used to generate the audio/video data are easy to obtain. Next, the preset image is transformed to generate a transformed image; the transformation converts the preset image into the desired image, so the transformed image is ready for subsequent video generation. A video is then generated from the transformed image. Because the transformed image already carries the desired characteristics, the generated video inherits them and can be used directly, without being processed again against the characteristics of the desired video; this saves the steps of processing the whole video and removes the need for video-processing equipment. Finally, the audio and the video are synthesized to generate the audio/video data. Because the audio/video data can be generated quickly from one preset image and one piece of audio, there is no need to intercept the required data from the large volume of video streams produced by a real vehicle-mounted environment, and hence no need for a real vehicle-mounted monitoring environment or the various devices required to build one. The problems of equipment failure and the like caused by building a real vehicle-mounted monitoring environment from multiple devices are thereby avoided.
With further reference to fig. 3, as an implementation of the methods shown in the above figures, the present disclosure provides some embodiments of an audio-video data generation apparatus. These apparatus embodiments correspond to the method embodiments shown in fig. 2, and the apparatus may be applied in various electronic devices.
As shown in fig. 3, the audio-visual data generation apparatus 300 of some embodiments includes: an acquisition unit 301, a first generation unit 302, a second generation unit 303, and a synthesis unit 304. Wherein, the obtaining unit 301 is configured to obtain an audio and a preset image; a first generating unit 302 configured to transform the preset image to generate a transformed image; a second generating unit 303 configured to generate a video based on the converted image; and a synthesizing unit 304 configured to synthesize the audio and the video to generate audio and video data.
Referring now to fig. 4, a block diagram of an electronic device 400 (e.g., the computing device 101 of fig. 1) suitable for implementing some embodiments of the present disclosure is shown. The electronic device shown in fig. 4 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in fig. 4, the electronic device 400 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 401 that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage device 408 into a random access memory (RAM) 403. The RAM 403 also stores various programs and data necessary for the operation of the electronic device 400. The processing device 401, the ROM 402, and the RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
Generally, the following devices may be connected to the I/O interface 405: input devices 406 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 407 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; storage devices 408 including, for example, magnetic tape, hard disk, etc.; and a communication device 409. The communication device 409 may allow the electronic device 400 to communicate wirelessly or by wire with other devices to exchange data. While fig. 4 illustrates an electronic device 400 with various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided. Each block shown in fig. 4 may represent one device or multiple devices as needed.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In some such embodiments, the computer program may be downloaded and installed from a network through the communication device 409, or from the storage device 408, or from the ROM 402. The computer program, when executed by the processing apparatus 401, performs the above-described functions defined in the methods of some embodiments of the present disclosure.
It should be noted that the computer readable medium described above in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device described above, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire audio and a preset image; transform the preset image to generate a transformed image; generate a video based on the transformed image; and synthesize the audio and the video to generate audio and video data.
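The four operations listed above can be sketched as a minimal pipeline. The sketch below is illustrative only: the function names and the plain-Python representations of "image" and "audio" are assumptions, not the disclosed implementation, which would use real media and encoding libraries.

```python
# Illustrative sketch of the four steps carried out by the programs above.
# All function names and the nested-list "image"/"audio" representations are
# hypothetical; a real implementation would use media libraries (e.g. OpenCV,
# FFmpeg) rather than plain lists.

def acquire(audio_samples, preset_image):
    """Step 1: acquire audio and a preset image (copies of plain data here)."""
    return list(audio_samples), [row[:] for row in preset_image]

def overlay_timestamp(image, timestamp, position=(0, 0)):
    """Step 2: transform the preset image by superimposing a timestamp at a
    predetermined coordinate (a stand-in for rendering timestamp pixels)."""
    row, col = position
    transformed = [r[:] for r in image]
    transformed[row][col] = timestamp  # placeholder for drawn timestamp text
    return transformed

def generate_video(transformed_image, num_frames):
    """Step 3: generate a video as a sequence of frames derived from the
    transformed image (a real system would predict and fuse frames)."""
    return [transformed_image] * num_frames

def synthesize(audio, video):
    """Step 4: pair each frame with its share of audio samples to form
    the combined audio and video data."""
    samples_per_frame = len(audio) // len(video)
    return [
        (frame, audio[i * samples_per_frame:(i + 1) * samples_per_frame])
        for i, frame in enumerate(video)
    ]

audio, image = acquire([0.1] * 60, [[0, 0], [0, 0]])
video = generate_video(overlay_timestamp(image, "12:00:00"), num_frames=30)
av_data = synthesize(audio, video)
```

With 60 audio samples and 30 frames, each frame is paired with 2 samples; the first frame carries the overlaid timestamp at its predetermined position.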
Computer program code for carrying out operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by software or by hardware. The described units may also be provided in a processor, which may be described as: a processor including an acquisition unit, a first generation unit, a second generation unit, and a third generation unit. The names of these units do not in some cases limit the units themselves; for example, the acquisition unit may also be described as a "unit that acquires audio and a preset image".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
The foregoing description is merely a description of preferred embodiments of the present disclosure and of the principles of the technology employed. Those skilled in the art will appreciate that the scope of the disclosure is not limited to the specific combination of the features described above, and also covers other technical solutions formed by any combination of those features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the features above with (but not limited to) technical features having similar functions disclosed in the present disclosure.
Claims (10)
1. An audio and video data generation method comprises the following steps:
acquiring audio and a preset image;
transforming the preset image to generate a transformed image;
generating a video based on the transformed image;
and synthesizing the audio and the video to generate audio and video data.
2. The method of claim 1, wherein the method further comprises:
and storing the audio and video data and sending the audio and video data to a service terminal.
3. The method of claim 2, wherein transforming the preset image to generate a transformed image comprises:
determining a predetermined timestamp;
and superimposing the predetermined timestamp on the preset image to generate the transformed image.
4. The method of claim 3, wherein superimposing the predetermined timestamp on the preset image to generate the transformed image comprises:
determining coordinate values of a predetermined transformation position in the preset image;
and superimposing the predetermined timestamp onto the preset image at the coordinate values to generate the transformed image.
5. The method of claim 4, wherein said generating a video based on said transformed image comprises:
generating a predicted image sequence based on the transformed image;
and fusing the transformed image with each predicted image in the predicted image sequence to generate a video.
6. The method of claim 5, wherein said synthesizing the audio with the video to generate audio-visual data comprises:
determining a frame count and a frame rate of the video;
dividing the audio based on the frame count and the frame rate of the video to generate audio to be synthesized;
and combining the audio to be synthesized with the video to generate the audio and video data.
7. The method according to claim 6, wherein said generating a predicted image sequence based on the transformed image comprises:
segmenting the transformed image to generate a segmented image sequence;
encoding each segmented image in the segmented image sequence to generate an encoded image, thereby obtaining an encoded image sequence;
and generating the predicted image sequence based on the encoded image sequence.
8. The method according to claim 7, wherein said generating a predicted image sequence based on the encoded image sequence comprises:
determining the number of pixel points in each encoded image and the pixel value corresponding to each pixel point;
performing predictive transformation on the pixel values corresponding to the pixel points in each encoded image in the encoded image sequence, using the following formula, to generate a predicted pixel value sequence, thereby obtaining a set of predicted pixel value sequences:
(formula not reproduced in the source)
wherein a represents the predicted pixel value sequence;
d represents a predicted pixel value in the predicted pixel value sequence;
n represents the number of pixel values in the predicted pixel value sequence;
d_n represents the nth pixel value in the predicted pixel value sequence;
p represents the pixel value of a pixel point in the encoded image;
l represents the lth pixel point in the encoded image;
p_l represents the pixel value of the lth pixel point in the encoded image;
m represents the number of pixel points in the encoded image;
p_min represents the minimum pixel value among the pixel points in the encoded image;
and a further symbol (not reproduced in the source) represents the predictor variable of the nth pixel value in the predicted pixel value sequence;
and combining the predicted pixel values in each predicted pixel value sequence in the set of predicted pixel value sequences to generate a predicted image, thereby obtaining the predicted image sequence.
9. An audio-visual data generating apparatus comprising:
an acquisition unit configured to acquire an audio and a preset image;
a first generation unit configured to transform the preset image to generate a transformed image;
a second generation unit configured to generate a video based on the transformed image;
and the synthesis unit is configured to synthesize the audio and the video to generate audio and video data.
10. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-8.
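As a rough illustration of the audio division recited in claim 6, the video's frame count and frame rate fix the target duration to which the audio is trimmed before being split per frame. This is a sketch under assumptions: the helper name, the 44100 Hz sample rate, and the trim-then-chunk strategy are not specified by the disclosure.

```python
# Rough sketch of the audio division in claim 6: the video's frame count and
# frame rate determine the duration the audio is trimmed to before it is
# split into per-frame chunks. The function name and the 44100 Hz sample
# rate are illustrative assumptions, not details from the disclosure.

def divide_audio(audio_samples, num_frames, frame_rate, sample_rate=44100):
    """Trim the audio to the video duration (num_frames / frame_rate seconds),
    then split it into one chunk of samples per video frame."""
    video_duration = num_frames / frame_rate          # seconds of video
    needed = int(video_duration * sample_rate)        # samples to keep
    trimmed = audio_samples[:needed]
    per_frame = len(trimmed) // num_frames
    return [trimmed[i * per_frame:(i + 1) * per_frame]
            for i in range(num_frames)]

# 50 frames at 25 fps -> 2 seconds of video -> 88200 samples, 1764 per frame
chunks = divide_audio(list(range(100000)), num_frames=50, frame_rate=25)
```

The resulting per-frame chunks would then be combined with the video frames to form the audio and video data.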
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010938191.1A CN114245134B (en) | 2020-09-09 | Audio/video data generation method, device, equipment and computer readable medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114245134A true CN114245134A (en) | 2022-03-25 |
CN114245134B CN114245134B (en) | 2024-10-29 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102595139A (en) * | 2012-03-01 | 2012-07-18 | 大连理工大学 | Mobile-phone PDA direct broadcasting system based on android |
TW201332364A (en) * | 2011-11-03 | 2013-08-01 | Panasonic Corp | Efficient rounding for deblocking |
US20140232822A1 (en) * | 2013-02-21 | 2014-08-21 | Pelican Imaging Corporation | Systems and methods for generating compressed light field representation data using captured light fields, array geometry, and parallax information |
CN107085842A (en) * | 2017-04-01 | 2017-08-22 | 上海讯陌通讯技术有限公司 | The real-time antidote and system of self study multiway images fusion |
CN107295326A (en) * | 2017-06-06 | 2017-10-24 | 南京巨鲨显示科技有限公司 | A kind of 3D three-dimensional video-frequencies method for recording |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110413812B (en) | Neural network model training method and device, electronic equipment and storage medium | |
CN110809189B (en) | Video playing method and device, electronic equipment and computer readable medium | |
CN111459364B (en) | Icon updating method and device and electronic equipment | |
CN112752118B (en) | Video generation method, device, equipment and storage medium | |
CN115209215B (en) | Video processing method, device and equipment | |
CN111754600B (en) | Poster image generation method and device and electronic equipment | |
CN116527748B (en) | Cloud rendering interaction method and device, electronic equipment and storage medium | |
KR20220149574A (en) | 3D video processing method, apparatus, readable storage medium and electronic device | |
CN112418249A (en) | Mask image generation method and device, electronic equipment and computer readable medium | |
CN111669476B (en) | Watermark processing method, device, electronic equipment and medium | |
CN112241744B (en) | Image color migration method, device, equipment and computer readable medium | |
JP2023538825A (en) | Methods, devices, equipment and storage media for picture to video conversion | |
CN116760992B (en) | Video encoding, authentication, encryption and transmission methods, devices, equipment and media | |
CN111258582B (en) | Window rendering method and device, computer equipment and storage medium | |
WO2023138468A1 (en) | Virtual object generation method and apparatus, device, and storage medium | |
CN115834918B (en) | Video live broadcast method and device, electronic equipment and readable storage medium | |
CN111815508A (en) | Image generation method, device, equipment and computer readable medium | |
CN116248889A (en) | Image encoding and decoding method and device and electronic equipment | |
CN114125485B (en) | Image processing method, device, equipment and medium | |
CN114245134B (en) | Audio/video data generation method, device, equipment and computer readable medium | |
CN112070888B (en) | Image generation method, device, equipment and computer readable medium | |
CN112418233B (en) | Image processing method and device, readable medium and electronic equipment | |
CN114245134A (en) | Audio and video data generation method, device, equipment and computer readable medium | |
CN113705386A (en) | Video classification method and device, readable medium and electronic equipment | |
CN111738899B (en) | Method, apparatus, device and computer readable medium for generating watermark |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |