CN114245134A - Audio and video data generation method, device, equipment and computer readable medium - Google Patents
- Publication number
- CN114245134A (Application CN202010938191.1A)
- Authority
- CN
- China
- Prior art keywords
- image
- audio
- generate
- sequence
- video
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/182—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being a pixel
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
Abstract
Embodiments of the present disclosure disclose an audio and video data generation method, apparatus, device, and computer readable medium. One embodiment of the method comprises: acquiring audio and a preset image; transforming the preset image to generate a transformed image; generating a video based on the transformed image; and synthesizing the audio and the video to generate audio and video data. This embodiment avoids equipment failures and similar problems caused by building a real vehicle-mounted monitoring environment from multiple devices.
Description
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular to an audio and video data generation method, apparatus, device, and computer readable medium.
Background
With the development of wireless networks and video technology, acquiring vehicle-mounted monitoring audio and video data over a wireless network has become a common industrial application. Related acquisition methods require building a real vehicle environment: various hardware devices are used to collect a vehicle-mounted monitoring audio/video data stream, packets are then captured from that stream, and the required audio/video stream is extracted. Finally, the resulting audio/video stream is stored or used.
However, acquiring audio/video data in the above manner often runs into the following technical problems:
First, acquiring the audio/video data requires building a real vehicle-mounted monitoring environment from multiple devices, which leads to equipment failures and similar problems.
Second, images contained in the audio/video data stream must be detected before they can be acquired, so generating the audio/video data consumes a large amount of time.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose audio-video data generation methods, apparatuses, devices and computer readable media to solve one or more of the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide an audio and video data generation method, including: acquiring audio and a preset image; transforming the preset image to generate a transformed image; generating a video based on the transformed image; and synthesizing the audio and the video to generate audio and video data.
In a second aspect, some embodiments of the present disclosure provide an apparatus for generating audio-visual data, the apparatus comprising: an acquisition unit configured to acquire an audio and a preset image; a first generating unit configured to transform the preset image to generate a transformed image; a second generation unit configured to generate a video based on the converted image; and the synthesis unit is configured to synthesize the audio and the video to generate audio and video data.
In a third aspect, some embodiments of the present disclosure provide an apparatus comprising: one or more processors; a storage device having one or more programs stored thereon, which when executed by the one or more processors, cause the one or more processors to implement the method as described in the first aspect.
In a fourth aspect, some embodiments of the disclosure provide a computer readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method as described in the first aspect.
The above embodiments of the present disclosure have the following advantages. First, audio and a preset image are acquired. Because only an arbitrary piece of audio and a single preset image are needed, the inputs used to generate the audio/video data are easy to obtain. Next, the preset image is transformed to generate a transformed image; the transformation converts the preset image into the desired image, so the transformed image is ready for subsequent video generation. A video is then generated from the transformed image. Because the transformed image already carries the desired characteristics, the generated video inherits them and can be used directly, without being processed again against the characteristics of the desired video; this saves the steps of processing the whole video and removes the need for video-processing equipment. Finally, the audio and the video are synthesized to generate the audio/video data. Because the audio/video data can be generated quickly from one preset image and one piece of audio, there is no need to intercept the required data from the large volume of video streams produced by a real vehicle-mounted environment, and hence no need for a real vehicle-mounted monitoring environment or the various devices required to build one. The problems of equipment failure and the like caused by building a real vehicle-mounted monitoring environment from multiple devices are thereby avoided.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
Fig. 1 is a schematic view of an application scenario of an audio-video data generation method according to some embodiments of the present disclosure;
fig. 2 is a flow diagram of some embodiments of an audio-visual data generation method according to the present disclosure;
fig. 3 is a schematic block diagram of some embodiments of an audiovisual data generation arrangement in accordance with the present disclosure;
fig. 4 is a schematic structural diagram of an electronic device according to the audio-video data generation method of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings. The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that the modifiers "a", "an", and "the" in this disclosure are illustrative rather than limiting; those skilled in the art will understand that they mean "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 is a schematic diagram of an application scenario of an audio-video data generation method according to some embodiments of the present disclosure.
In the application scenario of fig. 1, first the computing device 101 may obtain audio 102 and a preset image 103. The computing device 101 may then transform the preset image 103 to generate a transformed image 104. Further, a video 105 is generated based on the converted image 104. Finally, the audio 102 and the video 105 are synthesized to generate audio/video data 106. Optionally, the computing device 101 may store and send the audio/video data 106 to the service terminal 107.
The computing device 101 may be hardware or software. When the computing device is hardware, it may be implemented as a distributed cluster composed of multiple servers or terminal devices, or may be implemented as a single server or a single terminal device. When the computing device is embodied as software, it may be installed in the hardware devices enumerated above. It may be implemented, for example, as multiple software or software modules to provide distributed services, or as a single software or software module. And is not particularly limited herein.
It should be understood that the number of computing devices in FIG. 1 is merely illustrative. There may be any number of computing devices, as implementation needs dictate.
With continued reference to fig. 2, a flow 200 of some embodiments of an audio-video data generation method according to the present disclosure is shown. The audio and video data generation method comprises the following steps:
Step 201, acquiring audio and a preset image.
In some embodiments, an execution subject of the audio-video data generation method (for example, the computing device 101 shown in fig. 1) may acquire the audio and the preset image through a wired or wireless connection. The audio may be audio of any content. The preset image may be a preset image of the vehicle's surroundings captured by the vehicle-mounted monitoring.
Step 202, transforming the preset image to generate a transformed image.
In some embodiments, based on the preset image obtained in step 201, the execution subject may transform the preset image in various ways to generate a transformed image.
As an example, the transformation may be a rotation operation on the preset image, yielding a rotation-transformed image. For example, if the rotation angle is 90 degrees, the rotation-transformed image is the preset image rotated clockwise by 90 degrees.
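The rotation example above can be sketched as follows. This is a minimal, dependency-free illustration using nested lists as a toy image; a real implementation would more likely use a library such as Pillow or NumPy, and the function name is an assumption, not part of the patent.

```python
def rotate_90_clockwise(image):
    """Rotate a 2-D image (a list of pixel rows) 90 degrees clockwise."""
    # Reversing the row order and transposing is equivalent to a clockwise rotation.
    return [list(row) for row in zip(*image[::-1])]

preset_image = [
    [1, 2],
    [3, 4],
]
transformed = rotate_90_clockwise(preset_image)
# The bottom-left pixel (3) moves to the top-left corner: [[3, 1], [4, 2]].
```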
In some optional implementations of some embodiments, transforming the preset image to generate a transformed image may include the following steps:
in a first step, a predetermined timestamp is determined.
As an example, the predetermined time stamp may be a predetermined time point. For example: "2020.02.20".
Second, performing a superposition transformation of the predetermined timestamp and the preset image to generate a transformed image. For example, the predetermined timestamp may be superimposed on the preset image to produce the transformed image.
In some optional implementations of some embodiments, superimposing the predetermined timestamp on the preset image to generate a transformed image may include the following steps:
First, determining the coordinate value of a predetermined transformation position in the preset image. The coordinate value may be a position coordinate value selected from the preset image in advance.
As an example, the coordinate value of the predetermined transformation position may be (10, 20).
Second, superimposing the predetermined timestamp onto the preset image using that position coordinate value to generate the transformed image. Specifically, the leftmost end of the rendered timestamp is placed at the predetermined transformation position in the preset image, producing the transformed image.
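The overlay step above can be sketched dependency-free as follows: the timestamp is treated as a small pixel block whose leftmost edge lands at the predetermined position of the preset image. In practice one would render a string such as "2020.02.20" with a library like Pillow's `ImageDraw.text`; the function and variable names here are illustrative assumptions.

```python
def superimpose(base, stamp, x, y):
    """Copy `stamp` (a 2-D pixel block) onto `base` with its top-left corner at (x, y)."""
    out = [row[:] for row in base]          # copy so the preset image is not mutated
    for dy, stamp_row in enumerate(stamp):
        for dx, pixel in enumerate(stamp_row):
            out[y + dy][x + dx] = pixel
    return out

preset_image = [[0] * 4 for _ in range(4)]  # toy 4x4 blank image
timestamp_block = [[9, 9]]                  # stands in for the rendered timestamp pixels
transformed_image = superimpose(preset_image, timestamp_block, 1, 2)
```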
Step 203, generating a video based on the transformed image.
In some embodiments, the execution subject of the audio-video data generation method may apply a secondary transformation to the transformed image to generate a plurality of secondarily transformed images, and then fuse these images to generate a video.
In some optional implementations of some embodiments, generating a video based on the transformed image may include the following steps:
first, a predicted image sequence is generated based on the transformed image.
As an example, the transformed image may be rotated clockwise in the plane, saving the image after each incremental rotation angle. The resulting plurality of rotated images serves as the predicted image sequence.
Second, fusing the transformed image with each predicted image in the predicted image sequence to generate a video.
As an example, the transformed image and the predicted images are merged in the order in which the predicted images were generated, producing a format in which the images can be played continuously, thereby obtaining a video.
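The two steps above can be sketched as follows: a prediction sequence is derived by repeated rotation, then the transformed image and its predictions are concatenated in generation order as the video's frame list. Encoding the frame list into an actual container (e.g., with OpenCV or ffmpeg) is omitted; the helper names are assumptions.

```python
def rotate_90_clockwise(image):
    """Rotate a 2-D image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def predicted_sequence(image, steps):
    """Save the image after each incremental rotation, in generation order."""
    frames, current = [], image
    for _ in range(steps):
        current = rotate_90_clockwise(current)
        frames.append(current)
    return frames

transformed = [[1, 2], [3, 4]]
# The transformed image followed by its predictions, in generation order,
# forms the playable frame list of the video.
video_frames = [transformed] + predicted_sequence(transformed, steps=4)
# Four quarter-turns return to the original orientation, so the last frame
# equals the transformed image.
```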
In some optional implementations of some embodiments, generating a predicted image sequence based on the transformed image may include the following steps:
First, segmenting the transformed image to generate a sequence of segmented images. Specifically, segmenting the transformed image may mean dividing it into a plurality of sub-images of the same size.
As an example, each sub-image may contain multiple pixel blocks. The segmented images form a segmented image sequence in segmentation order.
Second, encoding each segmented image in the segmented image sequence to generate an encoded image, thereby obtaining an encoded image sequence.
As an example, encoding each segmented image may be an ordered encoding of the individual pixel blocks in that image, finally yielding the encoded image sequence.
Third, generating a predicted image sequence based on the encoded image sequence.
As an example, pixel prediction may be performed on each pixel block of each encoded image, changing the pixel values to generate a corresponding set of predicted pixel points, from which a predicted image is then generated.
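The segmentation step above (dividing the transformed image into equal-sized sub-images, collected in segmentation order) can be sketched as follows. The block size and row-major ordering are illustrative assumptions; the patent does not fix either.

```python
def segment(image, block_h, block_w):
    """Split a 2-D image into a row-major sequence of block_h x block_w sub-images."""
    tiles = []
    for top in range(0, len(image), block_h):
        for left in range(0, len(image[0]), block_w):
            tiles.append([row[left:left + block_w]
                          for row in image[top:top + block_h]])
    return tiles

image = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
segmented_sequence = segment(image, 1, 2)   # four 1x2 sub-images, in segmentation order
```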
In some optional implementations of some embodiments, generating the predicted image sequence based on the encoded image sequence may include the following steps:
First, determining the number of pixel points in the encoded image and the pixel value corresponding to each pixel point. For example, the number of pixel points in the encoded image may be 100, and the pixel value of the first pixel point may be 55.
Second, performing a predictive transformation on the pixel value corresponding to each pixel point in each encoded image in the encoded image sequence, using the following formula, to generate a predicted pixel value sequence, thereby obtaining a set of predicted pixel value sequences (the formula itself is not reproduced in the source text):
where A denotes the predicted pixel value sequence; d denotes a predicted pixel value in that sequence; n denotes the number of pixel values in the predicted pixel value sequence; d_n denotes the n-th pixel value in the predicted pixel value sequence; p denotes the pixel value of a pixel point in the encoded image; l indexes the l-th pixel point in the encoded image; p_l denotes the pixel value of the l-th pixel point in the encoded image; M denotes the number of pixel points in the encoded image; p_min denotes the minimum pixel value among the pixel points in the encoded image; and a predictor variable (whose symbol is likewise not reproduced) denotes the predictor of the n-th pixel value in the predicted pixel value sequence. Specifically, the value range of this predictor variable may be 1 to 4.
Third, combining the predicted pixel values in each predicted pixel value sequence in the set to generate a predicted image, thereby obtaining the predicted image sequence.
As an example, a predicted image may be generated by arranging the predicted pixel values left to right and top to bottom, in the generation order of the corresponding predicted pixel points in each predicted pixel point set.
The above formula is an inventive point of the embodiments of the present disclosure and addresses the second technical problem mentioned in the background: images contained in the audio/video data stream must be detected before they can be acquired, so generating the audio/video data consumes a large amount of time.
First, since the maximum pixel value may be 255, the predictor variable of each pixel value in the predicted pixel value sequence may range from 1 to 4, and it varies with the range of the pixel value, so the range within which pixel values are generated is bounded. Because the generation range of each predicted pixel value is limited, the total difference between the generated predicted image and the encoded image stays within a controllable range, and the generated predicted image does not need to be detected, eliminating the image-detection step. Since each predicted image is produced by adding a predictor variable to the pixel value of each pixel point in the encoded image, and the predictor variable can be generated quickly within its controllable range, the desired predicted image can be obtained quickly. The large amount of time otherwise consumed in generating the audio/video data is thereby reduced.
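The bounded prediction described above can be sketched as follows: each encoded pixel value is perturbed by a small predictor variable drawn from the stated range (1 to 4), so every predicted pixel stays within a controllable distance of its source. The exact formula is not reproduced in the source text, so the update rule below is an illustrative stand-in, not the patented formula.

```python
import random

def predict_pixels(encoded_pixels, rng):
    """Add a predictor variable in [1, 4] to each pixel value, clamped to 255."""
    return [min(p + rng.randint(1, 4), 255) for p in encoded_pixels]

rng = random.Random(0)                      # seeded for reproducibility
encoded = [55, 100, 254]
predicted = predict_pixels(encoded, rng)
# Every predicted value exceeds its source by at most 4 (or saturates at 255),
# so the predicted frame stays within a controllable range of the encoded one.
```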
Step 204, synthesizing the audio and the video to generate audio and video data.
In some embodiments, the execution subject may fuse the audio and the video to generate audio and video data. Specifically, the audio can be fused into the video according to the frame rate to generate audio and video data.
In some optional implementation manners of some embodiments, the synthesizing, by the execution main body, the audio and the video to generate audio and video data may include the following steps:
First, determining the frame count and the frame rate of the video.
As an example, the frame count may be the total number of images in the video, and the frame rate may be the number of images played per second.
Second, segmenting the audio based on the frame count and frame rate of the video to generate the audio to be synthesized.
As an example, the frame number of the video may be divided by the frame rate to obtain the duration of the video, and then the audio may be divided into the audio to be synthesized with the same duration.
Third, combining the audio to be synthesized and the video to generate audio and video data.
As an example, audio to be synthesized with the same duration is merged into a video to obtain data that can be played simultaneously, thereby generating audio-video data. The audio and video data may be data meeting JT1078 (standard 1078 protocol) format requirements.
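The duration arithmetic in the steps above can be sketched as follows: the video duration is its frame count divided by its frame rate, and the audio is cut to a segment of equal duration before the two are merged. The JT1078 packaging itself is omitted, and the sample-rate figure is an illustrative assumption.

```python
def audio_segment_to_synthesize(audio_samples, sample_rate, frame_count, frame_rate):
    """Trim the audio to the video's duration (frame_count / frame_rate seconds)."""
    video_seconds = frame_count / frame_rate
    n_samples = int(video_seconds * sample_rate)
    return audio_samples[:n_samples]

# 250 video frames at 25 fps -> a 10-second video, so keep 10 s of audio.
audio = list(range(16000 * 12))             # 12 s of fake 16 kHz samples
to_synthesize = audio_segment_to_synthesize(audio, 16000, 250, 25)
```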
Optionally, the audio and video data is stored and sent to the service terminal.
In some embodiments, the execution subject may store the audio and video data and then send it to the service terminal. The audio and video data may be stored in various formats.
By way of example, the format of the audio-video data storage may be a text format, a binary format, a string format, and the like.
The above embodiments of the present disclosure have the following advantages. First, audio and a preset image are acquired. Because only an arbitrary piece of audio and a single preset image are needed, the inputs used to generate the audio/video data are easy to obtain. Next, the preset image is transformed to generate a transformed image; the transformation converts the preset image into the desired image, so the transformed image is ready for subsequent video generation. A video is then generated from the transformed image. Because the transformed image already carries the desired characteristics, the generated video inherits them and can be used directly, without being processed again against the characteristics of the desired video; this saves the steps of processing the whole video and removes the need for video-processing equipment. Finally, the audio and the video are synthesized to generate the audio/video data. Because the audio/video data can be generated quickly from one preset image and one piece of audio, there is no need to intercept the required data from the large volume of video streams produced by a real vehicle-mounted environment, and hence no need for a real vehicle-mounted monitoring environment or the various devices required to build one. The problems of equipment failure and the like caused by building a real vehicle-mounted monitoring environment from multiple devices are thereby avoided.
With further reference to fig. 3, as an implementation of the methods shown in the above figures, the present disclosure provides some embodiments of an audio-video data generation apparatus. These apparatus embodiments correspond to the method embodiments shown in fig. 2, and the apparatus may be applied in various electronic devices.
As shown in fig. 3, the audio-visual data generation apparatus 300 of some embodiments includes: an acquisition unit 301, a first generation unit 302, a second generation unit 303, and a synthesis unit 304. Wherein, the obtaining unit 301 is configured to obtain an audio and a preset image; a first generating unit 302 configured to transform the preset image to generate a transformed image; a second generating unit 303 configured to generate a video based on the converted image; and a synthesizing unit 304 configured to synthesize the audio and the video to generate audio and video data.
Referring now to fig. 4, a block diagram of an electronic device 400 (e.g., the computing device 101 of fig. 1) suitable for implementing some embodiments of the present disclosure is shown. The electronic device shown in fig. 4 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in fig. 4, the electronic device 400 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 401 that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage device 408 into a random access memory (RAM) 403. The RAM 403 also stores various programs and data necessary for the operation of the electronic device 400. The processing device 401, the ROM 402, and the RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
Generally, the following devices may be connected to the I/O interface 405: input devices 406 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 407 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; storage devices 408 including, for example, magnetic tape, hard disk, etc.; and a communication device 409. The communication device 409 may allow the electronic device 400 to communicate wirelessly or by wire with other devices to exchange data. While fig. 4 illustrates an electronic device 400 with various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided. Each block shown in fig. 4 may represent one device or multiple devices as needed.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In some such embodiments, the computer program may be downloaded and installed from a network through the communication device 409, or from the storage device 408, or from the ROM 402. The computer program, when executed by the processing apparatus 401, performs the above-described functions defined in the methods of some embodiments of the present disclosure.
It should be noted that the computer readable medium described above in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device described above, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire audio and a preset image; transform the preset image to generate a transformed image; generate a video based on the transformed image; and synthesize the audio and the video to generate audio and video data.
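The four operations listed above can be sketched as a minimal pipeline. The sketch below is illustrative only: the function names and the plain-Python representations of "image" and "audio" are assumptions, not the disclosed implementation, which would use real media and encoding libraries.

```python
# Illustrative sketch of the four steps carried out by the programs above.
# All function names and the nested-list "image"/"audio" representations are
# hypothetical; a real implementation would use media libraries (e.g. OpenCV,
# FFmpeg) rather than plain lists.

def acquire(audio_samples, preset_image):
    """Step 1: acquire audio and a preset image (copies of plain data here)."""
    return list(audio_samples), [row[:] for row in preset_image]

def overlay_timestamp(image, timestamp, position=(0, 0)):
    """Step 2: transform the preset image by superimposing a timestamp at a
    predetermined coordinate (a stand-in for rendering timestamp pixels)."""
    row, col = position
    transformed = [r[:] for r in image]
    transformed[row][col] = timestamp  # placeholder for drawn timestamp text
    return transformed

def generate_video(transformed_image, num_frames):
    """Step 3: generate a video as a sequence of frames derived from the
    transformed image (a real system would predict and fuse frames)."""
    return [transformed_image] * num_frames

def synthesize(audio, video):
    """Step 4: pair each frame with its share of audio samples to form
    the combined audio and video data."""
    samples_per_frame = len(audio) // len(video)
    return [
        (frame, audio[i * samples_per_frame:(i + 1) * samples_per_frame])
        for i, frame in enumerate(video)
    ]

audio, image = acquire([0.1] * 60, [[0, 0], [0, 0]])
video = generate_video(overlay_timestamp(image, "12:00:00"), num_frames=30)
av_data = synthesize(audio, video)
```

With 60 audio samples and 30 frames, each frame is paired with 2 samples; the first frame carries the overlaid timestamp at its predetermined position.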
Computer program code for carrying out operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by software or by hardware. The described units may also be provided in a processor, which may be described as: a processor including an acquisition unit, a first generation unit, a second generation unit, and a third generation unit. The names of these units do not in some cases limit the units themselves; for example, the acquisition unit may also be described as a "unit that acquires audio and a preset image".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
The foregoing description is merely a description of preferred embodiments of the present disclosure and of the principles of the technology employed. Those skilled in the art will appreciate that the scope of the disclosure is not limited to the specific combination of the features described above, and also covers other technical solutions formed by any combination of those features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the features above with (but not limited to) technical features having similar functions disclosed in the present disclosure.
Claims (10)
1. An audio and video data generation method comprises the following steps:
acquiring audio and a preset image;
transforming the preset image to generate a transformed image;
generating a video based on the transformed image;
and synthesizing the audio and the video to generate audio and video data.
2. The method of claim 1, wherein the method further comprises:
and storing the audio and video data and sending the audio and video data to a service terminal.
3. The method of claim 2, wherein transforming the preset image to generate a transformed image comprises:
determining a predetermined timestamp;
and superimposing the predetermined timestamp on the preset image to generate the transformed image.
4. The method of claim 3, wherein superimposing the predetermined timestamp on the preset image to generate the transformed image comprises:
determining coordinate values of a predetermined transformation position in the preset image;
and superimposing the predetermined timestamp onto the preset image at the coordinate values to generate the transformed image.
5. The method of claim 4, wherein said generating a video based on said transformed image comprises:
generating a predicted image sequence based on the transformed image;
and fusing the transformed image with each predicted image in the predicted image sequence to generate a video.
6. The method of claim 5, wherein said synthesizing the audio with the video to generate audio-visual data comprises:
determining a frame count and a frame rate of the video;
dividing the audio based on the frame count and the frame rate of the video to generate audio to be synthesized;
and combining the audio to be synthesized with the video to generate the audio and video data.
7. The method according to claim 6, wherein said generating a predicted image sequence based on the transformed image comprises:
segmenting the transformed image to generate a segmented image sequence;
encoding each segmented image in the segmented image sequence to generate an encoded image, thereby obtaining an encoded image sequence;
and generating the predicted image sequence based on the encoded image sequence.
8. The method according to claim 7, wherein said generating a predicted image sequence based on the encoded image sequence comprises:
determining the number of pixel points in each encoded image and the pixel value corresponding to each pixel point;
performing predictive transformation on the pixel values corresponding to the pixel points in each encoded image in the encoded image sequence, using the following formula, to generate a predicted pixel value sequence, thereby obtaining a set of predicted pixel value sequences:
(formula not reproduced in the source)
wherein a represents the predicted pixel value sequence;
d represents a predicted pixel value in the predicted pixel value sequence;
n represents the number of pixel values in the predicted pixel value sequence;
d_n represents the nth pixel value in the predicted pixel value sequence;
p represents the pixel value of a pixel point in the encoded image;
l represents the lth pixel point in the encoded image;
p_l represents the pixel value of the lth pixel point in the encoded image;
m represents the number of pixel points in the encoded image;
p_min represents the minimum pixel value among the pixel points in the encoded image;
and a further symbol (not reproduced in the source) represents the predictor variable of the nth pixel value in the predicted pixel value sequence;
and combining the predicted pixel values in each predicted pixel value sequence in the set of predicted pixel value sequences to generate a predicted image, thereby obtaining the predicted image sequence.
9. An audio-visual data generating apparatus comprising:
an acquisition unit configured to acquire an audio and a preset image;
a first generation unit configured to transform the preset image to generate a transformed image;
a second generation unit configured to generate a video based on the transformed image;
and the synthesis unit is configured to synthesize the audio and the video to generate audio and video data.
10. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-8.
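As a rough illustration of the audio division recited in claim 6, the video's frame count and frame rate fix the target duration to which the audio is trimmed before being split per frame. This is a sketch under assumptions: the helper name, the 44100 Hz sample rate, and the trim-then-chunk strategy are not specified by the disclosure.

```python
# Rough sketch of the audio division in claim 6: the video's frame count and
# frame rate determine the duration the audio is trimmed to before it is
# split into per-frame chunks. The function name and the 44100 Hz sample
# rate are illustrative assumptions, not details from the disclosure.

def divide_audio(audio_samples, num_frames, frame_rate, sample_rate=44100):
    """Trim the audio to the video duration (num_frames / frame_rate seconds),
    then split it into one chunk of samples per video frame."""
    video_duration = num_frames / frame_rate          # seconds of video
    needed = int(video_duration * sample_rate)        # samples to keep
    trimmed = audio_samples[:needed]
    per_frame = len(trimmed) // num_frames
    return [trimmed[i * per_frame:(i + 1) * per_frame]
            for i in range(num_frames)]

# 50 frames at 25 fps -> 2 seconds of video -> 88200 samples, 1764 per frame
chunks = divide_audio(list(range(100000)), num_frames=50, frame_rate=25)
```

The resulting per-frame chunks would then be combined with the video frames to form the audio and video data.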
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010938191.1A CN114245134B (en) | 2020-09-09 | Audio/video data generation method, device, equipment and computer readable medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114245134A true CN114245134A (en) | 2022-03-25 |
CN114245134B CN114245134B (en) | 2024-10-29 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102595139A (en) * | 2012-03-01 | 2012-07-18 | 大连理工大学 | Mobile-phone PDA direct broadcasting system based on android |
TW201332364A (en) * | 2011-11-03 | 2013-08-01 | Panasonic Corp | Efficient rounding for deblocking |
US20140232822A1 (en) * | 2013-02-21 | 2014-08-21 | Pelican Imaging Corporation | Systems and methods for generating compressed light field representation data using captured light fields, array geometry, and parallax information |
CN107085842A (en) * | 2017-04-01 | 2017-08-22 | 上海讯陌通讯技术有限公司 | The real-time antidote and system of self study multiway images fusion |
CN107295326A (en) * | 2017-06-06 | 2017-10-24 | 南京巨鲨显示科技有限公司 | A kind of 3D three-dimensional video-frequencies method for recording |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110413812B (en) | Neural network model training method and device, electronic equipment and storage medium | |
CN110809189B (en) | Video playing method and device, electronic equipment and computer readable medium | |
CN111459364B (en) | Icon updating method and device and electronic equipment | |
CN112752118B (en) | Video generation method, device, equipment and storage medium | |
CN115209215B (en) | Video processing method, device and equipment | |
CN111754600B (en) | Poster image generation method and device and electronic equipment | |
CN116527748B (en) | Cloud rendering interaction method and device, electronic equipment and storage medium | |
KR20220149574A (en) | 3D video processing method, apparatus, readable storage medium and electronic device | |
CN112418249A (en) | Mask image generation method and device, electronic equipment and computer readable medium | |
CN111669476B (en) | Watermark processing method, device, electronic equipment and medium | |
CN112241744B (en) | Image color migration method, device, equipment and computer readable medium | |
JP2023538825A (en) | Methods, devices, equipment and storage media for picture to video conversion | |
CN116760992B (en) | Video encoding, authentication, encryption and transmission methods, devices, equipment and media | |
CN111258582B (en) | Window rendering method and device, computer equipment and storage medium | |
WO2023138468A1 (en) | Virtual object generation method and apparatus, device, and storage medium | |
CN115834918B (en) | Video live broadcast method and device, electronic equipment and readable storage medium | |
CN111815508A (en) | Image generation method, device, equipment and computer readable medium | |
CN116248889A (en) | Image encoding and decoding method and device and electronic equipment | |
CN114125485B (en) | Image processing method, device, equipment and medium | |
CN114245134B (en) | Audio/video data generation method, device, equipment and computer readable medium | |
CN112070888B (en) | Image generation method, device, equipment and computer readable medium | |
CN112418233B (en) | Image processing method and device, readable medium and electronic equipment | |
CN114245134A (en) | Audio and video data generation method, device, equipment and computer readable medium | |
CN113705386A (en) | Video classification method and device, readable medium and electronic equipment | |
CN111738899B (en) | Method, apparatus, device and computer readable medium for generating watermark |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |