WO2015131573A1

WO2015131573A1 - Method and device for producing image having sound, and computer storage medium

Info

Publication number: WO2015131573A1
Application number: PCT/CN2014/092391
Authority: WO
Inventors: 周江; 吴钊; 陈瑞
Original assignee: 中兴通讯股份有限公司
Priority date: 2014-09-24
Filing date: 2014-11-27
Publication date: 2015-09-11
Also published as: CN105513103A

Abstract

Disclosed are a method and device for producing an image having sound. The method comprises: collecting image information; collecting audio information of the collection environment in which the image information is located; and adding the image information, image association information used for parsing the image information, and the audio information to an audio file. Also disclosed is a computer storage medium.

Description

Method and device for making audio picture and computer storage medium

Technical field

The present invention relates to the field of information processing, and in particular, to a method and apparatus for making an audio picture and a computer storage medium.

Background technique

In daily life and work, many occasions call for pictures to record visual information and, at the same time, audio to record auditory information. This requires storing pictures and sound together. Existing methods usually use video to store pictures and sound together. However, a video file contains a large amount of data, which makes storage and sharing inconvenient.

Therefore, providing an audio picture that records visual information and auditory information simultaneously while keeping the amount of data small is an urgent problem to be solved in the prior art.

Summary of the invention

In view of this, embodiments of the present invention are directed to a method and apparatus for making an audio picture, which can retain picture information and audio information in a recording and collection environment with a small amount of data.

In order to achieve the above object, the technical solution of the present invention is achieved as follows:

A first aspect of the embodiments of the present invention provides a method for making an audio picture, the method comprising:

Collecting picture information;

Collecting audio information of the collection environment where the picture information is located;

Adding the picture information, picture association information used for parsing the picture information, and the audio information to an audio file.

Based on the above scheme,

The picture association information includes at least an APIC tag and a MIME type;

Adding the picture information and the picture association information to the audio file, including:

Adding an APIC tag and a MIME type of the picture information to a tag of the audio file;

Adding the picture information to a tag of the audio file.

Based on the above scheme,

Adding the picture information and the picture association information to the audio file further includes:

Determining, according to the picture information, the APIC tag, and the MIME type, the information length of the picture information and the picture association information;

Adding the information length to the tag.

Based on the above scheme,

The method further includes:

Before adding the audio information to the audio file, updating the tag length of the tag according to the information length of the tag.

Based on the above scheme,

The audio information includes noise information;

The method further includes:

Before the audio information is added to the audio file, the noise information in the audio information is deleted according to a predetermined policy.

Based on the above scheme,

The noise information includes: a camera sound produced by the electronic device when collecting the picture information, and environmental noise specified by the user.

A second aspect of the embodiments of the present invention provides a device for making an audio picture.

The device includes:

An image acquisition unit configured to collect picture information;

An audio collection unit configured to collect audio information of the collection environment where the picture information is located;

An audio forming unit configured to add the picture information, picture association information used for parsing the picture information, and the audio information to an audio file.

Based on the above scheme,

The audio forming unit is configured to add an APIC tag and a MIME type of the picture information in a tag of the audio file; and add the picture information to the tag.

Based on the above scheme,

The audio forming unit is configured to determine an information length of the picture information and the picture association information according to the picture information, the APIC tag, and a MIME type; and add the information length to the tag.

Based on the above scheme,

The audio forming unit is further configured to update the tag length of the tag according to the information length of the tag before adding the audio information to the audio file.

Based on the above scheme,

The audio information includes noise information;

The device also includes:

A noise processing unit configured to delete the noise information in the audio information according to a predetermined policy before the audio information is added to the audio file.

A third aspect of the embodiments of the present invention further provides a computer storage medium, where the computer storage medium stores computer-executable instructions, and the computer-executable instructions are used to execute at least one of the methods of the first aspect of the embodiments of the present invention.

With the method and device for making an audio picture and the computer storage medium of the embodiments of the present invention, the collected picture information and audio information are combined into an audio file that carries the picture information. Because the picture information is carried in the audio file, many existing audio players that can display embedded picture information can output the picture information and the audio information at the same time. Compared with using video to store picture information and audio information, this reduces the amount of data. In addition, when the picture information is stored in a tag of the audio file (for example, an ID3 tag), the resulting audio file conforms to the information format of existing audio files. An existing electronic device can output the audio file with any audio application that can display picture information, without installing a terminal or platform that implements a specific algorithm. This avoids the difficulty of access caused by a dedicated algorithm and gives the solution strong compatibility with the prior art and good versatility.

Drawings

FIG. 1 is a schematic flowchart of a method for making an audio picture according to an embodiment of the present invention;

FIG. 2 is a schematic flowchart of forming an audio file according to an embodiment of the present invention;

FIG. 3 is a second schematic flowchart of a method for making an audio picture according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a device for making an audio picture according to an embodiment of the present invention;

FIG. 5 is a second schematic structural diagram of a device for making an audio picture according to an embodiment of the present invention;

FIG. 6 is a schematic flowchart of making an audio picture according to an example of the present invention.

Detailed description

The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

Embodiment 1:

As shown in FIG. 1 , this embodiment provides a method for making an audio picture, where the method includes:

Step S110: collecting picture information;

Step S120: collecting audio information of the collection environment where the picture information is located;

Step S130: adding the picture information, picture association information used for parsing the picture information, and the audio information to an audio file.

The method described in this embodiment is applied to an electronic device carrying a camera function and an audio collection function; for example, a mobile phone or a tablet computer.

When collecting a picture, the electronic device synchronously collects the ambient sound of the current environment to form the audio information, so that the auditory information (audio information) is recorded in synchronization with the visual information (picture information) of the collection environment captured by the photograph.

In this embodiment, to solve the above problem, the audio information and the picture information collected by the electronic device are merged and stored in an audio file, so that both the audio information and the picture information can be output when the audio file is output, while the file contains less data than a video.

The audio file includes a tag and an audio data portion used for outputting audio. The tag includes information associated with the audio information, such as a singer name, an album name, and a genre. The tag is preferably an ID3 tag.

The ID3 tag is a component of an audio file. In the prior art it is mainly used to store information associated with the audio, such as the singer, title, album name, year, and genre; this information is not part of the audio content of the audio information. When the electronic device outputs audio according to the audio content, this information can be output in the form of text and/or pictures.

The audio information is added to the audio data portion. The picture information may be added to the audio data portion or to the tag. Preferably, the picture information and the picture association information are added to the tag, so that the audio file thus formed conforms to the information format of existing audio files, and the electronic device can parse and output the audio file formed in step S130 without a proprietary algorithm or a dedicated application.

Therefore, step S130 may include: adding the picture information and the picture association information to a tag of the audio file; and adding the audio information to the audio data portion of the audio file. Specifically, the picture information and the picture association information are written in the ID3 tag, and the audio information is written in the audio data portion.

In this embodiment, the picture information is written in a tag of the audio file, and the picture association information of the picture information is also recorded in the tag. The picture association information includes a MIME type; according to the MIME type, the electronic device can determine the information format of the picture information, how to open it, and so on when parsing and outputting the picture.

There are several types of tags, and the tag is not limited to the ID3 tag described above; the ID3 tag is described in detail below. The ID3 tag includes ID3V1, ID3V2, and ID3V2.3, where V1 denotes version 1, V2 denotes version 2, and V2.3 denotes version 2.3. ID3V1 is located at the end of the audio file; ID3V2 is located at the beginning of the audio file. A picture can be stored in an ID3 tag of version ID3V2 or higher. Therefore, in this embodiment, the ID3 tag uses version ID3V2, ID3V2.3, or higher.
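
The placement rule above (ID3V1 at the end, ID3V2 at the beginning) can be checked with a short sketch. It assumes the usual markers of those formats, namely an "ID3" identifier followed by version bytes at the start of the file and a fixed 128-byte block beginning with "TAG" at the end; the function name `detect_id3` is hypothetical and is not part of the disclosed method.

```python
def detect_id3(data: bytes) -> dict:
    """Report which ID3 tag versions appear in raw audio-file bytes."""
    found = {"id3v1": False, "id3v2": None}
    # ID3v2 sits at the beginning: "ID3", major version, revision, flags, size.
    if len(data) >= 10 and data[:3] == b"ID3":
        found["id3v2"] = (data[3], data[4])  # (3, 0) means ID3v2.3
    # ID3v1 sits at the end: a fixed 128-byte block starting with "TAG".
    if len(data) >= 128 and data[-128:-125] == b"TAG":
        found["id3v1"] = True
    return found
```

Only a tag reported under `id3v2` can carry a picture, which is why the embodiment selects ID3V2 or higher.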

In this embodiment, the picture information collected by the electronic device is first merged into the audio file. While the audio file is being played, the electronic device reads the ID3 tag of the audio file and outputs the picture information by parsing the ID3 tag.

In a specific implementation, the audio file is preferably in mp3 or AAC format, and the picture information is preferably a jpeg picture. The jpeg format offers a high compression ratio and a realistic result after decompression. The mp3 and AAC formats likewise offer a high compression ratio and low audio distortion after decompression, which facilitates the processing and storage of the information.

A specific application scenario of the method in this embodiment is provided below:

Specifically, user A takes a group photo with friends. At the moment the picture is taken, the friends shout slogans such as “Egglet”. User A wants to record this cheerful scene or share it with other friends. With only the static information of a photo or picture, the cheerful atmosphere is clearly halved. The method described in this embodiment can synchronously collect the picture information and the ambient sound and form an audio file carrying the picture information, which a friend can open without installing a specific algorithm or application. Through this audio file, the friend can experience both the visual information and the auditory information of the moment when user A took the picture. The audio file formed by the method described in this embodiment can be output by an existing audio or video application, such as the KuGou Music application.

As shown in FIG. 2, the step S130 may specifically include:

Step S131: adding an APIC tag and a MIME type of the picture information to the tag;

Step S132: Add the picture information to the tag.

Based on the above solution, the step S130 further includes step S133 and step S134:

Step S133: determining, according to the picture information, the APIC tag, and the MIME type, the information length of the picture information and the picture association information;

Step S134: adding the information length to the tag.

Steps S133 and S134 may be performed after step S132, before it, or in synchronization with it; the order is not limited to that shown in FIG. 2.

The APIC tag indicates that the picture information and the picture association information follow the current position in the audio file. In a specific implementation, the APIC tag is followed by a tag length; this tag length is the information length described in steps S133 and S134. M bytes are reserved in the ID3 tag to record the information length of the picture information and the picture association information; typically M equals 4.

After the picture information and the picture association information are added to the ID3 tag and the information length is determined, the reserved M bytes are updated according to the information length. When the electronic device outputs the audio file, it knows from the APIC tag and the information length which parts of the ID3 tag are picture information and which are other information associated with the audio file, such as the audio recording time.

In a specific implementation, the information length may instead be determined according to steps S133 and S134 and added to the tag before the picture information is added; the order is not limited to the sequence shown in FIG. 2.
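
As an illustration of the length-first ordering (S133, S134, then S132), the following minimal sketch assembles one picture frame. It assumes the ID3v2.3 frame layout, in which the four-character APIC identifier is followed by a 4-byte big-endian length (M = 4) and two flag bytes; the function name, the default MIME string, and the empty description are illustrative assumptions rather than details taken from the disclosure.

```python
import struct


def build_apic_length_first(picture: bytes, mime: str = "image/jpeg") -> bytes:
    """Build an APIC frame, writing the information length before the picture."""
    # Picture association information that precedes the picture data.
    association = (
        b"\x00"                             # text encoding: ISO-8859-1
        + mime.encode("latin-1") + b"\x00"  # MIME type, NUL-terminated
        + b"\x00"                           # picture type: 0 = other
        + b"\x00"                           # empty description, NUL-terminated
    )
    # Step S133: information length of the picture and its association data.
    info_length = len(association) + len(picture)
    # Step S134: APIC identifier, 4-byte length, two flag bytes.
    header = b"APIC" + struct.pack(">I", info_length) + b"\x00\x00"
    # Step S132: the picture information itself goes last.
    return header + association + picture
```

Because the length is known before any picture byte is written, this ordering needs no later patch of the reserved bytes.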

The electronic device can know, according to the APIC tag and the information length, which bytes in the ID3 tag store the picture information and the picture association information. In a specific implementation, if multiple pictures are stored, the picture association information in the ID3 tag may further include information such as a picture type, a text encoding identifier, a remark string, and frame flags. For details of this information, refer to the information format of existing ID3 tags; it is not repeated here.
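
The lookup described above can be sketched as a walk over the frames of the tag. This is a minimal sketch that assumes an ID3v2.3 tag without an extended header or unsynchronisation, and a description stored in ISO-8859-1; the function name `find_apic` is hypothetical.

```python
def find_apic(data: bytes):
    """Return (mime_type, picture_bytes) for the first APIC frame, or None."""
    if len(data) < 10 or data[:3] != b"ID3":
        return None
    # Tag length: four header bytes carrying 7 significant bits each.
    tag_size = (data[6] << 21) | (data[7] << 14) | (data[8] << 7) | data[9]
    pos, end = 10, 10 + tag_size
    while pos + 10 <= end:
        frame_id = data[pos:pos + 4]
        if frame_id == b"\x00\x00\x00\x00":
            break  # padding reached, no more frames
        # Frame length: plain big-endian 32-bit integer in ID3v2.3.
        frame_size = int.from_bytes(data[pos + 4:pos + 8], "big")
        body = data[pos + 10:pos + 10 + frame_size]
        if frame_id == b"APIC":
            if body[0] != 0:
                raise ValueError("sketch handles ISO-8859-1 descriptions only")
            mime_end = body.index(b"\x00", 1)
            mime = body[1:mime_end].decode("latin-1")
            # body[mime_end + 1] is the picture type; the description follows.
            desc_end = body.index(b"\x00", mime_end + 2)
            return mime, body[desc_end + 1:]
        pos += 10 + frame_size  # skip to the next frame
    return None
```

Frames other than APIC (title, recording time, and so on) are skipped by their own lengths, which is how the picture bytes are told apart from the other information in the tag.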

The picture association information includes at least a MIME type. The MIME type occupies one or more bytes in the ID3 tag, and its data type may be a character string. Specifically, when the character string in the bytes corresponding to the MIME type is jpeg, the electronic device, on reading the MIME type, knows that the file format of the picture information is jpeg and therefore knows which data format to use to parse and display the picture information. In a specific implementation, when the file format of the picture information is image, the MIME type is image.
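
One way to choose the MIME string when writing the tag is to inspect the first bytes of the picture. The sketch below is an assumption for illustration only: it recognises the well-known JPEG and PNG signatures and returns the full media type (for example "image/jpeg"), whereas the text above uses the short string jpeg. The ID3v2.3 specification is understood to imply the "image/" prefix when it is omitted, so both spellings should identify the same format. The function name `sniff_mime` is hypothetical.

```python
def sniff_mime(picture: bytes) -> str:
    """Guess the MIME type of a picture from its leading signature bytes."""
    if picture.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"  # JPEG start-of-image marker
    if picture.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"   # PNG signature
    raise ValueError("unrecognised picture format")
```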

Similarly, so that the electronic device can determine, when outputting the audio file, which data belongs to the ID3 tag and which belongs to the audio content, the ID3 tag also includes a tag length indicating the length of the ID3 tag information. The method therefore further includes: before adding the audio information to the audio file, updating the tag length of the tag according to the information length of the tag.
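
The tag length can be illustrated with the encoding used in the ID3v2 header, where the length is stored in four bytes that each carry 7 significant bits (a "synchsafe" integer). The following is a minimal sketch under that assumption, ignoring the optional footer of later versions; the helper names are hypothetical.

```python
def encode_synchsafe(n: int) -> bytes:
    """Encode a tag length as four bytes with 7 significant bits each."""
    if not 0 <= n < (1 << 28):
        raise ValueError("tag length out of range")
    return bytes([(n >> 21) & 0x7F, (n >> 14) & 0x7F, (n >> 7) & 0x7F, n & 0x7F])


def decode_synchsafe(b: bytes) -> int:
    """Decode the four length bytes of an ID3v2 header."""
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]


def audio_offset(data: bytes) -> int:
    """Return the offset at which the audio data portion begins."""
    if data[:3] != b"ID3":
        return 0  # no ID3v2 tag: audio starts at the beginning
    return 10 + decode_synchsafe(data[6:10])  # 10-byte header + tag body
```

A player that reads the header can therefore skip the whole tag, including any embedded picture, and start decoding audio at `audio_offset(data)`.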

Based on the above solution, the audio information includes noise information;

As shown in FIG. 3, before the step S130, the method further includes:

Step S121: deleting noise information in the audio information according to a predetermined policy.

The collected audio information may include some noise information that the user does not want. In the embodiment, noise filtering is also performed through the step S121 to delete the noise information, so that the information desired by the user can be saved.

Specifically, the noise information includes: a camera sound produced by the electronic device when collecting the picture information, and environmental noise specified by the user.

During image acquisition, the electronic device usually emits a shutter sound similar to a “click” to tell the user that the picture has been taken. If the photo is taken silently, the user may be unable to tell whether the photo has been completed; if the prompt tone is retained, it will be collected into the audio information as ambient sound. In this embodiment, the camera sound produced when collecting the picture information can be removed in step S121, for example in the following manner:

Storing in advance a reference sound corresponding to the camera sound;

Comparing the collected audio information with the reference sound to form a comparison result;

According to the comparison result, deleting from the audio information the portions whose difference from the reference sound meets a preset condition, thereby removing the camera sound.
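
The three steps above can be sketched as follows. The disclosure does not say how the comparison is computed, so this sketch substitutes cosine similarity between a sliding window of samples and the stored reference sound, with a threshold standing in for the preset condition; the function name, the threshold value, and the use of plain lists of samples are all assumptions.

```python
import math


def remove_reference_sound(samples, reference, threshold=0.9):
    """Delete spans of `samples` whose shape closely matches `reference`."""
    n = len(reference)
    ref_norm = math.sqrt(sum(r * r for r in reference))
    if n == 0 or ref_norm == 0:
        return list(samples)
    out, i = [], 0
    while i < len(samples):
        window = samples[i:i + n]
        if len(window) == n:
            win_norm = math.sqrt(sum(w * w for w in window))
            if win_norm > 0:
                dot = sum(w * r for w, r in zip(window, reference))
                # Comparison result: cosine similarity with the reference.
                if dot / (win_norm * ref_norm) >= threshold:
                    i += n  # preset condition met: drop the matching span
                    continue
        out.append(samples[i])
        i += 1
    return out
```

A practical implementation would likely work on framed, band-limited audio and crossfade the cut, but the control flow (store, compare, delete on match) is the same.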

When the user takes a picture, there may be other unwanted noise, such as the long horn of a passing car, the track noise of a moving train, the noise of construction machinery on a building site, and other environmental noise specified by the user.

When the specified environmental noise is removed, the collected audio information may likewise be compared with a sound sample in order to delete the sound. In a specific implementation, after the comparison result is formed, prompt information may also be generated from the comparison result and output, and whether the camera sound or the environmental noise needs to be deleted is determined according to the user's input in response to the prompt information.

In a specific use case, a user takes pictures along a railway track and wants to keep the track sound. Comparison with the noise sample finds track sound in the audio information. If it were treated directly as specified environmental noise and deleted, the sound deleted would be exactly the ambient sound the user wants to keep. Therefore, the user can also be informed by text or audio that track sound has been detected and asked whether to delete it as environmental noise; the user instructs the electronic device by entering a confirmation or a cancellation.
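
The confirmation flow can be sketched as a small decision function. The callback `confirm`, the label strings, and the prompt wording are hypothetical; in a real device the callback would show the text or play the audio prompt and return the user's confirm or cancel input.

```python
def select_sounds_to_delete(detected, confirm):
    """Return the detected noise labels that the user agreed to delete."""
    to_delete = []
    for label in detected:
        prompt = f"Detected '{label}'. Delete it as environmental noise?"
        if confirm(prompt):  # True = confirm, False = cancel
            to_delete.append(label)
    return to_delete
```

For the railway example, a user who cancels the prompt for the track sound keeps it in the audio information, while a confirmed camera sound is still removed.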

In summary, this embodiment provides a method for making an audio picture that carries the picture information in the tag of the audio file, which gives it strong compatibility with the prior art and good versatility; an electronic device can output the separately collected picture information and audio information at the same time without installing a dedicated application.

Embodiment 2:

As shown in FIG. 4, the embodiment provides a device for making an audio picture.

The device includes:

The image collection unit 110 is configured to collect picture information.

The audio collection unit 120 is configured to collect audio information of the collection environment where the picture information is located;

The audio forming unit 130 is configured to add the picture information, picture association information used for parsing the picture information, and the audio information to an audio file.

The device may be an electronic device having camera and recording functions, such as a mobile phone or a tablet computer.

The image acquisition unit 110 may specifically include a camera component or the like that can perform image acquisition. The audio collection unit 120 may include a recorder or the like.

The audio forming unit 130 may include a processor and a storage medium. The processor may be an electronic component such as a central processing unit (CPU), a microprocessor (MCU), a digital signal processor (DSP), or a programmable processor (PLC). The storage medium is used to store the formed audio file. In a specific implementation, the storage medium may also be used to store the picture information, the audio information, and the like.

Based on the above scheme, the audio forming unit 130 is configured to write an APIC tag and a MIME type of the picture information in a tag of the audio file (such as an ID3 tag), and to add the picture information to the tag. This specifies how the audio forming unit 130 writes the picture information and the picture association information into the audio file. Writing them in the tag matches the information format of existing audio files, so no specific algorithm or application is needed to parse the audio file, which gives the device good versatility.

In addition, the audio forming unit 130 is further configured to determine the information length of the picture information and the picture association information according to the picture information, the APIC tag, and the MIME type, and to add the information length to the tag.

For the specific content of the MIME type, the APIC tag, and the tag, refer to Embodiment 1; it is not repeated here. The tag is preferably an ID3 tag.

Based on the above solution, the audio forming unit 130 is further configured to update the tag length of the tag according to the information length of the tag before adding the audio information to the audio file.

In this embodiment, the audio forming unit 130 writes the information length so that the picture information can be distinguished from other information in the tag, and writes the tag length in the tag so that the electronic device can distinguish the tag from the audio information when outputting the audio file, which facilitates the subsequent output of the audio information.

Based on the above solution, the audio information includes noise information;

As shown in FIG. 5, the device further includes:

The noise processing unit 140 is configured to delete the noise information in the audio information according to a predetermined policy before adding the audio information to the audio file.

The specific structure of the noise processing unit 140 may include various types of processors, and the specific structure may also be a structure such as a noise processor in the prior art.

In the embodiment, the noise processing unit 140 is added, and the noise in the audio file can be filtered, and only the environmental sound desired by the user can be retained, thereby improving the intelligence of the electronic device and the satisfaction of the user.

The device in this embodiment provides implementation hardware for the method in Embodiment 1. It is used to form an audio file that includes picture information by recording sound while taking a picture, instead of the image file including audio information in the prior art, which requires a dedicated application to decode and output. This improves versatility and compatibility.

A specific application example is provided in conjunction with the embodiment of the present invention: as shown in FIG. 6, the method includes:

Step 1: Use the camera function of a smartphone to generate a jpeg picture, and record an MP3 file at the same time. In a specific implementation, prompt information may also be output in connection with forming the jpeg picture (that is, the picture information above) and the MP3 file (that is, the audio information above), prompting the user about collecting picture information and audio information at the same time, so that the electronic device performs the subsequent steps after receiving the operation performed by the user based on the prompt.

Step 2: Build the ID3V2.3 information of the new MP3 file and locate the end of the ID3 tag.

Step 3: Add information such as APIC tag, text encoding identifier, MIME type, image type, and remarks to the ID3 tag.

Step 4: Open the jpeg picture, write its picture data to the data area of the ID3 tag, and update the tag length of ID3;

Step 5: Write the audio data in the MP3 file in step 1 to the MP3 file in which the picture data is written.
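
Steps 2 to 5 can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the definitive implementation: it assumes the ID3v2.3 layout (10-byte tag header with a 7-bit-per-byte length, APIC frame with a 4-byte big-endian length), takes the jpeg bytes and the recorded MP3 audio data as inputs from Step 1, and uses hypothetical function names. The length fields are written as placeholders and updated once the picture data is in place, as Step 4 describes.

```python
import struct


def synchsafe(n: int) -> bytes:
    """Encode a tag length as four bytes with 7 significant bits each."""
    return bytes([(n >> 21) & 0x7F, (n >> 14) & 0x7F, (n >> 7) & 0x7F, n & 0x7F])


def make_audio_picture(jpeg: bytes, mp3_audio: bytes, remark: str = "") -> bytes:
    """Return the bytes of an MP3 file whose ID3v2.3 tag embeds `jpeg`."""
    # Step 2: start a new ID3v2.3 tag; the 4 length bytes are a placeholder.
    tag = bytearray(b"ID3\x03\x00\x00" + b"\x00" * 4)
    # Step 3: APIC identifier, reserved length, flags, then association info.
    frame_start = len(tag)
    tag += b"APIC" + b"\x00" * 4 + b"\x00\x00"
    tag += b"\x00"                             # text encoding identifier
    tag += b"image/jpeg\x00"                   # MIME type
    tag += b"\x00"                             # picture type: 0 = other
    tag += remark.encode("latin-1") + b"\x00"  # remark (description)
    # Step 4: write the picture data, then update both length fields.
    tag += jpeg
    struct.pack_into(">I", tag, frame_start + 4, len(tag) - frame_start - 10)
    tag[6:10] = synchsafe(len(tag) - 10)
    # Step 5: append the audio data after the finished tag.
    return bytes(tag) + mp3_audio
```

In use, the two inputs would be read from the files produced in Step 1 and the returned bytes written to the new MP3 file; if the recorded MP3 already begins with its own ID3 tag, that tag would need to be stripped first, which this sketch does not do.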

An embodiment of the present invention further provides a computer storage medium, where the computer storage medium stores computer-executable instructions, and the computer-executable instructions are used to execute at least one of the methods of the embodiments of the present invention, such as at least one of the methods shown in FIG. 1, FIG. 2, and the other method flowcharts described above.

The computer storage medium may be any medium that can store program code, such as a removable storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk. The computer storage medium may optionally be a non-transitory storage medium.

In the several embodiments provided by the present application, it should be understood that the disclosed device and method may be implemented in other manners. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function; in actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of other forms.

The units described above as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may all be integrated into one processing module, or each unit may serve separately as one unit, or two or more units may be integrated into one unit; the integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional units.

A person skilled in the art can understand that all or part of the steps of the above method embodiments may be completed by hardware related to program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a removable storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

The above is only the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and modifications made in accordance with the principles of the present invention should be understood as falling within the scope of the present invention.

Claims

A method of making an audio picture, the method comprising:

Collecting picture information;

Collecting audio information of the collection environment where the picture information is located;

Adding the picture information, picture association information used for parsing the picture information, and the audio information to an audio file.
The method of claim 1 wherein

The picture association information includes at least an APIC tag and a MIME type;

Adding the picture information and the picture association information to the audio file, including:

Adding an APIC tag and a MIME type of the picture information to a tag of the audio file;

Adding the picture information to a tag of the audio file.
The method of claim 2, wherein

Adding the picture information and the picture association information to the audio file further includes:

Determining, according to the picture information, the APIC tag, and the MIME type, the information length of the picture information and the picture association information;

The length of the information is added to the tag.
The method of claim 3, wherein

The method further includes:

Before adding the audio information to the audio file, updating the tag length of the tag according to the information length of the tag.
The method according to any one of claims 1 to 4, wherein

The audio information includes noise information;

The method further includes:

Before the audio information is added to the audio file, the noise information in the audio information is deleted according to a predetermined policy.
The method of claim 5, wherein

The noise information includes: a camera sound formed by the electronic device collecting the picture information and an environmental noise specified by the user.
A device for making an audio picture, the device comprising:

An image acquisition unit configured to collect picture information;

An audio collection unit configured to collect audio information of the collection environment where the picture information is located;

An audio forming unit configured to add the picture information, picture association information used for parsing the picture information, and the audio information to an audio file.
The apparatus according to claim 7, wherein

The picture association information includes at least an APIC tag and a MIME type;

The audio forming unit is configured to add an APIC tag and a MIME type of the picture information in a tag of the audio file; and add the picture information to the tag.
The device according to claim 8, wherein

The audio forming unit is configured to determine an information length of the picture information and the picture association information according to the picture information, the APIC tag, and a MIME type; and add the information length to the tag.
The device according to claim 8, wherein

The audio forming unit is further configured to update the tag length of the tag according to the information length of the tag before adding the audio information to the audio file.
A device according to any one of claims 7 to 10, wherein

The audio information includes noise information;

The device also includes:

A noise processing unit configured to delete the noise information in the audio information according to a predetermined policy before the audio information is added to the audio file.
A computer storage medium having stored therein computer executable instructions for performing at least one of the methods of claims 1 to 6.