CN112422808A - Method and device for acquiring photos and processing media objects - Google Patents


Info

Publication number
CN112422808A
Authority
CN
China
Prior art keywords
data; photo; media data; audio; media
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910785236.3A
Other languages
Chinese (zh)
Other versions
CN112422808B (en)
Inventor
郑凯方 (Zheng Kaifang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201910785236.3A priority Critical patent/CN112422808B/en
Publication of CN112422808A publication Critical patent/CN112422808A/en
Application granted granted Critical
Publication of CN112422808B publication Critical patent/CN112422808B/en
Current legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/698 Control of cameras or camera modules for achieving an enlarged field of view, e.g. panoramic image capture
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00 Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/10 Digital recording or reproducing
    • G11B20/10527 Audio or video recording; Data buffering arrangements
    • G11B2020/10537 Audio or video recording
    • G11B2020/10546 Audio or video recording specifically adapted for audio data

Abstract

The application discloses a method and a device for acquiring a photo and processing a media object. The method for acquiring the photo comprises the following steps: collecting audio data in the process of collecting first media data, the first media data comprising photo data; generating second media data according to the collected audio data; establishing an association relation between the first media data and the second media data; generating a target object according to the first media data, the second media data and the association relation; and, when the target object is operated, outputting the first media data and the second media data according to the association relation. By this method, the photo content can be enriched, and the interest and richness of photo applications can be increased.

Description

Method and device for acquiring photos and processing media objects
Technical Field
The present invention relates to the field of media applications, and in particular, to a method and an apparatus for acquiring a photo and processing a media object.
Background
With the popularization of portable terminal devices such as smart phones, the functional applications that the software and hardware of intelligent terminal devices bring to users have become increasingly rich. Taking photos is an important, frequently used application: through the shooting function of a commonly held portable intelligent terminal device, a user can photograph targets of interest, such as people, objects and scenery, at any time and in any place. Like many other device functions, the photographing function of intelligent terminal devices has developed from simple to complex and diversified; through refinement in multiple directions, it can better satisfy the shooting requirements of different application scenarios, and the usage requirements of the shooting function have become increasingly refined.
In terms of hardware, the specification of the camera carried by terminal equipment keeps rising: components and capabilities such as pixel count, photosensitive elements, anti-shake and zoom control continue to grow stronger, guaranteeing photos of higher imaging quality. Combined with increasingly mature algorithms and post-processing (retouching) software, the quality of the obtained photo image has improved significantly, and applications of the photographing function have been enriched at the same time. In the prior art, the quality of photo images and the richness of photographing applications are improved mainly along these two directions: improving the functional configuration of the hardware and enriching the software processing means. In fact, the purpose of enriching photo applications can also be achieved in other ways.
Disclosure of Invention
The embodiments of the invention provide a method and a device for acquiring a photo and processing a media object, which can enrich the photo content and increase the interest and richness of photo applications.
The invention provides the following scheme:
A method for acquiring a photograph, comprising:
collecting audio data in the process of collecting first media data; the first media data comprises photo data;
generating second media data according to the collected audio data;
establishing an association relation between the first media data and the second media data;
generating a target object according to the first media data, the second media data and the association relation; and, when the target object is operated, outputting the first media data and the second media data according to the association relation.
A processing method for a media object, wherein the media object is generated based on first media data and second media data; the first media data comprises photo data, and the second media data is generated according to audio data collected in the process of collecting the photo data; the method comprises the following steps:
providing a first operation option for operating the media object;
and when an operation request for loading the media object is received through the first operation option, loading the photo data and the second media data, displaying the photo data, and playing the corresponding audio content.
A panoramic photograph, comprising:
first media data; the first media data comprises image data of a panoramic photograph;
second media data; the second media data are generated according to the audio data synchronously acquired in the process of acquiring the first media data; and outputting the first media data and the second media data when the panoramic photo is operated.
An apparatus for obtaining a photograph, comprising:
the audio data acquisition unit is used for acquiring audio data in the process of acquiring the first media data; the first media data comprises photo data;
the second media data generating unit is used for generating second media data according to the collected audio data;
an association relation establishing unit, configured to establish an association relation between the first media data and the second media data;
a target object generating unit, configured to generate a target object according to the first media data, the second media data and the association relation; when the target object is operated, the first media data and the second media data are output according to the association relation.
A processing device of a media object, the media object being generated based on first media data and second media data; the first media data comprise photo data, and the second media data are generated according to audio data collected in the photo data collection process; the device comprises:
an operation option providing unit, configured to provide a first operation option for operating the media object;
and the object loading and displaying unit is used for loading the photo data and the second media data, displaying the photo data and playing corresponding audio data contents when receiving an operation request for loading the media object through the first operation option.
An electronic device, comprising:
one or more processors; and
a memory associated with the one or more processors for storing program instructions that, when read and executed by the one or more processors, perform operations comprising:
collecting audio data in the process of collecting first media data; the first media data comprises photo data;
generating second media data according to the collected audio data;
establishing an association relation between the first media data and the second media data;
generating a target object according to the first media data, the second media data and the association relation; and, when the target object is operated, outputting the first media data and the second media data according to the association relation.
According to the specific embodiments provided herein, the present application discloses the following technical effects:
according to the method, the audio data can be collected in the process of collecting the photo data, the second media data can be generated according to the collected audio data, the second media data can be audio content obtained by the audio data, and after the incidence relation between the audio data and the second media data is established, the target object can be generated according to the first media data, the second media data and the incidence relation; and when the target object is operated, outputting the first media data and the second media data according to the corresponding relation. The obtained photo combines the photo data and the content generated based on the audio data when the photo data is collected, the scene information when the photo is shot is reflected through the second media data, the information related to the scene when the photo is shot is integrated into the target object together with the photo content, and particularly, for the photo types with relatively long shooting time, such as panoramic photos or continuous photos, rich synchronous audio content can be collected. Compared with media contents such as videos and the like, the target object produced by the method has the advantage of being relatively light in weight. When the target object is output based on the obtained target object, the information related to the scene of the photo at the moment of shooting can be obtained, the photo content is enriched, the interestingness of photo application is increased, and the richness of photo application is increased.
Of course, it is not necessary for any product to achieve all of the above-described advantages at the same time for the practice of the present application.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from them without creative effort.
FIG. 1 is a schematic diagram of an image and a second media data storage according to an embodiment of the present application;
FIG. 2 is a flow chart of a first method provided by an embodiment of the present application;
FIG. 3 is a diagram illustrating the correspondence between sub-photos and audio data;
FIG. 4 is a flow chart of a second method provided by embodiments of the present application;
fig. 5 is a schematic diagram of a switching operation option of the second media data content;
FIG. 6 is a schematic diagram of a first apparatus provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of a second apparatus provided by an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present invention.
The photo-taking functions provided by portable terminal devices such as smart phones are being refined in multiple directions, so as to better meet the shooting requirements of different application scenarios and the increasingly detailed usage requirements of the shooting function. One mainstream direction of existing related technology is improved hardware specifications, such as higher camera pixel counts, better photosensitive elements and cooperating multi-camera setups; the other direction concerns the image acquisition and processing links, such as adopting better imaging algorithms and applying feature-rich photo-processing software. The method provided by the embodiments of the application aims to enrich the photo content and the application scenarios of photo shooting in another way: in the process of shooting and displaying a photo, the interest of photo acquisition and display is improved by enriching the media types involved, thereby enriching the application scenarios of photographing applications.
To achieve this purpose, the method provided by the application introduces, on the basis of the photo, second media data obtained from audio data; through the combined application of the photo and the second media data, a "rich media" photo form can be obtained. The second media data may be obtained from the captured audio data. Compared with video, the acquisition and processing of audio data are easier to implement, and operations such as audio recording, encoding and storage are more lightweight. Moreover, second media data obtained from audio can be superimposed on the photo content at output time: while conveying certain shooting-scene information, it keeps the emphasis on the photo itself, so that both acquisition and display are processed with the photo data at the center and the photo remains the main subject of the application. If video content were used as the second media data instead, the nature of the video medium would make switched output more likely, and the photo itself would easily be swamped by the video information. A target object is generated by combining the photo image with the second media data generated from the audio data, and it may be handled differently according to actual application requirements. For example, in terms of storage, the photo image and the corresponding second media data may be stored separately, with the association relation between them embodied in some manner: in fig. 1, 11 is the image file corresponding to the photo data and 12 is the file corresponding to the second media data in audio form, the two being associated through similar file names.
A new file format may also be developed to accommodate the photo image and the second media data, that is, the photo data and the second media data are stored in the same file, and the contents of each part of the file are parsed and loaded by a processing tool for the file format. Such a storage may be, for example, a file 13 in fig. 1, where the file 13 includes the photo data 14 and the second media 15, and the photo data 14 and the second media 15 may be viewed as different blocks or tracks of the file 13. Of course, in practical application, the method is not limited to the two examples. The method and apparatus for acquiring a photo, the method and apparatus for processing a media object provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
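The single-file storage just described can be sketched in code. This is an illustrative model only: the patent does not specify a container layout, so the `RFPH` magic value, the length-prefixed two-track layout, and the function names are all assumptions.

```python
import struct

def pack_rich_photo(photo_bytes: bytes, audio_bytes: bytes) -> bytes:
    """Pack photo data and second media (audio) data into one container.

    Hypothetical layout: a 4-byte magic value, then for each track a
    4-byte big-endian length prefix followed by the track payload.
    """
    return (b"RFPH"
            + struct.pack(">I", len(photo_bytes)) + photo_bytes
            + struct.pack(">I", len(audio_bytes)) + audio_bytes)

def unpack_rich_photo(blob: bytes) -> tuple:
    """Parse the container back into (photo_bytes, audio_bytes)."""
    assert blob[:4] == b"RFPH", "not a rich-photo container"
    offset, tracks = 4, []
    for _ in range(2):
        (length,) = struct.unpack_from(">I", blob, offset)
        offset += 4
        tracks.append(blob[offset:offset + length])
        offset += length
    return tracks[0], tracks[1]
```

A processing tool for such a format would read the magic value, then load the photo track and the audio track separately, matching the "different blocks or tracks of the file 13" described above.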
Example one
An embodiment of the present application provides a method for obtaining a photo, and referring to fig. 2, as a flowchart of the method, the method for obtaining a photo may include the following steps:
s210: collecting audio data in the process of collecting first media data; the first media data comprises photo data;
firstly, audio data can be collected during the process of collecting first media data; wherein the first media data comprises photo data. When the photo data and the audio data are collected through the portable terminal device, such as a smart phone, the collection of the photo data and the audio data can be realized through the corresponding functional components, for example, the photo data can be collected through shooting of a camera component, and the audio data can be collected through a microphone component of the smart phone. The audio data is usually synchronized audio data during the process of capturing the photo data, so as to represent the scene of the photo at the time of taking the photo through the captured audio data or data obtained from the audio data. In practical application, the acquisition of the "rich media" photo can be realized by providing an application APP for taking a photo and by using a functional module of the photo application or the photo application. After the application or the application function module is started, audio data can be synchronously acquired in the process of taking a picture to acquire the photo data.
The acquisition of the photos is based on the captured photo data as a core and is realized based on the photo data and the second media data, wherein the photo data can be of different photo types, for example, a common photo, a 3D photo, a panoramic photo, or a photo group composed of a plurality of sub-photos, and the like. The panoramic photos, the continuous photos and the like are shot for multiple times in the acquisition process, the acquisition and processing process of the rich media photos based on the photos can have a more flexible processing mode, the characteristics of the method can be reflected, and the subsequent contents are introduced mainly based on the photos of the type. The audio data can directly represent information related to the scene at the time of shooting as a synchronous medium of the scene at the time of shooting, and the audio data or information generated based on the audio data and the like can be stored together with the photo data as the second media data, for example, in the same folder, or the second media data can be embedded into the photo data to form a single file. When the display is carried out, the photo can be displayed, and the scene information when the photo is shot can be obtained by analyzing the second media data.
When the type of the photo is a panoramic photo or a continuously shot (burst) photo, the photo data may be treated as a photo group comprising a plurality of sub-photos. The difference is that a panoramic photo is captured as multiple shots at successive shooting positions that are finally stitched into one photo, so its sub-photos can be regarded as logical divisions of the panoramic photo: during processing, the logical division may be determined according to the sequentially captured photo data, or the sub-photos may be determined directly by dividing the finished panoramic photo. The sub-photos in the photo group then include the sub-photos obtained by dividing the panoramic photo, each corresponding to a part of the generated panoramic photo. A burst photo, by contrast, is itself composed of multiple photos whose sub-photos are usually stored as different files, and can therefore be regarded as a physical partition (as opposed to the logical partition of a panoramic photo). For a panoramic photo or a burst photo, the photo data may thus include a photo group containing at least two sub-photos. The sub-photos may be physically divided and stored as separate sub-photo files, such as the photo group 310 shown in fig. 3, which consists of the independent sub-photo files 311–315; or the photo group may be a logical partition, such as dividing a panoramic photo into multiple logical parts that serve as sub-photos. For example, in fig. 3 the panoramic photo 340 is divided into the parts identified as 341–345, which may respectively serve as sub-photos of the panoramic photo and together form a photo group.
When the photo data of a photo group is collected, multiple shots are required, and the audio data during shooting can be collected in different ways. In the first way, uninterrupted audio data is collected throughout the capture of the whole photo group; the collected audio then corresponds to the whole group, and the sub-photos and the audio data are in a many-to-one relationship. For example, the audio data 320 in fig. 3 is such uninterrupted audio data: the sub-photos 311–315 in the photo group 310 may all correspond to the audio data 320, or the sub-photos 341–345 in the panoramic photo may all correspond to the audio data 320. In the second way, a separate audio segment is collected while each sub-photo is taken, so the collected audio data takes the form of a plurality of audio segments, such as the audio segments 331–335 included in the audio data 330 shown in fig. 3, each corresponding to a sub-photo; for example, the audio segments 331–335 may correspond to the sub-photos 311–315 in the photo group 310, or to the sub-photos 341–345 in the panoramic photo, respectively. In this case, the captured audio data includes a plurality of audio segments, where each audio segment corresponds to one or more sub-photos of the photo group.
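The two capture modes above can be sketched as a mapping function. This is a minimal sketch, assuming identifiers (file names) for sub-photos and audio tracks; the function name and the `None` convention for missing segments are illustrative, not taken from the patent.

```python
def map_sub_photos_to_audio(sub_photos, audio_segments):
    """Associate sub-photos of a photo group with audio tracks.

    Mode 1 (one uninterrupted recording): every sub-photo maps to the
    single track, a many-to-one relationship.
    Mode 2 (per-shot segments): sub-photo i maps to segment i; a sub-photo
    without a segment (e.g. the segment was discarded as invalid) maps
    to None.
    """
    if len(audio_segments) == 1:
        return {photo: audio_segments[0] for photo in sub_photos}
    mapping = {}
    for i, photo in enumerate(sub_photos):
        mapping[photo] = audio_segments[i] if i < len(audio_segments) else None
    return mapping
```

For example, five sub-photos and one track (a whole-group recording) yield five entries pointing at the same track, while five per-shot segments yield one segment per sub-photo.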
S220: generating second media data according to the collected audio data;
after the audio data in the process of obtaining the photo data is collected, second media data can be generated according to the collected audio data. The second media data may be audio content obtained according to the audio data, may directly use the acquired audio data as the second media data, may also be processed according to the acquired audio data to obtain audio or other types of media data as the second media data, or may determine a combination of different types of media data obtained based on the audio data as the second media data. For example, voice recognition may be performed on the collected audio data, text information corresponding to the audio data may be determined, and the determined text information may be determined as the second media data. Or the audio content may be determined according to the acquired audio data, and the acquired audio data is subjected to speech recognition to determine text information corresponding to the audio data, and the second media data is determined based on the audio content and the corresponding text information, that is, the combination of different types of media data is determined as the second media data.
The second media data may also include other types of content, for example, the second media data may include the result of image recognition of the photo data, such as recognizing whether the image contains a specific person or object, such as a cell phone owner, a car or an animal, and adding corresponding information to the photo based on the result of image recognition, so as to increase the interest and applicability of the photo. During specific implementation, image recognition can be performed on the acquired photo data, and an image recognition result of the photo data is determined; the result of the image recognition is added to the second media data. In addition, the geographical position and/or weather state information in the process of acquiring the photo data can be acquired, and the geographical position and/or weather state information when the photo is taken is added into the second media data, so that the photo can be displayed based on the information when the photo is used, for example, the geographical position and weather state information in the process of acquiring the photo data are displayed above the photo content.
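The optional extras just listed (recognition results, location, weather) can be sketched as the assembly of one second-media record. All field names here are illustrative assumptions; the patent does not prescribe a schema.

```python
def build_second_media(audio_path, transcript=None, recognized_objects=None,
                       geo=None, weather=None):
    """Assemble a second-media record combining the optional extras the
    text mentions: speech-recognition text, image-recognition results,
    geographic position and weather state at capture time.

    Only fields that were actually captured are included.
    """
    record = {"audio": audio_path}
    if transcript is not None:
        record["text"] = transcript                      # speech recognition
    if recognized_objects:
        record["image_recognition"] = list(recognized_objects)  # e.g. owner, car
    if geo is not None:
        record["location"] = geo                         # capture position
    if weather is not None:
        record["weather"] = weather                      # weather state
    return record
```

A display application could then overlay `location` and `weather` above the photo content, as the paragraph above describes.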
The second media data may be implemented based on the commentary of the photographer. For example, in the process of taking a panoramic photo by a photographer, a plurality of pictures are taken and spliced to form the panoramic photo, and taking the panoramic photo is a longer process compared with taking a common photo. The shooting commentary audio data of the photographer can be collected in the process of collecting the photo data, and then second media data relevant to the shooting commentary is generated according to the shooting commentary audio data. Therefore, when the picture is viewed, the corresponding explanation audio is played, so that more scene contents during shooting can be known, and the interestingness of the picture is increased.
The second media data may also be obtained or changed after the photo data is determined, for example, an operation entry may be provided in the photographing application or the photo viewing application to obtain or change the second media data corresponding to the photo data. Therefore, the corresponding audio content can be obtained after the photo data are obtained, and even the audio data can be collected for many times, so that a more satisfactory target object can be obtained.
In addition, when generating the second media data based on the collected audio data, the collected audio data may first be preprocessed: effective audio is extracted from the collected audio data, and the second media data is generated based on the extracted effective audio. Blank content may be picked up during collection, so the collected audio data may include silent segments, and such blank segments can be eliminated as invalid data. Alternatively, only the parts containing voice may be treated as effective audio: the voice parts in the collected audio data are extracted, the extraction result is taken as the effective audio, and the second media data is then generated from it.
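One plausible realization of the silent-segment elimination described above is simple amplitude gating. This is a sketch under assumed parameters (an amplitude threshold and a minimum run length), not the patent's specified algorithm; real voice extraction would use a proper voice-activity detector.

```python
def extract_effective_audio(samples, threshold=0.02, min_run=3):
    """Drop blank (near-silent) stretches from a list of audio samples.

    Keeps only runs of samples whose absolute amplitude reaches
    `threshold` for at least `min_run` consecutive samples; shorter
    bursts and silent stretches are discarded as invalid data.
    """
    effective, run = [], []
    for sample in samples:
        if abs(sample) >= threshold:
            run.append(sample)
        else:
            if len(run) >= min_run:   # flush a long-enough active run
                effective.extend(run)
            run = []
    if len(run) >= min_run:           # flush a trailing active run
        effective.extend(run)
    return effective
```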
S230: establishing an incidence relation between the first media data and the second media data;
after the first media data and the second media data are determined, an association relationship between the first media data and the second media data may be established, and in different applications, the association relationship between the first media data and the second media data may have different implementation manners and meanings. Since different media data types are collected, the incidence relation between the first data and the second media data exists objectively in most cases, and different applications of the photo can be realized according to the incidence relation between the first data and the second media data. For example, in the photo presentation stage, to implement presentation of different types of media data, the first data and the corresponding second media data may be read according to the association relationship to load, present, and play media content, for example, play audio content while presenting photo content.
When the photo is a photo group in the form of the aforementioned panoramic photo or burst photo, the association relation may further include an association relation between the sub-photos and the second media data, that is, an association relation between the sub-photos in the photo group and the second media data is established. The second media data may be the collected audio data itself or media content obtained from it; the collected audio data is taken as the example here. As shown in fig. 3, a correspondence may be established between the sub-photos 311–315 and the audio data 320. In a specific implementation, all sub-photos may correspond to the audio data 320, or only some of them; for example, when the sub-photos 312–314 are the main content of the photo group, the correspondence may be established only between the sub-photos 312–314 and the audio data 320. When the audio data is collected in the form of a plurality of audio segments, corresponding second media data is generated from each segment, and a correspondence is then established between each sub-photo and the corresponding second media data; specifically, an association relation may be established between the second media data corresponding to an audio segment and one or more sub-photos. For example, in fig. 3, correspondences can be established between the sub-photos 341–345 of the panoramic photo and the corresponding audio segments in the audio data 330. When effective-audio extraction has been performed on the audio data, some segments may have been judged invalid; in that case some sub-photos may have no corresponding audio segment, or multiple sub-photos may correspond to the same audio segment.
Photo-related applications usually involve a large amount of photo data. Establishing an association relation between the photo data and the second media data not only reflects the objective association between them, but also facilitates subsequent applications such as photo display; for example, each piece of media content can be loaded correctly according to the association relation between the photo data and its corresponding second media data.
The association relationship between the photo data and the second media data may be implemented in different ways under different data organization forms. A simple file-level association may be established through the file names, for example by making the file names, or a part of them, consist of the same characters: the photo data file may be named "A015.jpg" and the corresponding second media data "A015.wav", so that when the files are read to show/play the media content, the corresponding media content is loaded purely according to the file names. The picture and audio formats jpg, wav, etc. are only exemplary; in practical applications, other suitable picture and audio media formats may be adopted.
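The shared-stem naming convention just described can be sketched as follows. This is an illustrative Python snippet under the naming assumption in the text; the function names are hypothetical.

```python
from pathlib import Path

def matching_audio_name(photo_name, audio_ext=".wav"):
    """Derive the expected audio file name under the shared-stem
    convention, e.g. "A015.jpg" -> "A015.wav"."""
    return Path(photo_name).stem + audio_ext

def pair_files(photo_names, audio_names):
    """Pair each photo with the audio file sharing its stem, if any;
    photos without a matching audio file map to None."""
    audio_by_stem = {Path(a).stem: a for a in audio_names}
    return {p: audio_by_stem.get(Path(p).stem) for p in photo_names}
```

Because `Path.stem` strips only the final suffix, the same pairing also works for sub-photo names such as "A015.1.jpg" / "A015.1.wav".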
When the photo data includes a plurality of sub-photos, the association relationship between the sub-photos and the second media data may be recorded in a separate file, for example "A019.inf", or the association information may be embedded in a file tag of the photo or of the second media data. When the files are read to present/play the media content, the file tag of the photo/second media data may be read first, and the corresponding media content loaded according to the record in the tag.
An example of the association relationship of the sub-photo with the second media data is shown in table 1:
TABLE 1
image           audio
A015.1.jpg      A015.1.wav
A015.2.jpg      N/A
A015.3.jpg      A015.3.wav
When the association relationship between the sub-photos obtained by dividing a panoramic photo and the second media data is recorded in a separate file or in file tag information, the following may be recorded:
the number of sub-photos of the panoramic photo; the height/length of the panoramic photo; and the sequence of second media data.
One specific example of the above information is as follows:
{parts=5;pixels=1920;audio=(A015.1.wav;A015.2.wav;null;A015.4.wav;null)}
When the file is read to show/play the media content, this information can first be read and parsed; the panoramic photo is divided according to it to determine the range and content of each sub-photo, the second media content corresponding to each sub-photo is indexed, and the corresponding media content is loaded according to the record in the information.
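A parser for the record format illustrated above can be sketched in a few lines of Python. The grammar is inferred from the single example in the text, so treat this as a hypothetical reading of the format rather than a specification.

```python
import re

def parse_panorama_info(info):
    """Parse an association record of the form shown above, e.g.
    "{parts=5;pixels=1920;audio=(a.wav;null;b.wav)}".
    Returns parts (int), pixels (int) and the audio list, with None
    standing for "null" (a sub-photo with no audio segment)."""
    m = re.match(r"\{parts=(\d+);pixels=(\d+);audio=\((.*)\)\}", info.strip())
    if not m:
        raise ValueError("unrecognised info record: " + info)
    segments = [None if s == "null" else s for s in m.group(3).split(";")]
    return {"parts": int(m.group(1)), "pixels": int(m.group(2)), "audio": segments}
```

A viewer would call this once on load, split the panorama into `parts` sub-photos, and index into `audio` as each part is displayed.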
In an implementation in which the photo data and the second media data are merged into the same file, for example into different tracks of that file, the association information may be embodied as a binding relationship between the photo data and the second media data; or, in an application involving a plurality of sub-photos, the association information may be embedded in the target file, for example as file tag information of the target file.
In addition, the generated target object may also include photo-related dynamic effects, such as birthday fireworks, heart shapes, and the like. The photo-related dynamic effect may be determined by applying a preset effect, by the user selecting from a library of preset effects, and so on. The interactive effect data may be saved together with the target object when it is generated; the interactive effect data and the target object have a corresponding relationship. The interactive effect data may be added to the photo data or stored independently. When the first media data, i.e., the photo data, is displayed, a corresponding interactive effect may be displayed according to the interactive effect data. When the interactive effect data is stored independently, the correspondence between the interactive effect data and the target object, or between the interactive effect data and the photo data, may optionally be stored, for example in the above-mentioned association relationship, so that when the first media data is displayed, the association relationship is read to determine the interactive effect data and to display the corresponding effect.
S240: generating a target object according to the first media data, the second media data and the association relationship; wherein, when the target object is operated on, the first media data and the second media data are output according to the association relationship.
A target object is generated according to the first media data, the second media data and the association relationship; the target object may take different forms in different applications. In one implementation, the target object may be a file set that includes the photo data, the second media data and the association relationship information between them. An example of such a file set is shown in table 2, which contains the following files:
TABLE 2
A015.1.jpg A015.1.wav A015.1.txt
A015.2.jpg A015.2.wav A015.2.txt
A015.3.jpg A015.3.wav A015.3.txt
A015.inf
The set includes the photo data in jpg format, the second media data in wav and txt formats, and an inf file recording the association relationship between the photo data and the second media data.
Each file in the file set, including the photo data files and the second media files, may be stored separately; alternatively, the image files, audio files and association relationship of each sub-photo may be stored in one unified package file. For example, the files listed in table 2 may be packed and saved in one packed file, so that different photo groups/sets can be easily identified, and storage in compressed form saves storage space. Another way to generate the target object is to merge the photo data and the second media data into the same file according to the association relationship, for example finally saving them as file 13 in fig. 1. Of course, when the photo data and the second media data are merged into the same file, the correspondence between multiple sub-photos and multiple audio segments needs to be considered; for example, when the panoramic photo 340 and the audio 330 in fig. 3 are merged into the same file, the audio segments of the sub-photos may be stored in corresponding tracks, the correspondence being embodied by the positions of the sub-photos and the audio. Alternatively, the sub-photos of the panoramic photo and the corresponding second media files may be stored as mutually independent files, with the association relationship between them recorded in a record file or in file tag information. When the sub-photos in the photo group correspond to the respective parts of the generated panoramic photo, the association relationship between the sub-photos of the panoramic photo and the second media data can be stored; when the image file is displayed and the audio file played, the panoramic photo can be shown, and when the part corresponding to a given sub-photo is displayed, the audio file corresponding to that sub-photo is played according to the association relationship.
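The packed-file storage described above can be sketched with Python's standard library. The text does not fix a container format, so the use of ZIP here is an assumption made purely for illustration.

```python
import io
import zipfile

def pack_media_object(files):
    """Pack a photo group's files (mapping name -> bytes) into a single
    compressed archive, mirroring the packed-file storage described
    above. ZIP is an assumption; the patent does not fix the container."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()
```

Packing the files of table 2 this way keeps a photo group together as one object and, through compression, saves storage space, as the text notes.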
In summary, in the process of generating the target object according to the photo data, the second media data and the association relationship, the photo data can be saved to an image file, the audio data to an audio file, and the association relationship saved as well. When the first media data and the second media data are output according to the association relationship, for example when the photo content is viewed through a photo viewing application, the stored association relationship can be read, the image file displayed, and the audio file played.
The method for obtaining a photo provided by the first embodiment of the present application has been described in detail above. The method may collect audio data during the collection of photo data and generate second media data from the collected audio data, where the second media data may be audio content obtained from the audio data. After the association relationship between the first media data and the second media data is established, a target object is generated according to the first media data, the second media data and the association relationship; when the target object is operated on, the first media data and the second media data are output according to that relationship. The obtained photo combines the photo data with content generated from the audio captured at collection time, so that the second media data reflects the scene information at the moment of shooting, and information related to that scene is integrated into the target object together with the photo content. In particular, for photo types with relatively long shooting times, such as panoramic or continuous photos, rich synchronous audio content can be collected. Compared with media content such as video, the target object produced in this way has the advantage of being relatively lightweight. When the target object is output, information related to the scene at the moment of shooting is obtained along with the photo content, which enriches the photo content and increases the interest and richness of photo applications.
Example two
The second embodiment of the present application provides a method for processing a media object, which may be based on the target object generated in the first embodiment; the generated target object is here processed as a media object. This method focuses on the processing of such media objects: for example, a media-object browsing application may be provided through which media objects of this type are processed, and the photo data and the content of the second media data in the media object are read and parsed for viewing by the user. The media object may be generated based on first media data and second media data; the first media data may include photo data, and the second media data may be generated from audio data captured during the capture of the photo data. Capturing audio data during the capture of photo data may be realized by calling a voice assistant in the system or through a recording interface of the system. Fig. 4 is a flow chart of the processing method of a media object; the method may include the following steps:
S410: providing a first operation option for operating the media object;
First, a first operation option may be provided for operating on the media object. In a specific implementation, different forms may be adopted according to the actual application. For example, the processing method may be applied in a photo presentation application, through which photo data is processed and presented and the corresponding second media data is played/presented. The specific form of the first operation option may differ according to the application environment of the software and the data organization form of the media object. For example, when the media object comprises a file set including the photo data, the second media data and their association relationship information, the displayed item of any photo data can serve as the first operation option. As another example, when the media object is a single file (the photo file and the second media file packaged into one packed file or merged into one file as described above), thumbnails of the packed or merged files may be provided in the user interface, a plurality of thumbnails forming a file list, with the thumbnails serving as the first operation option.
A file list may be provided in the user interface, with the items in the list corresponding to media objects; the second media data may be hidden in the user interface while only the visual content related to the photo image is displayed, making the displayed content more intuitive and concise. The file list may include a plurality of items, for example thumbnails corresponding to the picture data in the media files, and the first operation option may be implemented on these items: the thumbnails may be configured as operable objects, so that when a user clicks a thumbnail, the data of the corresponding media object is read and parsed, the corresponding photo is displayed, and the corresponding audio and other second media data are played. To distinguish it from a common photo file, icon information may also be provided on the thumbnail of the media object to identify it as a media object including photo data and second media data.
S420: and when an operation request for loading the media object is received through the first operation option, loading the photo data and the second media data, displaying the photo data and playing corresponding audio data content.
When an operation request for loading the media object is received through the first operation option, the content data of the media object may be read, the photo data and the second media data loaded, the photo content displayed, and the content of the second media data displayed/played, such as displaying text obtained from the audio data or playing audio content. The specific implementation of loading the photo data and the second media data may differ according to the data organization form of the media object. For example, when the media object comprises a file set including the photo data, the second media data and their association relationship information, that information may be read, and when the photo data is loaded, the corresponding second media data is loaded according to it; for instance, when the photo data includes a plurality of sub-photos each corresponding to a different audio segment, the audio segment corresponding to the current sub-photo may be loaded and played according to the association information when that sub-photo is loaded.
When the media object further includes the association relationship information between the photo data and the second media data, that information can be read when the photo data and the second media data are loaded, and the second media data corresponding to the photo data determined from it, so that the corresponding audio content is played while the photo data is displayed. The photo data may include a photo group containing at least two sub-photos, and the audio data may include a plurality of audio segments, each corresponding to one or more sub-photos of the group. In this implementation, when the corresponding audio content is played while the photo data is displayed, the association between each sub-photo and its audio segment may be read, and the audio segment for each sub-photo determined from it, so that the corresponding segment is played while the sub-photo is displayed.
The sub-photos may include sub-photos obtained by dividing the panoramic photo, each corresponding to a part of the generated panoramic photo; correspondingly, the association relationship may include the association between each sub-photo of the panoramic photo and an audio segment. In this implementation, when the association relationship information is read and the second media data corresponding to the photo data determined from it, the association between each sub-photo of the panoramic photo and its audio segment may be read, and the audio segment corresponding to the currently displayed sub-photo determined, so that the audio segment content corresponding to the currently displayed part is played while the panoramic photo is shown.
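One way to realize the lookup just described, determining which audio segment belongs to the currently visible part of the panorama, is sketched below. The even-width split and all names are assumptions; the text only states that each sub-photo corresponds to a part of the panoramic photo.

```python
def segment_for_offset(audio_segments, panorama_width, view_offset):
    """Hypothetical lookup: map the horizontal offset of the current
    view onto a sub-photo index (assuming the panorama is split into
    equal-width parts) and return that sub-photo's audio segment,
    or None when it has no associated audio."""
    parts = len(audio_segments)
    part_width = panorama_width / parts
    index = min(int(view_offset // part_width), parts - 1)
    return index, audio_segments[index]
```

As the user scrolls the panorama, the viewer can call this on every offset change and start the returned segment only when the index actually changes.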
When the media object is a single file with the photo file and the second media file packaged into one packed file as described above, an unpacking process may be performed first, and the photo data and the corresponding second media data loaded from it. When the media object is a single file in which the photo data and the second media file are merged and held in different tracks, the tracks of the merged file can be parsed, the photo data and the corresponding second media data determined, and then displayed or played.
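The unpacking step for the packed-file case can be sketched as the inverse of packing. As before, ZIP is only an assumed container, and the sorting of entries by extension mirrors the jpg/wav/inf file set from table 2.

```python
import io
import zipfile

def unpack_media_object(blob):
    """Unpack an archive in the packed-file storage described above
    (ZIP assumed for illustration), sorting its entries into photo
    files, audio files and association/metadata records."""
    out = {"photos": {}, "audio": {}, "meta": {}}
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        for name in zf.namelist():
            data = zf.read(name)
            if name.endswith(".jpg"):
                out["photos"][name] = data
            elif name.endswith(".wav"):
                out["audio"][name] = data
            else:
                out["meta"][name] = data
    return out
```

A browsing application would then parse the metadata entries to recover the association relationship and pair each photo with its audio before display.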
In addition to the first operation option for requesting to load the media object, a second operation option for operating on the displayed content may be provided. For example, while the photo content is displayed, an operation option for a sliding operation may be provided through the screen of the terminal device; when a sliding operation is received through the second operation option, the displayed portion of the panoramic photo is switched according to the sliding direction. For example, sliding leftward switches to the next sub-photo portion of the panoramic photo, and the audio segment corresponding to the newly displayed portion is determined and played.
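The swipe interaction above can be captured in a few lines of state. The class and method names below are hypothetical; the text prescribes the behavior (leftward swipe advances to the next sub-photo and its audio segment) but no API.

```python
class PanoramaViewer:
    """Minimal sketch of the swipe interaction described above: a
    leftward swipe advances to the next sub-photo and returns its
    audio segment (None when that sub-photo has no audio)."""

    def __init__(self, audio_segments):
        self.audio_segments = audio_segments
        self.index = 0  # currently displayed sub-photo

    def swipe(self, direction):
        # Clamp at both ends so swiping past the last/first part is a no-op.
        if direction == "left" and self.index < len(self.audio_segments) - 1:
            self.index += 1
        elif direction == "right" and self.index > 0:
            self.index -= 1
        return self.audio_segments[self.index]
```

Returning the segment (or `None`) from the handler lets the caller decide whether to start playback or stay silent after each swipe.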
The second media data may include a combination of multiple kinds of media information. For example, voice recognition may be performed on the collected audio data to determine the corresponding text information; both the audio and the text may serve as second media data, and an operation option for switching between them may be provided at display time. In a specific implementation, a third operation option may be provided for switching the presentation form of the second media data; when a switching request is received through it, the presentation switches between playing the audio and displaying the text. For example, as shown in fig. 5, in fig. 5(a) the currently displayed photo data and its corresponding audio content are played, and a switching operation option 510 is provided; after a switching request is received through the option 510, the playing of the audio content is switched to the text presentation of the second media data, as in the state of fig. 5(b), where the photo data and the corresponding text content are displayed and an operation option 520 is provided; when a switching request is received through the option 520, the presentation can be switched back to the audio-playing form shown in fig. 5(a).
When the second media data is presented in the form of audio content, operation options for controlling the playing of that audio may also be provided. For example, control buttons for the corresponding audio content may be provided alongside the photo data displayed in the user interface. In one implementation, the audio content may default to a stopped state, with a play operation option provided; when the user operates the play option, the corresponding audio content is played. In a specific implementation, a fourth operation option may be provided for controlling the audio content; when a control request is received through it, the playing is started, paused or resumed accordingly.
The display of the photo data and the playing of the corresponding audio content in the user interface may be implemented in different ways. For example, a target page may be provided, with the photo data loaded as the background image of the page and the audio content loaded as its background sound. Alternatively, in applications such as photo display, a target window may be provided in which the photo data is shown while an audio playing interface is called to play the corresponding audio content in the background.
The second embodiment of the present application provides a method for processing a media object, where the media object may be generated based on first media data and second media data; the first media data include photo data, and the second media data are generated from audio data collected during the collection of the photo data. A first operation option may be provided for operating on the media object; when an operation request for loading the media object is received through it, the photo data and the second media data are loaded, the photo data displayed and the corresponding audio content played. In this way, based on content that combines the photo data with audio captured at collection time, the second media data reflects the scene information at the moment of shooting, and information related to that scene is integrated into the media object together with the photo content; in particular, for photo types with relatively long shooting times, such as panoramic or continuous photos, rich synchronous audio content can be collected. When the media object is output, the photo content can be displayed while the information related to the shooting scene is obtained, enriching the photo content and increasing the interest and richness of photo applications.
Example three
The third embodiment of the present application provides a panoramic photo. The panoramic photo may include first media data, which may be the image data of the panoramic photo, and second media data, which may be generated from audio data collected synchronously during the collection of the first media data. Thus, when the panoramic photo is operated on, the first media data and the second media data can be output, achieving the effect that the synchronous audio content of the corresponding scene is played while the panoramic photo is displayed, enriching photo applications.
The panoramic photo provides a "rich media" form of photo information organization, and can also be regarded as a new data form. On the basis of the photo, second media data obtained from audio data is introduced, and through the combined application of the two, a "rich media" photo form can be obtained. The second media data may be obtained from the captured audio data; the collection and processing of audio data is relatively easy to implement, and operations such as recording, encoding and storing audio are comparatively lightweight. Moreover, the second media data obtained from the audio can be processed and output as an overlay on the photo content: while expressing certain information about the shooting scene, the presentation still emphasizes the photo itself, the acquisition and display processes are all centered on the photo data, and the photo remains the subject of the application. The target object generated by combining the photo image with the second media data may be handled differently according to actual application requirements. For example, with regard to storage, the photo image and the corresponding second media data may be stored separately, with their association relationship embodied in some manner, as described above for 11 and 12 in fig. 1, where 11 is the image file corresponding to the photo data and 12 is the file corresponding to the second media data in audio form, the two being identified by similar file numbers.
A new file format may also be devised to accommodate the photo image and the second media data, that is, the photo data and the second media data are stored in the same file, and the contents of each part are parsed and loaded by a processing tool for that file format. Such storage may be, for example, file 13 in fig. 1, where the file 13 includes the photo data 14 and the second media 15, which may be viewed as different blocks or tracks of the file 13. Of course, practical applications are not limited to these two examples. The panoramic photo may include a plurality of logically divided sub-photos, and different sub-photos may correspond to different audio segments, as in the aforementioned organization of sub-photos and audio segments in fig. 3; of course, some of the sub-photos may not correspond to any audio segment, similar to the correspondence shown in table 1 above.
The third embodiment of the present application provides a panoramic photo, which may include the image data of the panoramic photo and second media data generated from audio data collected synchronously during the collection of the first media data. This provides a "rich media" photo form in which the audio data, as a medium synchronous with the scene at the time of shooting, directly embodies information related to that scene; the audio data, or information generated from it, can be stored together with the photo data as second media data, for example in the same folder, or the second media data can be embedded in the photo data to form a single file. At display time, the photo can be shown, and the audio scene information from the time of shooting obtained by parsing the second media data. This enriches the photo information and increases the interest and richness of the photo content.
Corresponding to the first embodiment of the present application, an apparatus for obtaining a photo is also provided. Fig. 6 is a schematic diagram of the apparatus for obtaining a photo, which may include:
the audio data acquisition unit 610 is used for acquiring audio data in the process of acquiring the first media data; wherein the first media data may include photo data;
a second media data generating unit 620, configured to generate second media data according to the acquired audio data;
an association relationship establishing unit 630, configured to establish an association relationship between the first media data and the second media data; and
a target object generating unit 640, configured to generate a target object according to the first media data, the second media data and the association relationship; wherein, when the target object is operated on, the first media data and the second media data are output according to the association relationship.
Corresponding to the second embodiment of the present application, an apparatus for processing a media object is also provided; fig. 7 is a schematic diagram of the apparatus for processing a media object. The media object may be generated based on first media data and second media data; the first media data include photo data, and the second media data are generated from audio data collected during the collection of the photo data. The apparatus may include:
an operation option providing unit 710, configured to provide a first operation option for operating the media object;
an object loading and displaying unit 720, configured to, when an operation request for loading the media object is received through the first operation option, load the photo data and the second media data, display the photo data, and play corresponding audio data content.
In addition, an embodiment of the present application further provides an electronic device, which may include:
one or more processors; and
a memory associated with the one or more processors for storing program instructions that, when read and executed by the one or more processors, perform operations comprising:
collecting audio data in the process of collecting first media data; the first media data comprises photo data;
generating second media data according to the collected audio data;
establishing an association relationship between the first media data and the second media data;
generating a target object according to the first media data, the second media data and the association relationship; wherein, when the target object is operated on, the first media data and the second media data are output according to the association relationship.
Fig. 8 illustrates an architecture of an electronic device. For example, the device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, an aircraft, and the like.
Referring to fig. 8, device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls the overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions so as to complete all or part of the steps of the methods provided in the technical solutions of the present disclosure. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components; for example, it can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 806 provides power to the various components of the device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front-facing camera and/or a rear-facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each of the front-facing camera and the rear-facing camera may be a fixed optical lens system or have focusing and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of the device 800. For example, the sensor assembly 814 may detect the open/closed state of the device 800 and the relative positioning of components, such as the display and keypad of the device 800. The sensor assembly 814 may also detect a change in the position of the device 800 or of a component of the device 800, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in the temperature of the device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the device 800 and other devices. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra-Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions, such as the memory 804 including instructions, which are executable by the processor 820 of the device 800 to complete all or part of the steps of the methods provided in the technical solutions of the present disclosure. For example, the non-transitory computer-readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus a necessary general-purpose hardware platform. Based on such an understanding, the technical solutions of the present application may, in essence or in the parts contributing to the prior art, be embodied in the form of a software product. The software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in certain parts of the embodiments, of the present application.
The embodiments in this specification are described in a progressive manner: for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiments are substantially similar to the method embodiments and are therefore described relatively simply; for related points, reference may be made to the corresponding descriptions of the method embodiments. The system and system embodiments described above are only illustrative: the units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. Those of ordinary skill in the art can understand and implement the embodiments without inventive effort.
In the solutions described herein, user-specific personal data may be used within the scope permitted by applicable laws and regulations, and subject to the conditions required in the relevant country (e.g., the user's explicit consent, the user having been informed, etc.).
The method and device for acquiring photos and processing media objects provided by the present application have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method and core ideas of the present application. Meanwhile, those skilled in the art may, according to the ideas of the present application, make changes to the specific implementations and the application scope. In view of the above, the contents of this specification should not be construed as limiting the present application.
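For illustration only, the acquisition flow described in this application (collect photo data, collect audio during capture, associate the two, and package them into a target object) can be sketched roughly as follows. This is a minimal hypothetical sketch in Python; the class and file names (PhotoWithAudio, manifest, the .jpg/.aac paths) are illustrative assumptions, not taken from the patent.

```python
# Hypothetical sketch of the capture-and-associate flow: first media data
# (photos), second media data (audio segments), and the association between
# them, bundled into one "target object". All names are illustrative.
import json
from dataclasses import dataclass, field

@dataclass
class PhotoWithAudio:
    """Target object: photo data, audio-derived media data, and their association."""
    photo_files: list                 # first media data: one or more photo files
    audio_files: list                 # second media data: audio segments
    associations: dict = field(default_factory=dict)  # photo index -> audio index

    def associate(self, photo_idx: int, audio_idx: int) -> None:
        # Establish the association relationship between a sub-photo
        # and an audio segment.
        self.associations[photo_idx] = audio_idx

    def manifest(self) -> str:
        # The association can be serialized and stored alongside the files,
        # e.g. inside a unified package file.
        return json.dumps({
            "photos": self.photo_files,
            "audio": self.audio_files,
            "associations": self.associations,
        })

obj = PhotoWithAudio(["pano_part0.jpg", "pano_part1.jpg"],
                     ["segment0.aac", "segment1.aac"])
obj.associate(0, 0)
obj.associate(1, 1)
print(obj.manifest())
```

In a fuller implementation, the photos, the audio segments, and this manifest would be written into a single unified package file, along the lines of claim 11 below.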

Claims (35)

1. A method for obtaining a photograph, comprising:
collecting audio data in the process of collecting first media data; the first media data comprises photo data;
generating second media data according to the collected audio data;
establishing an association relationship between the first media data and the second media data;
generating a target object according to the first media data, the second media data, and the association relationship; wherein, when the target object is operated on, the first media data and the second media data are output according to the association relationship.
2. The method of claim 1, wherein the photo data comprises a group of photos, the group of photos comprising at least two sub-photos; the establishing of the association relationship between the first media data and the second media data includes:
and establishing an association relation between the sub-photos in the photo group and the second media data.
3. The method of claim 2, wherein the sub-photos in the photo group comprise sub-photos obtained by dividing a panoramic photo, each sub-photo corresponding to a respective portion of the generated panoramic photo.
4. The method of claim 2, wherein the sub-photos in the group of photos include sub-photos obtained by continuous shooting, each sub-photo corresponding to a separate photo file.
5. The method of claim 2, wherein the captured audio data comprises a plurality of audio segments, wherein each audio segment corresponds to one or more sub-photos of the group of photos.
6. The method of claim 5, wherein generating second media data from the captured audio data comprises:
generating corresponding second media data according to each audio segment;
the establishing of the association relationship between the first media data and the second media data includes:
and establishing an association relation between second media data corresponding to the audio segments and one or more sub-photos.
7. The method of claim 1, wherein generating second media data from the captured audio data comprises:
and performing voice recognition on the collected audio data, determining text information corresponding to the audio data, and determining the text information as the second media data.
8. The method of claim 1, wherein generating second media data from the captured audio data comprises:
determining audio content according to the collected audio data, performing voice recognition on the collected audio data, and determining text information corresponding to the audio data;
and determining the second media data based on the audio content and the text information.
9. The method of claim 1, wherein generating second media data from the captured audio data comprises:
extracting effective audio from the collected audio data, and generating the second media data based on the extracted effective audio.
10. The method according to any one of claims 1-9, wherein generating the target object according to the first media data, the second media data, and the association relationship comprises:
storing the photo data into an image file, storing the audio data into an audio file, and storing the association relation;
the outputting the first media data and the second media data according to the association relationship includes:
and reading the stored association relation, displaying the image file, and playing the audio file.
11. The method of claim 10, further comprising:
and storing the image file, the audio file and the association relation of each sub-photo into a unified packaging file.
12. The method of claim 10, wherein the photo data comprises a group of photos, the group of photos comprising at least two sub-photos; the sub-photos in the photo group comprise sub-photos obtained by dividing the panoramic photo, and each sub-photo corresponds to a corresponding part of the generated panoramic photo respectively; the storing the association relationship includes:
storing the association relation between each sub-photo of the panoramic photo and the second media data;
the displaying the image file and playing the audio file include:
and displaying the panoramic photo, and, when the portion of the panoramic photo corresponding to a sub-photo is displayed, playing the audio file corresponding to the current sub-photo according to the association relationship.
13. The method according to any one of claims 1-9, further comprising:
carrying out image recognition on the acquired photo data, and determining an image recognition result of the photo data;
adding the result of the image recognition to the second media data.
14. The method according to any one of claims 1-9, further comprising:
acquiring the geographical position and/or weather state information in the process of acquiring the first media data;
adding the geographical location and/or weather status information to the second media data.
15. The method of any of claims 1-9, wherein the capturing of audio data during the capturing of the first media data comprises:
in the process of collecting the first media data, collecting audio data of shooting explanation of a photographer;
the generating of the second media data from the captured audio data comprises:
and generating second media data related to the shooting commentary according to the audio data of the shooting commentary.
16. The method according to any one of claims 1-9, further comprising:
determining interactive effect data; the interactive effect data and the target object have a corresponding relation; and when the first media data is displayed, displaying corresponding interactive effects according to the interactive effect data.
17. A method for processing a media object, wherein the media object is generated based on first media data and second media data; the first media data comprise photo data, and the second media data are generated according to audio data collected in the photo data collection process; the method comprises the following steps:
providing a first operation option for operating the media object;
and when an operation request for loading the media object is received through the first operation option, loading the photo data and the second media data, displaying the photo data and playing corresponding audio data content.
18. The method of claim 17, wherein the media object further comprises association information of the photo data and the second media data; the loading the photo data and the second media data, displaying the photo data and playing corresponding audio data content includes:
and reading the association relation information, and determining second media data corresponding to the photo data according to the association information so as to play corresponding audio data contents when the photo data is displayed.
19. The method of claim 18, wherein the photo data comprises a group of photos, the group of photos comprising at least two sub-photos; the audio data includes a plurality of segments of audio segments, wherein each segment of audio corresponds to one or more sub-photos of the group of photos.
20. The method of claim 19, wherein the association information comprises an association of each sub-photo in the group of photos with the audio segment;
the reading the association relationship information, and determining second media data corresponding to the photo data according to the association information, so as to play corresponding audio data content when the photo data is displayed, including:
and reading the association relation between each sub-photo and the audio segment, and determining the audio segment corresponding to each sub-photo according to the association relation so as to play the corresponding audio segment content when displaying the sub-photos.
21. The method of claim 19, wherein the sub-photos include sub-photos obtained by dividing the panoramic photo, and each sub-photo corresponds to a corresponding portion of the generated panoramic photo; the incidence relation comprises the incidence relation between each sub-photo of the panoramic photo and the audio segment;
the reading the association relationship information, and determining second media data corresponding to the photo data according to the association information, so as to play corresponding audio data content when the photo data is displayed, including:
and reading the association relation between each sub-photo of the panoramic photo and the audio segment, and determining the audio segment corresponding to the currently displayed sub-photo of the panoramic photo according to the association relation so as to play the audio segment content corresponding to the currently displayed sub-photo when the panoramic photo is displayed.
22. The method of claim 21, further comprising:
providing a second operation option for operating the display content;
and when a sliding operation is received through the second operation option, switching and displaying the part of the panoramic photo according to the sliding direction, and determining and playing the audio segment corresponding to the switched part.
23. The method of claim 17, further comprising:
performing voice recognition on the collected audio data, and determining text information corresponding to the audio data;
providing a third operation option for switching the content of the providing mode of the second media data;
and when a switching operation request is received through the third operation option, switching between playing the audio data and displaying the text information.
24. The method of claim 17, wherein collecting the audio data during the collection of the photo data comprises:
and acquiring audio data by calling a voice assistant in the system or a recording interface of the system.
25. The method of claim 17, further comprising:
providing a fourth operation option for controlling the audio data content;
and when a control request is received through the fourth operation option, controlling playing, pausing, resuming after pausing, or stopping of the audio data content.
26. The method of claim 17, wherein displaying the photo data and playing the corresponding audio data content comprises:
and providing a target page, and loading the content of the audio data as the background sound of the page when the photo data is loaded as the background image of the target page.
27. The method of claim 17, wherein displaying the photo data and playing the corresponding audio data content comprises:
and providing a target window, displaying the photo data in the target window, and calling an audio playing interface to play corresponding audio data content in a background.
28. The method of any one of claims 17-27, further comprising:
providing a file list in a user interface, items in the file list corresponding to the media objects, and hiding the second media data in the user interface.
29. The method of claim 28, further comprising:
implementing the first operational option on an item in the file list.
30. The method of claim 28, wherein the items in the file list comprise thumbnails of picture data.
31. The method of claim 29, wherein icon information is provided on the thumbnail of the picture data, the icon information identifying that the corresponding target object is a media object comprising both the picture data and the second media data.
32. A panoramic photograph, comprising:
first media data; the first media data comprises image data of a panoramic photograph;
second media data; the second media data are generated according to the audio data synchronously acquired in the process of acquiring the first media data; and outputting the first media data and the second media data when the panoramic photo is operated.
33. An apparatus for obtaining a photograph, comprising:
the audio data acquisition unit is used for acquiring audio data in the process of acquiring the first media data; the first media data comprises photo data;
the second media data generating unit is used for generating second media data according to the collected audio data;
an association relationship establishing unit, configured to establish an association relationship between the first media data and the second media data;
the target object generating unit is used for generating a target object according to the first media data, the second media data, and the association relationship; wherein, when the target object is operated on, the first media data and the second media data are output according to the association relationship.
34. A processing apparatus of a media object, wherein the media object is generated based on first media data and second media data; the first media data comprise photo data, and the second media data are generated according to audio data collected in the photo data collection process; the device comprises:
an operation option providing unit, configured to provide a first operation option for operating the media object;
and the object loading and displaying unit is used for loading the photo data and the second media data, displaying the photo data and playing corresponding audio data contents when receiving an operation request for loading the media object through the first operation option.
35. An electronic device, comprising:
one or more processors; and
a memory associated with the one or more processors for storing program instructions that, when read and executed by the one or more processors, perform operations comprising:
collecting audio data in the process of collecting first media data; the first media data comprises photo data;
generating second media data according to the collected audio data;
establishing an association relationship between the first media data and the second media data;
generating a target object according to the first media data, the second media data, and the association relationship; wherein, when the target object is operated on, the first media data and the second media data are output according to the association relationship.
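The panorama playback behavior recited in claims 12, 21, and 22 (determine which sub-photo of the panorama is currently displayed and play the audio segment associated with it) can be sketched as follows. This is a hypothetical Python sketch: the equal-width split and every name in it are illustrative assumptions, not the patent's implementation.

```python
# Hypothetical playback-side selection: given the horizontal viewing offset
# into a panoramic photo, pick the sub-photo currently shown and return the
# id of its associated audio segment.

def select_audio_segment(view_offset: float, pano_width: float,
                         associations: dict) -> int:
    """Map a viewing offset (0 <= offset < pano_width) to an audio segment id.

    Assumes the panorama was divided into len(associations) equal-width
    sub-photos, each associated with one audio segment.
    """
    n = len(associations)
    sub_idx = min(int(view_offset / pano_width * n), n - 1)
    return associations[sub_idx]

# Panorama split into 3 sub-photos, each with its own audio segment.
assoc = {0: 100, 1: 101, 2: 102}
assert select_audio_segment(0.0, 3000.0, assoc) == 100     # left third
assert select_audio_segment(1500.0, 3000.0, assoc) == 101  # middle third
assert select_audio_segment(2999.0, 3000.0, assoc) == 102  # right third
```

On a sliding operation (claim 22), the viewer would recompute `view_offset`, call this selection again, and switch playback when the returned segment id changes.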
CN201910785236.3A 2019-08-23 2019-08-23 Photo acquisition method, media object processing device and electronic equipment Active CN112422808B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910785236.3A CN112422808B (en) 2019-08-23 2019-08-23 Photo acquisition method, media object processing device and electronic equipment

Publications (2)

Publication Number Publication Date
CN112422808A true CN112422808A (en) 2021-02-26
CN112422808B CN112422808B (en) 2023-05-19

Family

ID=74780212

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910785236.3A Active CN112422808B (en) 2019-08-23 2019-08-23 Photo acquisition method, media object processing device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112422808B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104601880A (en) * 2014-12-11 2015-05-06 广东欧珀移动通信有限公司 Method for generating panoramic photo and mobile terminal
US20160100149A1 (en) * 2014-07-18 2016-04-07 Pankaj Sharma System and methods for simultaneously capturing audio and image data for digital playback
CN105516596A (en) * 2015-12-30 2016-04-20 完美幻境(北京)科技有限公司 Method, device, and system for processing panoramic photography
CN105794197A (en) * 2014-07-28 2016-07-20 联发科技股份有限公司 Portable device capable of generating panoramic file
CN105959773A (en) * 2016-04-29 2016-09-21 魔方天空科技(北京)有限公司 Multimedia file processing method and device
CN106610982A (en) * 2015-10-22 2017-05-03 中兴通讯股份有限公司 Media file generation method and apparatus
CN106776836A (en) * 2016-11-25 2017-05-31 努比亚技术有限公司 Apparatus for processing multimedia data and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Luo Ying et al.: "A Survey of System Architecture Research on Panoramic Media", Telecommunications Science *

Similar Documents

Publication Publication Date Title
CN105845124B (en) Audio processing method and device
CN106488251B (en) Realize the method and device, main broadcaster's client and user client for connecting wheat in live streaming
CN106165430A (en) Net cast method and device
KR101985955B1 (en) Face photo album based music playing method, apparatus and terminal device and storage medium
CN111246283B (en) Video playing method and device, electronic equipment and storage medium
EP3796317A1 (en) Video processing method, video playing method, devices and storage medium
CN113065008A (en) Information recommendation method and device, electronic equipment and storage medium
CN110719530A (en) Video playing method and device, electronic equipment and storage medium
CN112532931A (en) Video processing method and device and electronic equipment
CN107872620B (en) Video recording method and device and computer readable storage medium
CN107105311B (en) Live broadcasting method and device
CN112087653A (en) Data processing method and device and electronic equipment
CN112764636A (en) Video processing method, video processing device, electronic equipment and computer-readable storage medium
CN113032627A (en) Video classification method and device, storage medium and terminal equipment
CN111832455A (en) Method, device, storage medium and electronic equipment for acquiring content image
CN112396675A (en) Image processing method, device and storage medium
CN112422808B (en) Photo acquisition method, media object processing device and electronic equipment
CN110809184A (en) Video processing method, device and storage medium
US20210377454A1 (en) Capturing method and device
US20230412535A1 (en) Message display method and electronic device
CN114399306A (en) Virtual resource distribution method and device, electronic equipment and storage medium
CN109951733B (en) Video playing method, device, equipment and readable storage medium
CN115529487A (en) Video sharing method, electronic device and storage medium
KR20130127754A (en) Apparatus and method for recording highlight video
CN113286073A (en) Imaging method, imaging device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant