WO2019039194A1

WO2019039194A1 - Voice image reproduction device, voice image reproduction method, and data structure of image data

Info

Publication number: WO2019039194A1
Application number: PCT/JP2018/028373
Authority: WO
Inventors: 裕生渡邉
Original assignee: 株式会社Ｊｖｃケンウッド
Priority date: 2017-08-23
Filing date: 2018-07-30
Publication date: 2019-02-28
Also published as: JP6874593B2; JP2019041191A

Abstract

Provided are a voice image reproduction device, a voice image reproduction method, and the data structure of image data, by which image display corresponding to a reproduction elapsed time of voice data can be easily performed. The present invention is provided with a voice image reproduction unit that reproduces voice and an image from voice data in which image data including data obtained by encoding the image and including meta data related to the data, is embedded. The meta data at least includes telop information in which text data and time information are paired with each other. The voice image reproduction unit reproduces the voice on the basis of the voice data, and displays, in accordance with the telop information of the image data, a telop image that is based on the text data corresponding to an elapsed time from start of the voice reproduction.

Description

Audio image reproduction apparatus, audio image reproduction method, and data structure of image data

The present invention relates to an audio and video reproduction apparatus, an audio and video reproduction method, and a data structure of image data.

In recent years, with the spread of various information devices, it has become possible to easily reproduce images. In addition, it has become possible to display related information superimposed on an image during reproduction, and various techniques relating to such reproduction have been proposed (for example, Patent Document 1, Patent Document 2, and Patent Document 3).

The content display device disclosed in Patent Document 1 can display moving image content and stream AR information in a superimposed manner (Paragraphs 0062 to 0065 of Patent Document 1).

The timing correction system disclosed in Patent Document 2 can correct the timing of superimposing and displaying a comment on a video in accordance with the video (Paragraph 0033 of Patent Document 2).

In the subtitle display device disclosed in Patent Document 3, the timing of superimposing the moving image and the text can be matched (Paragraph 0033 of Patent Document 3).

Patent Document 1: Japanese Patent No. 6130841
Patent Document 2: JP 2016-200711 A
Patent Document 3: Japanese Patent No. 4792458

In order to display information superimposed on an image, it is necessary to manage data relating to superimposed display, such as superimposed timing and superimposed text, separately from image data, which causes a problem of complexity.

In addition, an apparatus and software capable of reading out data relating to superimposed display are required, which causes a problem of cost.

The present invention solves at least one of the above-mentioned problems, and is intended to provide an audio and video reproduction apparatus, an audio and video reproduction method, and a data structure of image data capable of easily displaying an image according to the reproduction elapsed time of audio data.

In order to solve the above problems, an audio and video reproduction apparatus according to the present invention includes an audio image reproducing unit that reproduces audio and an image from audio data in which image data, including data obtained by encoding an image and metadata relating to the data, is embedded. The metadata includes at least telop information in which text data and time information are combined. The audio image reproducing unit reproduces the audio based on the audio data and, in accordance with the telop information of the image data, displays a telop image based on the text data corresponding to the elapsed time from the start of reproduction of the audio, superimposed on an image based on the image data.

Further, in the above-described audio and video reproduction apparatus, it is preferable that the telop information further includes text control information including at least one of color information, font information, information indicating the presence or absence of shading, and background color information for the text data, and that the audio image reproducing unit displays the telop image based on the text data in accordance with the text control information.

In addition, in the above-described audio image reproducing apparatus, it is preferable that the text data is lyric data, and the data obtained by encoding the image is obtained by encoding the original image data made of artwork.

In addition, another aspect of the present invention relates to an audio and video reproduction method. That is, the audio and video reproduction method of the present invention has a data reproduction step of reproducing audio data in which image data having encoded data and metadata is embedded, and the metadata includes at least telop information in which text data and time information are combined. The data reproduction step includes a step of reproducing audio based on the audio data, and a step of displaying, in accordance with the telop information of the image data, a telop image based on the text data corresponding to the elapsed time from the start of reproduction of the audio, superimposed on an image based on the image data.

Another aspect of the present invention relates to the data structure of image data. That is, the data structure of the image data according to the present invention has, in addition to the data obtained by encoding the image, metadata including at least telop information that combines text data for displaying text superimposed on the image with time information indicating the timing at which the text of the text data is superimposed on the image.

According to the present invention, it is possible to provide an audio and video reproduction apparatus, an audio and video reproduction method, and a data structure of image data capable of easily displaying an image according to the reproduction elapsed time of audio data.

FIG. 1 is a diagram showing an outline of generation processing of image data storing metadata.
FIG. 2 is a diagram showing an example of the format of image data.
FIG. 3 is a diagram showing an example of trimming and displaying a score according to the reproduction elapsed time.
FIG. 4 is a block diagram showing an example of the hardware configuration of an information processing apparatus.
FIG. 5 is a diagram showing an example of the functional block configuration of an information processing apparatus for carrying out image reproduction processing accompanied by trimming processing.
FIG. 6 is a flowchart showing image trimming display processing.
FIG. 7 is a diagram showing an example of displaying lyrics as telops according to the reproduction elapsed time.
FIG. 8 is a diagram showing an example of the functional block configuration of an information processing apparatus for carrying out an example of sound and image reproduction processing according to an embodiment of the present invention.
FIG. 9 is a flowchart showing telop display processing.
FIG. 10 is a diagram showing an example of image data in which tampering detection data is described in the metadata.
FIG. 11 is a diagram showing an example of the functional block configuration of an information processing apparatus for carrying out an example of image reproduction processing accompanied by tampering detection.
FIG. 12 is a flowchart showing tampering detection processing in image reproduction processing.
FIG. 13 is a diagram showing a processing example in which the original image is a map image and the metadata contains character strings, such as place names, that are selected and displayed according to the position on the map and the set language.
FIG. 14 is a diagram showing a processing example in which the original image is a photograph and the metadata contains character strings such as the address of the place where the photograph was taken and a facility name.
FIG. 15 is a diagram showing a processing example in which the original image is an image of a road information sign and the metadata contains text data indicating the content of the road information sign.
FIG. 16 is a diagram showing a processing example in which the original image data is encrypted with a public key and the public key is stored in the metadata.
FIG. 17 is a diagram showing a processing example in which the original image is a landscape photograph and the metadata contains object information such as position information of a building or the like in the photograph.
FIG. 18 is a diagram showing a processing example in which the original image is a landscape photograph and the metadata contains object information such as position information of a building or the like in the photograph.

The audio and video reproduction apparatus, the information processing apparatus, the audio and video reproduction method, and the data structure of image data according to the present invention will be described below with reference to FIGS. 1 to 6 and FIGS. 10 to 18. The audio image reproduction processing according to the embodiment of the present invention will be described with reference to FIGS. 7 to 9. The image reproduction processing in FIGS. 1 to 6 and FIGS. 10 to 18 can replace, or be combined with, the image reproduction processing within the sound and image reproduction processing described with reference to FIGS. 7 to 9. The audio and video reproduction apparatus, the information processing apparatus, the audio and video reproduction method, and the data structure of the image data according to the present invention are not limited to the embodiments exemplified herein. The description will be made in the following order.
1. Outline of generation process of image data storing metadata
2. Example of image reproduction processing
3. Example of sound and image reproduction processing
4. Another example of image reproduction processing
5. Modified example

<< Outline of generation process of image data storing metadata >>
FIG. 1 is a diagram showing an outline of generation processing of image data storing metadata. The information processing apparatus 1 of the present embodiment is, for example, an apparatus such as a notebook computer or a desktop computer. The information processing apparatus 1 has a function as an image data generation apparatus that generates metadata and generates image data storing the generated metadata, and a function as an image reproduction apparatus capable of reproducing an image from the image data storing the metadata. Therefore, a program for functioning as the image data generation apparatus and a program for functioning as the image reproduction apparatus are installed in the information processing apparatus 1 in advance. However, the information processing apparatus 1 may have only one of the two functions, that is, only the function as the image data generation apparatus or only the function as the image reproduction apparatus.

The information processing apparatus 1 receives, as input, original image data captured by a camera or original image data created by image processing (including so-called artwork, which is data created with image processing software), together with reproduction control data for the image data. The reproduction control data is, for example, data consisting of trimming information in which time information and area information are combined. The area information is information for specifying an area in the original image data, for example, information consisting of upper-left coordinates, width, and height, or information consisting of upper-left coordinates and lower-right coordinates. The time information is information indicating the elapsed time from the start of reproduction of the original image data.
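The two forms of area information mentioned above carry the same information and can be converted into each other. The following is a minimal sketch in Python, not taken from the publication: the names `Region`, `from_corners`, and `corners` are hypothetical, and treating the lower-right coordinate as exclusive is an assumption, since the text does not say whether it is inclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Area information: upper-left corner plus width and height, in pixels."""
    left: int
    top: int
    width: int
    height: int

    @classmethod
    def from_corners(cls, left, top, right, bottom):
        # Assumption: (right, bottom) is the exclusive lower-right corner.
        if right <= left or bottom <= top:
            raise ValueError("lower-right corner must lie below and to the right")
        return cls(left, top, right - left, bottom - top)

    def corners(self):
        """Return (left, top, right, bottom) with an exclusive lower-right corner."""
        return (self.left, self.top, self.left + self.width, self.top + self.height)
```

Under that assumption, `Region.from_corners(10, 60, 410, 160)` describes the same area as `Region(10, 60, 400, 100)`.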

The information processing apparatus 1 performs a predetermined encoding process on the input original image data, generates metadata from the input reproduction control data, and generates image data having the encoded data and the generated metadata.

FIG. 2 is a diagram showing an example of the format of image data. As shown in FIG. 2, the image data P consists of the areas SOI (Start of Image), APP1 (Application marker segment 1), ..., APP11 (Application marker segment 11), original image data, and EOI (End of Image). The image data P of the present embodiment is defined, for example, by the box file format of JPEG XT Part 3, an extension of the conventional JPEG (Joint Photographic Experts Group) standard, which specifies an extensible box-based file format that can be freely described.

The SOI is a marker at the top of the JPEG file and representing the start point of the JPEG file. By reading this SOI, the JPEG file is identified.

APP1 stores attached information (Exif: Exchangeable image file format) for the image.

APP11 stores metadata, described in JSON (JavaScript Object Notation), that is defined by the box file format of JPEG XT Part 3. More specifically, APP11 stores the length of the application marker segment and a plurality of box data, and each box data stores the box length (Box Length), box type (Box Type), metadata type (Metadata type), schema ID (Schema ID), and metadata. In the example of FIG. 2, the box data JUMBF (0) stores data whose metadata type is MIME, schema ID is APP/JSON, and metadata is JSON. The box data JUMBF (1) stores data whose metadata type is Vender, schema ID is Vender/XXX, and metadata is XXX data.

As original image data, compressed image coded data in JPEG format is stored.

EOI is a marker that represents the end of the JPEG file.
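The marker layout described above (SOI, APP1 through APP11, image data, EOI) can be walked with a few lines of code. The following is a minimal sketch in Python, assuming only general JPEG marker syntax: each header segment is a two-byte marker followed by a two-byte big-endian length that counts itself, with SOI as 0xFFD8, APP11 as 0xFFEB, SOS as 0xFFDA, and EOI as 0xFFD9. The function names are hypothetical, and parsing of the box fields inside an APP11 payload is deliberately left out because the publication does not give their byte layout.

```python
import struct

SOI, SOS, EOI, APP11 = 0xD8, 0xDA, 0xD9, 0xEB


def iter_segments(jpeg):
    """Yield (marker, payload) for each marker segment that precedes the scan data."""
    if jpeg[:2] != bytes([0xFF, SOI]):
        raise ValueError("not a JPEG stream: SOI marker missing")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = jpeg[pos + 1]
        if marker in (SOS, EOI):
            return  # entropy-coded data or end of file: no more header segments
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        # The length field counts its own two bytes but not the marker.
        yield marker, jpeg[pos + 4:pos + 2 + length]
        pos += 2 + length


def app11_payloads(jpeg):
    """Return the raw payload of every APP11 segment, in file order."""
    return [payload for marker, payload in iter_segments(jpeg) if marker == APP11]
```

The sketch stops at the first SOS marker, so it never scans the compressed image data itself.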

As shown in FIG. 2, by storing metadata that can be described in JSON in the box data of APP11 of the image data P and reading the data specified there, reproduction of the image can be managed.
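As an illustration of how a reader might decide what to do with a box once its fields have been extracted, the following Python sketch models the fields named for FIG. 2 and lists the top-level JSON keys of a JSON box. The class `BoxData` and the function `requested_functions` are hypothetical, the box length is omitted because it is only needed while parsing, and the string values "MIME" and "APP/JSON" are taken from the FIG. 2 example rather than from the JPEG XT specification.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxData:
    """Fields of one box as listed for FIG. 2 (box length omitted)."""
    box_type: str
    metadata_type: str
    schema_id: str
    metadata: bytes


def requested_functions(box):
    """Return the top-level JSON keys of a JSON metadata box, else an empty list."""
    if (box.metadata_type, box.schema_id) != ("MIME", "APP/JSON"):
        return []  # vendor-specific or unknown box: nothing to interpret here
    document = json.loads(box.metadata.decode("utf-8"))
    return sorted(document)
```

For a box whose metadata is `{"clip": []}`, the function returns `["clip"]`; a vendor box yields an empty list and can simply be skipped.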

<< Example of image reproduction processing for trimming and displaying images (example using score) >>
FIG. 3 is a diagram showing an example of trimming and displaying a score according to the reproduction elapsed time. As shown in FIG. 3, the original image data of the image data P1 stores image encoded data consisting of a score of 12 bars. The metadata M1 described in JSON is stored in the APP11 area of the image data P1. In the metadata M1, line 1 reads '"clip": [', line 2 '{', line 3 '"time": 0,', line 4 '"left": 10,', line 5 '"top": 60,', line 6 '"width": 400,', line 7 '"height": 100', line 8 '},', line 9 '{', line 10 '"time": 16,', line 11 '"left": 10,', line 12 '"top": 160,', line 13 '"width": 400,', line 14 '"height": 100', line 15 '},', and line n ']'.

"clip" is information instructing use of the trimming function (clip function). The information following "time" indicates time information, and the information following "left", "top", "width", and "height" indicates area information. That is, the metadata M1 describes trimming information that combines time information with area information for trimming a predetermined position of the image with the trimming function. By reading out the metadata (trimming information) M1, the information processing apparatus 1 can trim and sequentially display predetermined areas based on the area information linked to the time information, in accordance with the elapsed time from the start of reproduction of the image data P1.

In the example of FIG. 3, when the image data P1 storing such metadata M1 is displayed, an area 100 pixels high and 400 pixels wide, whose upper-left corner is 10 pixels from the left and 60 pixels from the top, is trimmed from the display start time until 16 seconds have elapsed. As a result, the area P2 of the first four bars is trimmed and displayed, as indicated by the tip of the arrow A1.

Subsequently, from 16 seconds after the display start time until 32 seconds have elapsed, an area 100 pixels high and 400 pixels wide, whose upper-left corner is 10 pixels from the left and 160 pixels from the top, is trimmed. As a result, the area P3 of the next four bars is trimmed and displayed, as indicated by the tip of the arrow A2.
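The selection of the trimming area for a given elapsed time can be sketched as follows. This is an illustrative Python example, not the publication's implementation: it contains only the two entries spelled out for the metadata M1 (the remaining entries are elided in the text), wraps the "clip" array in a JSON object as an assumption, and uses a hypothetical function name `active_clip`.

```python
import json
from bisect import bisect_right

# Only the two entries given in the text; later entries are not reproduced.
METADATA_M1 = json.loads("""
{
  "clip": [
    {"time": 0,  "left": 10, "top": 60,  "width": 400, "height": 100},
    {"time": 16, "left": 10, "top": 160, "width": 400, "height": 100}
  ]
}
""")


def active_clip(metadata, elapsed):
    """Return the clip entry in effect at `elapsed` seconds, or None before the first."""
    clips = sorted(metadata["clip"], key=lambda c: c["time"])
    times = [c["time"] for c in clips]
    # Index of the last entry whose start time is not later than `elapsed`.
    index = bisect_right(times, elapsed) - 1
    return clips[index] if index >= 0 else None
```

With these two entries, any elapsed time from 0 up to but not including 16 seconds selects the area with "top" 60, and 16 seconds or later selects the area with "top" 160.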

Details of the operation of trimming and displaying the image data as described above in accordance with the elapsed time will be described later with reference to the flowchart.

<Configuration Example of Information Processing Device>
FIG. 4 is a block diagram showing an example of the hardware configuration of the information processing apparatus 1. The information processing apparatus 1 has a central processing unit (CPU) 11, a read only memory (ROM) 12, a random access memory (RAM) 13, a bus 14, an input unit 15, an output unit 16, a storage unit 17, and a communication unit 18.

The CPU 11, the ROM 12 and the RAM 13 are mutually connected by a bus 14. An input unit 15, an output unit 16, a storage unit 17, and a communication unit 18 are also connected to the bus 14.

The input unit 15 includes an input device such as a keyboard and a mouse, and supplies various information to the CPU 11 via the bus 14. The output unit 16 is composed of an output device such as a display or a speaker, and displays an image or reproduces an audio according to an instruction of the CPU 11. The storage unit 17 is configured of a hard disk, a non-volatile memory, and the like. The storage unit 17 stores various data such as image data in which metadata is stored, in addition to the program executed by the CPU 11. The communication unit 18 is configured by a network interface or the like, and communicates with an external device (not shown) via wireless or wired communication.

FIG. 5 shows, as an information processing apparatus 1A, an example of the functional block configuration of the information processing apparatus 1 for carrying out an example of image reproduction processing that trims an image. The information processing apparatus 1A includes an image data generating apparatus 30 that generates metadata and generates image data storing the generated metadata, and an image reproducing apparatus 40 that reproduces an image based on the metadata.

The image data generation device 30 includes an image encoding unit 31, a metadata generation unit 32, an image data generation unit 33, and a recording control unit 34.

The image encoding unit 31 inputs original image data captured by a camera or original image data created by image processing, and encodes the input original image data in JPEG XT format. The obtained image coded data is supplied to the image data generation unit 33.

The metadata generation unit 32 inputs reproduction control data composed of trimming information in which time information and area information are combined, and generates metadata defined by a box file format of JPEG XT Part 3 that can be described in JSON. The generated metadata is supplied to the image data generation unit 33.

The image data generation unit 33 generates image data (FIG. 2) in which the image coded data supplied from the image coding unit 31 and the metadata supplied from the metadata generation unit 32 are stored. The generated image data is supplied to the recording control unit 34.

The recording control unit 34 supplies the image data supplied from the image data generation unit 33, which has the image encoded data and the metadata, to the storage unit 17, and controls its recording there.

The image reproduction device 40 includes an analysis unit 41, an image decoding unit 42, an image storage unit 43, an image trimming unit 44, and an output control unit 45.

The analysis unit 41 acquires image data from the storage unit 17 based on an instruction from the input unit 15, analyzes the metadata stored in the acquired image data, and supplies the image encoded data in JPEG XT format stored in the image data to the image decoding unit 42. The analysis unit 41 also starts an internal timer (not shown) and, using the time measured by the internal timer and the plurality of pieces of trimming information described in the analyzed metadata, each combining time information and area information, controls the image trimming unit 44 based on the trimming information whose time information matches the time measured by the internal timer. That is, based on the plurality of pieces of trimming information described in the metadata, the analysis unit 41 controls the image trimming unit 44 so that images of predetermined areas of the image represented by the image data stored in the image storage unit 43 are trimmed sequentially at predetermined timings.

The image decoding unit 42 decodes the image coding data in the JPEG XT format supplied from the analysis unit 41. The obtained image decoded data is supplied to the image storage unit 43 and temporarily stored there.

Under the control of the analysis unit 41, the image trimming unit 44 trims an image of a predetermined area from the image decoded data stored in the image storage unit 43 at a predetermined timing, and supplies the decoded data corresponding to the trimmed image to the output control unit 45.

The output control unit 45 outputs (displays) the decoded data of the image of the predetermined area supplied from the image trimming unit 44 to the display.

<Operation of Information Processing Apparatus in Example of Image Reproduction Processing for Trimming an Image>
The image trimming display process of the information processing apparatus 1A will be described with reference to the flowchart of FIG. 6.

In step S1, the analysis unit 41 acquires image data from the storage unit 17 based on an instruction from the input unit 15. In step S2, the analysis unit 41 analyzes the metadata stored in the image data, and supplies the image decoding unit 42 with the image encoded data in JPEG XT format stored in the read image data.

In step S3, the image decoding unit 42 decodes the image encoded data supplied from the analysis unit 41 to obtain image decoded data. The image decoding data is supplied to the image storage unit 43 and temporarily stored therein.

In step S4, the analysis unit 41 starts the internal timer. In step S5, based on the time measured by the internal timer and the plurality of pieces of trimming information described in the analyzed metadata, the analysis unit 41 determines whether there is trimming information having time information that matches the time measured by the internal timer.

When the analysis unit 41 determines in step S5 that there is trimming information having time information that matches the time measured by the internal timer (step S5: YES), it controls the image trimming unit 44 based on that trimming information.

In step S6, under the control of the analysis unit 41, the image trimming unit 44 extracts, from the image decoded data stored in the image storage unit 43, the image decoded data corresponding to the image of the predetermined area based on the area information linked to the time information, and supplies it to the output control unit 45.

In step S7, the output control unit 45 outputs the image decoding data corresponding to the image of the predetermined area supplied from the image trimming unit 44 to the display. Thereafter, the process returns to step S5, and the above-described process is repeated until it is determined that there is no trimming information having time information that matches the time measured by the internal timer.

When it is determined in step S5 that there is no trimming information having time information that matches the time measured by the internal timer (step S5: NO), the image trimming display process shown in FIG. 6 ends.
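Steps S4 to S7 can be sketched as a small loop. The following Python example is one possible reading of the flowchart, not the publication's implementation: it assumes the decoded image is a list of pixel rows, that "matching" the timer means waiting until the elapsed time reaches each entry's "time" value, and that processing ends once every trimming entry has been shown. The names `crop` and `play_trimmed`, and the injectable `clock` and `sleep` parameters, are hypothetical.

```python
import time


def crop(pixels, region):
    """Cut the region out of a decoded image given as a list of pixel rows."""
    rows = pixels[region["top"]:region["top"] + region["height"]]
    return [row[region["left"]:region["left"] + region["width"]] for row in rows]


def play_trimmed(pixels, clips, display, clock=time.monotonic, sleep=time.sleep):
    """Show each clip region once the elapsed time reaches its "time" value."""
    start = clock()                                # S4: start the internal timer
    for region in sorted(clips, key=lambda c: c["time"]):
        remaining = region["time"] - (clock() - start)
        if remaining > 0:                          # S5: wait for a matching time
            sleep(remaining)
        display(crop(pixels, region))              # S6 and S7: trim, then output
    # No trimming information is left, so the process ends (S5: NO).
```

Passing `clock` and `sleep` as parameters keeps the loop testable without real delays; in ordinary use the defaults from the `time` module apply.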

As described above, the information processing apparatus 1A illustrated in FIG. 5 generates image data having data obtained by encoding an image and metadata including at least trimming information in which time information and area information are combined. Thus, when a display timing that matches the time information described in the metadata is reached, only a predetermined area of the image can be trimmed and displayed based on the area information linked to that time information. Since data for managing the display timing and the like can be included in the image data, data management becomes simple. In addition, the image area to be displayed and its reproduction timing can be changed simply by editing the information in the metadata, without using a specific device or software, so that display according to the reproduction elapsed time can be performed easily.

Although not shown, the information processing apparatus 1A may further include an audio data reproduction unit and store audio data in the storage unit 17 in association with the image data. With such a configuration, when displaying the image data, the information processing apparatus 1A can reproduce the audio data associated with the image data. For example, when displaying piano score data, it can simultaneously reproduce audio data of a piano performance that serves as a guide for the score. This allows the user to practice the piano along with the guide performance. As another example, when displaying piano score data, audio data of a violin performance based on the score can be reproduced simultaneously. The user can then enjoy a duet with the violin performance simply by playing the piano.

Also, in the above, the information processing apparatus 1A may further describe animation information in metadata including at least trimming information in which time information and area information are combined. With such a configuration, when displaying image data, the information processing apparatus 1A can simultaneously display an image based on animation information associated with the image data. For example, when displaying a predetermined area of musical score data of a piano, it is possible to superimpose and display an image of a guiding function of the piano performance of the musical score (an animation which tells the location of the keyboard to be played next). This allows the user to practice piano playing according to the guide function.

<< Example of sound and image reproduction processing (example using lyrics data) >>
FIG. 7 is a diagram showing an example in which lyric data is displayed as telops in accordance with the reproduction elapsed time of audio data. As shown in FIG. 7, the original image data of the image data P11 stores image encoded data consisting of artwork. Metadata M11 described in JSON is stored in the APP11 area of the image data P11. In the metadata M11, line 1 reads '"lyrics": [', line 2 '{', line 3 '"time": 58,', line 4 '"text": "Oh Kanazawa-"', line 5 '},', line 6 '{', line 7 '"time": 65,', line 8 '"text": "It was snowing today"', line 9 '},', and line n ']'.

"lyrics" is information instructing use of the lyric display function. The information following "time" indicates time information, and the information following "text" indicates text data. That is, the metadata M11 describes telop information that combines time information with text data for displaying lyrics with the lyric display function. The information processing apparatus 1 generates audio data in which the image data P11 storing the metadata M11 is embedded. When reproducing the audio data, it acquires the image data P11 embedded in the audio data and reads out the metadata (telop information) M11 stored in the acquired image data P11. It can thereby sequentially display telops based on the text data linked to the time information, in accordance with the elapsed time from the start of reproduction of the audio data.

In the example of FIG. 7, when the audio data in which the image data P11 storing such metadata M11 is embedded is reproduced, "Oh Kanazawa-" is read out from 58 seconds after the reproduction start time until 65 seconds have elapsed. As a result, as indicated by the tip of the arrow A11, the text "Oh Kanazawa-" is superimposed on the image P12.

Subsequently, "It was snowing today" is read out from 65 seconds after the reproduction start time until the time given by the next time information. As a result, as indicated by the tip of the arrow A12, the text "It was snowing today" is superimposed on the image P13.
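The display periods described above follow directly from consecutive "time" values: each telop is shown from its own time until the time of the next entry. The following Python sketch derives those periods. It is illustrative only: it contains just the two entries given for the metadata M11, wraps the "lyrics" array in a JSON object as an assumption, uses a hypothetical function name `telop_intervals`, and assumes the last listed telop stays on screen until the end of the audio, which the text does not state.

```python
import json

# Only the two entries given in the text; later entries are not reproduced.
METADATA_M11 = json.loads("""
{
  "lyrics": [
    {"time": 58, "text": "Oh Kanazawa-"},
    {"time": 65, "text": "It was snowing today"}
  ]
}
""")


def telop_intervals(metadata, audio_length):
    """Return (start, end, text) for each telop, in seconds of audio playback."""
    entries = sorted(metadata["lyrics"], key=lambda e: e["time"])
    # Each telop ends where the next begins; the last one is assumed to run
    # until the end of the audio.
    ends = [e["time"] for e in entries[1:]] + [audio_length]
    return [(e["time"], end, e["text"]) for e, end in zip(entries, ends)]
```

A player can precompute these periods once after reading the metadata and then only compare the playback position against them while the audio runs.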

The details of the operation of displaying telops in accordance with the reproduction elapsed time of the audio data as described above will be described later with reference to the flowchart.

<Example of Functional Configuration of Information Processing Apparatus that Executes Audio-Visual Image Reproduction Processing>
The hardware configuration shown in FIG. 4 can be used for an information processing apparatus that implements the above-described sound and image reproduction processing example, and its description is therefore omitted. FIG. 8 shows, as an information processing apparatus 1B, an example of the functional block configuration of the information processing apparatus 1 for carrying out the sound and image reproduction processing example. The information processing apparatus 1B includes a data generation device 50 that generates metadata, generates image data storing the generated metadata, and generates audio data in which the generated image data is embedded, and an audio and video reproduction apparatus 60 that reproduces audio from the audio data while reproducing an image from the image data based on the metadata.

The data generation device 50 includes an image encoding unit 51, a metadata generation unit 52, a data generation unit 53, and a recording control unit 54.

The image coding unit 51 inputs original image data captured by a camera or original image data created by image processing, and performs image encoding on the input original image data in JPEG XT format. The encoded data is supplied to the data generation unit 53.

The metadata generation unit 52 inputs reproduction control data consisting of telop information in which time information and text data are combined, and generates metadata defined by a box file format of JPEG XT Part 3 that can be described in JSON. The generated metadata is supplied to the data generation unit 53.

The data generation unit 53 generates image data (FIG. 2) storing the encoded data supplied from the image coding unit 51 and the metadata supplied from the metadata generation unit 52. The data generation unit 53 inputs audio data from the outside, embeds the image data in which the metadata is stored in the input audio data, and supplies it to the recording control unit 54.

The recording control unit 54 supplies, to the storage unit 17, the audio data in which the image data having the encoded image data and the metadata is embedded and which is supplied from the data generation unit 53, and controls the recording there.

The audio and video reproduction apparatus 60 includes an analysis unit 61, an image decoding unit 62, a text drawing unit 63, and an output control unit 64.

The analysis unit 61 acquires audio data from the storage unit 17 based on an instruction from the input unit 15, supplies the acquired audio data to the output control unit 64, acquires the image data embedded in the acquired audio data, and analyzes the metadata stored in the acquired image data. The image encoded data in JPEG XT format stored in the image data, obtained through this analysis, is supplied to the image decoding unit 62.

In addition, the analysis unit 61 starts an internal timer (not shown) and, using the time measured by the internal timer and the plurality of pieces of telop information described in the analyzed metadata, each combining time information and text data, controls the text drawing unit 63 based on the telop information whose time information matches the time measured by the internal timer. That is, the analysis unit 61 controls the text drawing unit 63 so that the text data is sequentially converted into images at predetermined timings based on the plurality of pieces of telop information described in the metadata.

The image decoding unit 62 decodes the encoded image data of JPEG XT format supplied from the analysis unit 61. The decoded image data is supplied to the output control unit 64.

The text drawing unit 63 converts the text data supplied from the analysis unit 61 into image data at a predetermined timing based on the control of the analysis unit 61, and supplies the image data to the output control unit 64.

The output control unit 64 outputs audio based on the audio data supplied from the analysis unit 61 to a speaker for reproduction, superimposes the image data supplied from the text drawing unit 63 on the image data supplied from the image decoding unit 62, and outputs (displays) the result on the display.

<Operation of Information Processing Device in Example of Sound and Image Reproduction Processing>
The telop display process of the information processing apparatus 1B will be described with reference to the flowchart of FIG. 9.

In step S11, the analysis unit 61 acquires audio data from the storage unit 17 based on an instruction from the input unit 15. In step S12, the analysis unit 61 analyzes the metadata of the image data embedded in the audio data. The acquired audio data is supplied to the output control unit 64, and the encoded image data in the JPEG XT format stored in the image data is supplied to the image decoding unit 62.

In step S13, the image decoding unit 62 decodes the encoded image data in the JPEG XT format supplied from the analysis unit 61 to generate decoded image data, and supplies the decoded image data to the output control unit 64. In step S14, the output control unit 64 outputs audio based on the audio data to the speaker for reproduction.

In step S15, the analysis unit 61 activates an internal timer. In step S16, the analysis unit 61 determines whether, among the plurality of pieces of telop information described in the analyzed metadata, there is telop information whose time information matches the time measured by the internal timer.

When the analysis unit 61 determines in step S16 that there is telop information whose time information matches the time measured by the internal timer (step S16: YES), it controls the text drawing unit 63 based on that telop information.

In step S17, the text drawing unit 63 converts the text data linked to the time information into image data based on the control of the analysis unit 61, and supplies the image data to the output control unit 64.

In step S18, the output control unit 64 superimposes the text image data supplied from the text drawing unit 63 on the image data supplied from the image decoding unit 62, and outputs the superimposed image. Thereafter, the process returns to step S16, and the above-described process is repeated until it is determined that there is no telop information having time information that matches the time measured by the internal timer.

When it is determined in step S16 that there is no telop information having time information that matches the time measured by the internal timer (step S16: NO), the telop display process shown in FIG. 9 is ended.
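The matching performed in steps S15 to S18 can be sketched as follows. This is only an illustrative sketch, not the implementation described here: the key names "telop", "time", and "text" are hypothetical, the internal timer is replaced by an integer counter of seconds, and the loop simply polls for the whole playback duration rather than ending at the first time with no match.

```python
import json

# Hypothetical metadata layout: the keys "telop", "time" and "text" are
# assumptions made for this sketch, not names taken from the specification.
METADATA_JSON = """
{"telop": [
  {"time": 0, "text": "First line"},
  {"time": 5, "text": "Second line"},
  {"time": 9, "text": "Third line"}
]}
"""


def telop_for_time(telops, elapsed):
    """Step S16: return the text whose time information matches the timer, or None."""
    for entry in telops:
        if entry["time"] == elapsed:
            return entry["text"]
    return None


def run_telop_display(metadata_json, duration):
    """Simulate steps S15 to S18 with an integer-second stand-in for the timer."""
    telops = json.loads(metadata_json)["telop"]
    shown = []
    for elapsed in range(duration + 1):  # stand-in for the internal timer (S15)
        text = telop_for_time(telops, elapsed)
        if text is not None:
            # S17/S18: a real player would render the text as an image and
            # superimpose it on the decoded image; here we only record it.
            shown.append((elapsed, text))
    return shown


if __name__ == "__main__":
    for second, text in run_telop_display(METADATA_JSON, 10):
        print(second, text)
```

Because the telop information is plain text in the metadata, changing a display time in this sketch only requires editing the corresponding "time" value.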

As described above, audio data is generated in which image data is embedded, the image data having encoded data and metadata that includes at least telop information consisting of time information and text data. When the audio data is reproduced and the display timing matches the time information described in the metadata of the embedded image data, the text data linked to that time information is converted into image data, and the resulting text image data is superimposed on the image data to display a telop. An image corresponding to the elapsed playback time of the audio data can thus be displayed easily. In addition, for example, the image data, the audio data, and the text data described above can be managed as one music file, which facilitates the handling of the data. Further, since the telop information is stored as text data, the telop time information is easy to edit.

Further, in addition to the telop information in which time information and text data are combined, the metadata may also describe text color information, font information, information indicating the presence or absence of shading, background color information, and the like. With such a configuration, the information processing apparatus 1B can display a visually enjoyable telop rather than a monotonous one.

<< Example of image reproduction processing with falsification detection >>
FIG. 10 is a diagram illustrating an example of image data in which tampering detection data is described in metadata. As shown in FIG. 10, the original image data of the image data P21 stores encoded image data whose original image is a photograph. The APP11 area of the image data P21 stores metadata M21 described in JSON. The metadata M21 describes a hash value A, a hash value B, and a script. The hash value A is a value obtained by executing the script with Seed data as an argument. The Seed data is data (a parameter) embedded in advance in a predetermined area of the image data P21. The hash value B is a value obtained by executing the script with the program character string of the script itself as an argument. The script is a hash function (program) for calculating a hash value. That is, data for detecting tampering is described in the metadata M21, and the information processing apparatus 1 can detect tampering of the image data P21 by reading the metadata (tampering detection data) M21 and executing the script.

Details of the operation for reading out and executing the tampering detection data as described above will be described later with reference to the flowchart.

<Example of Functional Configuration of Information Processing Apparatus Performing Example of Image Reproduction Processing with Tampering Detection>
The hardware configuration shown in FIG. 4 can be used for an information processing apparatus that implements the example of image reproduction processing with tampering detection, so its description is omitted. FIG. 11 shows, as an information processing apparatus 1C, an example of the functional block configuration of the information processing apparatus 1 for carrying out this example of image reproduction processing. In the configuration shown in FIG. 11, the same components as those in FIG. 5 are denoted by the same reference numerals, and redundant description is omitted as appropriate. The information processing apparatus 1C includes an image data generation device 30, which generates metadata and generates image data storing the generated metadata, and an image data tampering detection device 70, which detects whether the image data storing the metadata has been tampered with and reproduces the image data when it has not.

The metadata generation unit 32 receives reproduction control data including a hash value A, a hash value B, and a script for detecting tampering, and generates metadata that conforms to the box file format of JPEG XT Part 3 and can be described in JSON. The generated metadata is supplied to the image data generation unit 33.

The image data tampering detection device 70 includes an analysis unit 71, a comparison unit 72, a tampering detection unit 73, an image decoding unit 74, and an output control unit 75.

The analysis unit 71 acquires image data from the storage unit 17 based on an instruction from the input unit 15, analyzes the metadata stored in the acquired image data, supplies the tampering detection data described in the metadata (the hash value A, the hash value B, and the script) to the comparison unit 72, and supplies the encoded image data in the JPEG XT format stored in the image data to the image decoding unit 74. The analysis unit 71 also reads the Seed data embedded in the image data by a predetermined method and supplies it to the comparison unit 72.

The comparison unit 72 calculates a hash value A' based on the script included in the tampering detection data supplied from the analysis unit 71 and on the Seed data, and compares the calculated hash value A' with the hash value A described in the metadata (tampering detection data). Further, the comparison unit 72 calculates a hash value B' based on the program character string of the script included in the tampering detection data, and compares the calculated hash value B' with the hash value B described in the metadata (tampering detection data). The comparison results are supplied to the tampering detection unit 73.

The tampering detection unit 73 detects whether the image data has been tampered with based on the two comparison results of the comparison unit 72. When it determines that the image data has not been tampered with (both the hash value A and the hash value B are correct), it causes the image decoding unit 74 to execute the decoding process. When it determines that the image data has been tampered with (either or both of the hash value A and the hash value B are incorrect), it prohibits the decoding process of the image decoding unit 74.

When execution of the decoding process is instructed under the control of the tampering detection unit 73, the image decoding unit 74 decodes the encoded image data in the JPEG XT format supplied from the analysis unit 71 and supplies it to the output control unit 75 as decoded image data. When the decoding process is prohibited under the control of the tampering detection unit 73, the image decoding unit 74 supplies the encoded image data in the JPEG XT format supplied from the analysis unit 71 to the output control unit 75 without decoding it.

The output control unit 75 outputs (displays) the data supplied from the image decoding unit 74 to a display.

<Operation of Information Processing Apparatus in Example of Image Reproduction Processing with Tamper Detection>
The tampering detection process of the information processing apparatus 1C, configured as described above for the example of image reproduction processing with tampering detection, will be described with reference to the flowchart of FIG. 12.

In step S21, the analysis unit 71 acquires image data from the storage unit 17 based on an instruction from the input unit 15. In step S22, the analysis unit 71 analyzes the metadata stored in the image data and supplies the tampering detection data (hash value A, hash value B, and script) described in the metadata to the comparison unit 72. At the same time, the encoded image data in the JPEG XT format stored in the acquired image data is supplied to the image decoding unit 74. Further, the analysis unit 71 reads the Seed data embedded in the image data by a predetermined method and supplies the Seed data to the comparison unit 72.

In step S23, the comparison unit 72 executes a script described in metadata (falsification detection data) using the Seed data supplied from the analysis unit 71 as an argument, and calculates a hash value A ′. In step S24, the comparison unit 72 compares the hash value A described in the metadata (tamper detection data) with the calculated hash value A '.

In step S25, the comparison unit 72 executes the script with the program character string of the script described in the metadata (tamper detection data) as an argument, and calculates the hash value B '. In step S26, the comparison unit 72 compares the hash value B described in the metadata (tamper detection data) with the calculated hash value B '. The comparison results of step S24 and step S26 are supplied to the tampering detection unit 73.

In step S27, the tampering detection unit 73 determines from the two comparison results whether the image data has been tampered with. If either or both of the comparisons show a mismatch, it determines that the image data has been tampered with (step S27: YES) and, in step S28, prohibits the decoding process of the image decoding unit 74. Accordingly, the image decoding unit 74 supplies the encoded image data in the JPEG XT format supplied from the analysis unit 71 to the output control unit 75 without decoding it. The output control unit 75 outputs (displays) the data supplied from the image decoding unit 74 on a display.

If both comparisons show a match in step S27, the tampering detection unit 73 determines that the image data has not been tampered with (step S27: NO) and, in step S29, causes the image decoding unit 74 to execute the decoding process. The image decoding unit 74 decodes the encoded image data in the JPEG XT format supplied from the analysis unit 71 and supplies the result as decoded image data to the output control unit 75. The output control unit 75 outputs (displays) the decoded image data supplied from the image decoding unit 74 on a display.

As described above, by generating image data having encoded data and metadata that includes at least tampering detection data, whether the image data has been tampered with can be detected easily by reading the tampering detection data described in the metadata and executing the script. When the image data is determined to have been tampered with, the decoding process can be prohibited. Compared with a conventional tampering detection method using a hash value, the script for calculating the hash value is sent together with the image data, so tampering detection itself is easy to perform. Moreover, because the hash value calculation method can be changed for each piece of image data, it is difficult to apply a single uniform tampering technique, and no general tampering method can be established. In addition, tampering can be easily verified for image data generated by data providers other than the user.
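The two comparisons of steps S23 to S27 can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the specification: the script is assumed to be Python source defining a function named `calc_hash`, SHA-256 is an arbitrary choice of hash function, and the dictionary keys `hash_a`, `hash_b`, and `script` are hypothetical. Running an embedded script with `exec` is shown only for illustration and would need sandboxing in practice.

```python
import hashlib  # used directly by the usage example below

# Assumption: the embedded "script" is Python source that defines
# calc_hash(data: bytes) -> str. The function name is hypothetical.
SCRIPT = (
    "import hashlib\n"
    "def calc_hash(data):\n"
    "    return hashlib.sha256(data).hexdigest()\n"
)


def run_script(script, argument):
    """Execute the embedded script and apply its hash function to the argument."""
    namespace = {}
    exec(script, namespace)  # illustration only: untrusted scripts need a sandbox
    return namespace["calc_hash"](argument)


def make_detection_data(script, seed):
    """Generation side: hash value A from the Seed data, hash value B from the script text."""
    return {
        "hash_a": run_script(script, seed),
        "hash_b": run_script(script, script.encode("utf-8")),
        "script": script,
    }


def is_tampered(detection_data, seed):
    """Detection side: recompute A' and B' (S23, S25) and compare them (S24, S26, S27)."""
    script = detection_data["script"]
    hash_a_prime = run_script(script, seed)
    hash_b_prime = run_script(script, script.encode("utf-8"))
    return (hash_a_prime != detection_data["hash_a"]
            or hash_b_prime != detection_data["hash_b"])


if __name__ == "__main__":
    data = make_detection_data(SCRIPT, b"seed-bytes")
    print(is_tampered(data, b"seed-bytes"))     # unchanged image data
    print(is_tampered(data, b"altered-bytes"))  # Seed data no longer matches
    print(hashlib.sha256(b"seed-bytes").hexdigest() == data["hash_a"])
```

In this sketch, a change to either the Seed data or the script text causes at least one comparison to fail, which corresponds to the case in which decoding is prohibited.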

In the above, the Seed data is assumed to be embedded in a predetermined area of the image data P21 in advance. However, the present invention is not limited to this, and the Seed data may be stored in another manner.

Also, although the hash value B' calculated in step S25 is obtained by executing the script with the program character string of the script as an argument, it may instead be obtained by executing the script with both the program character string of the script and the Seed data as arguments.

<< Modification >>
<Modification 1>
The information processing apparatuses 1A, 1B, and 1C may generate image data having encoded image data and metadata including character strings, such as place names, that are selectively displayed according to position information on a map and the set language. Thus, when displaying an image based on the image data, the information processing apparatuses 1A, 1B, and 1C can acquire, from the metadata stored in the image data, the character string linked to the language set in the information processing apparatuses 1A, 1B, and 1C, and display the acquired character string superimposed at a predetermined position.

FIG. 13 is a diagram showing an example of use of image data having metadata including a character string such as a place name to be selectively displayed according to a position on a map and a set language, in addition to image coded data.

As shown in FIG. 13, the original image data of the image data P31 stores encoded image data in which an original image of a map of Japan is encoded. The APP11 area of the image data P31 stores metadata M31 described in JSON. The metadata M31 describes “"point": {” on line 1, “"Sapporo": {” on line 2, “"x": 560,” on line 3, “"y": 80,” on line 4, “"name": {” on line 5, “"en-US": "Sapporo",” on line 6, “"ja-JP": "Sapporo"” on line 7, “}” on line 8, “},” on line 9, “"Tokyo": {” on line 10, “"x": 600,” on line 11, “"y": 600,” on line 12, “"name": {” on line 13, “"en-US": "Tokyo",” on line 14, “"ja-JP": "Tokyo"” on line 15, “}” on line 16, “},” on line 17, “"Naha": {” on line 18, “"x": 200,” on line 19, “"y": 1100,” on line 20, “"name": {” on line 21, “"en-US": "Naha",” on line 22, “"ja-JP": "Naha"” on line 23, “}” on line 24, “},” on line 25, and “}” on line 26.

"point" is information instructing the use of a function for pointing to a specific position on the screen. The information following "x" and "y" under "Sapporo", "Tokyo", and "Naha" indicates the coordinates of each place name (position) on the map. The information following "name" indicates the place name for each language: the entry following "en-US" is the place name (character string) displayed when that language is set, and the entry following "ja-JP" is the place name (character string) displayed when that language is set. That is, the metadata M31 describes, for the function that points to a specific position on the screen, place name information consisting of combinations of coordinate information for displaying a place name, a set language, and the place name in that language. By reading this metadata (place name information) when displaying the image data, the information processing apparatuses 1A, 1B, and 1C can superimpose the place name corresponding to the language set in the terminal at the predetermined position.

In the example of FIG. 13, when an image is displayed based on the image data P31 in which such metadata M31 is stored and the language of the information processing apparatuses 1A, 1B, and 1C is set to Japanese, the Japanese notation of the place names (Sapporo, Tokyo, Naha) following "ja-JP" in the metadata M31 is read out. The information processing apparatuses 1A, 1B, and 1C thereby superimpose the place names in Japanese at the predetermined positions on the Japanese map display P32, as indicated by the tip of the arrow A31. When the language of the information processing apparatuses 1A, 1B, and 1C is set to English, the place names (Sapporo, Tokyo, Naha) following "en-US" in the metadata M31 are read out. The information processing apparatuses 1A, 1B, and 1C thereby superimpose the place names in English at the predetermined positions on the Japanese map display P33, as indicated by the tip of the arrow A32.

As described above, according to the first modification, image data is generated that has encoded image data and metadata including character strings, such as place names, that are selectively displayed according to position information on a map and the set language. When an image is displayed based on this image data, the place name linked to the language set in the information processing apparatuses 1A, 1B, and 1C can be superimposed at a predetermined position based on the place name information described in the metadata.
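The language-dependent selection of place names can be sketched as follows. The JSON mirrors the structure described for the metadata M31; the Japanese strings under "ja-JP" and the function name `labels_for_language` are assumptions made for this sketch, and actual drawing on the map is left out.

```python
import json

# Structure follows the description of metadata M31. The Japanese strings
# under "ja-JP" are assumed for illustration.
METADATA_M31 = """
{"point": {
  "Sapporo": {"x": 560, "y": 80,
              "name": {"en-US": "Sapporo", "ja-JP": "札幌"}},
  "Tokyo":   {"x": 600, "y": 600,
              "name": {"en-US": "Tokyo", "ja-JP": "東京"}},
  "Naha":    {"x": 200, "y": 1100,
              "name": {"en-US": "Naha", "ja-JP": "那覇"}}
}}
"""


def labels_for_language(metadata_json, language):
    """Return (x, y, text) for every place that has a name in the set language."""
    points = json.loads(metadata_json)["point"]
    labels = []
    for place in points.values():
        text = place["name"].get(language)
        if text is not None:  # skip places with no entry for this language
            labels.append((place["x"], place["y"], text))
    return labels


if __name__ == "__main__":
    for x, y, text in labels_for_language(METADATA_M31, "ja-JP"):
        print(x, y, text)
```

A display routine would then draw each returned string at its (x, y) position on the decoded map image.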

<Modification 2>
The information processing apparatuses 1A, 1B, and 1C may generate image data having encoded image data and metadata including character strings such as the address of the shooting location of the image and a facility name. As a result, when displaying an image, the information processing apparatuses 1A, 1B, and 1C can acquire the character string of the metadata stored in the image data and superimpose the acquired character string on the image. The information processing apparatuses 1A, 1B, and 1C can also perform image search using a character string of the metadata stored in the image data as a search key.

FIG. 14 is a diagram showing a usage example of image data having metadata including a character string such as an address of a photographing place of an image and a facility name in addition to image coded data.

As shown in FIG. 14, the original image data of the image data P41 stores a photograph taken in Okinawa, encoded as encoded image data. The APP11 area of the image data P41 stores metadata M41 described in JSON. The metadata M41 describes “"location": {” on line 1, “"address": "Shuri Kinjocho 1-chome 2, Naha, Okinawa Prefecture"” on line 2, and “}” on line 3.

"location" is information instructing the use of a function that can identify the current location and cooperate with services. The information following "address" indicates the address of the shooting location. That is, information indicating the address of the shooting location is described in the metadata M41, and by reading this metadata when displaying the image, the information processing apparatuses 1A, 1B, and 1C can superimpose the information indicating the address of the shooting location described in the metadata.

In the example of FIG. 14, when an image is displayed based on the image data P41 in which such metadata M41 is stored, the character string following "address" in the metadata M41 (Shuri Kinjocho 1-chome 2, Naha, Okinawa Prefecture) is read out. The information processing apparatuses 1A, 1B, and 1C thereby superimpose the address of the shooting location on the image display P42, as indicated by the tip of the arrow A41.

In addition, as indicated by the tip of the arrow A42, the information processing apparatuses 1A, 1B, and 1C can supply the image data P41 storing such metadata M41 to a database (DB) 101 connected via a network (not shown) and have it managed there. Accordingly, when the information processing apparatuses 1A, 1B, and 1C perform an image search using "Okinawa" as a search key, they can find, among the plurality of pieces of image data managed by the database 101, the image data whose metadata M41 includes "Okinawa". Then, as indicated by the tip of the arrow A43, the information processing apparatuses 1A, 1B, and 1C can display the image list P43 including thumbnail images of the plurality of pieces of image data found.

As described above, according to the second modification, image data is generated that has encoded image data and metadata including character strings such as the address of the shooting location and a facility name, so that when the image is displayed, the address of the shooting location and the facility name stored in the image data can be superimposed on it. In addition, by having the generated image data managed in a database, image data whose metadata includes a designated search key can be found easily.
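The search by metadata character string can be sketched as follows. The database is modeled as a plain dictionary from an image name to its JSON metadata, which is an assumption of this sketch; the entry "P99" and the function names are hypothetical and exist only to show a non-matching record.

```python
import json


def iter_strings(value):
    """Yield every string value found anywhere inside parsed JSON metadata."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def search_images(database, key):
    """Return the names of images whose metadata contains the search key."""
    hits = []
    for name, metadata_json in database.items():
        metadata = json.loads(metadata_json)
        if any(key in text for text in iter_strings(metadata)):
            hits.append(name)
    return hits


# Stand-in for the database 101. "P99" is a hypothetical second record.
DATABASE = {
    "P41": '{"location": {"address": '
           '"Shuri Kinjocho 1-chome 2, Naha, Okinawa Prefecture"}}',
    "P99": '{"location": {"address": "Sapporo, Hokkaido"}}',
}

if __name__ == "__main__":
    print(search_images(DATABASE, "Okinawa"))
```

The names returned by the search would then be used to build a thumbnail list such as the image list P43.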

<Modification 3>
The information processing apparatuses 1A, 1B, and 1C may generate image data having, in addition to encoded image data, metadata including text data indicating the content of the encoded image data. Thus, when displaying an image based on the image data, the information processing apparatuses 1A, 1B, and 1C can acquire the text data of the metadata stored in the image data, convert the acquired text data into speech with a text-to-speech function, and play it back.

FIG. 15 is a diagram showing an example of use of image data having metadata including text data indicating contents of the image coded data in addition to the image coded data.

As shown in FIG. 15, the original image data of the image data P51 stores data of a navigation image displayed by a car navigation system as encoded image data. The APP11 area of the image data P51 stores metadata M51 described in JSON. The metadata M51 describes “"tts": {” on line 1, “"lang": "ja-JP",” on line 2, “"text": "Traffic jam at Tokushima Honcho. It takes about 20 minutes to Tokushima Honcho."” on lines 3 and 4, and “}” on line 5.

"tts" is information instructing the use of a text-to-speech function called a TTS (text-to-speech) system. The information following "lang" indicates the language specified when using the text-to-speech function. The information following "text" indicates the text data read aloud when using the TTS system. That is, text data to be read aloud in Japanese by the text-to-speech function is described in the metadata M51, and by reading this metadata when displaying the image data, the information processing apparatuses 1A, 1B, and 1C can reproduce speech based on the text data described in the metadata.

In the example of FIG. 15, when an image is displayed based on the image data P51 in which such metadata M51 is stored, the text data following "text" in the metadata M51 (Traffic jam at Tokushima Honcho. It takes about 20 minutes to Tokushima Honcho.) is read out. Thus, as indicated by the tip of the arrow A51, the information processing apparatuses 1A, 1B, and 1C display the image P52 and use the text-to-speech function to reproduce (read aloud) speech based on the text, as shown in the balloon S51.

As described above, according to the third modification, image data is generated that has encoded image data and metadata including text data indicating the content of the encoded image data, so that when an image is displayed based on the image data, speech based on the text data stored in the image data can be reproduced.
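Handing the "tts" entry to a speech engine can be sketched as follows. No particular text-to-speech library is assumed: the engine is represented by a caller-supplied `speak(text, lang)` callback, which is a hypothetical interface, and the text is the English rendering used in this description.

```python
import json

# Metadata in the structure described for M51.
METADATA_M51 = json.dumps({
    "tts": {
        "lang": "ja-JP",
        "text": "Traffic jam at Tokushima Honcho. "
                "It takes about 20 minutes to Tokushima Honcho.",
    }
})


def speak_metadata(metadata_json, speak):
    """Pass the "tts" text and language to the supplied speech callback.

    Returns True if the metadata contained a "tts" entry, otherwise False.
    """
    tts = json.loads(metadata_json).get("tts")
    if tts is None:
        return False
    speak(tts["text"], tts["lang"])
    return True


if __name__ == "__main__":
    # A print-based stand-in for a real text-to-speech engine.
    speak_metadata(METADATA_M51, lambda text, lang: print(f"[{lang}] {text}"))
```

Keeping the engine behind a callback lets the same metadata drive whichever text-to-speech function the device provides.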

<Modification 4>
The information processing apparatuses 1A, 1B, and 1C may generate image data having encoded image data encrypted with a public key and metadata storing the public key. Thus, when displaying the image, the information processing apparatuses 1A, 1B, and 1C acquire the public key from the metadata stored in the image data, and can decrypt and display the encoded image data only when they hold the secret key linked to the acquired public key.

FIG. 16 is a diagram showing an example of use of image data including image encoded data encrypted by a public key and metadata storing the public key.

As shown in FIG. 16, the original image data of the image data P61 stores encoded image data encrypted with a public key. The APP11 area of the image data P61 stores metadata M61 described in JSON. The APP1 (Exif) area of the image data P61 also stores a thumbnail image P61a in unencrypted (plaintext) form. The metadata M61 describes “"encrypt": {” on line 1, “"OID": "1.2.840.10045.2.1",” on line 2, “"public_key": "04FC2E8B81DD..."” on line 3, and “}” on line 4.

"encrypt" is information instructing the use of the encryption function. The information following "OID" is information identifying an object, and the information following "public_key" is the public key. That is, the public key used to encrypt the encoded image data is described in the metadata M61, and by reading this metadata when displaying the image, the information processing apparatuses 1A, 1B, and 1C can decrypt and display the encoded image data in the image data P61 only when they hold the secret key linked to the public key described in the metadata.

In the example of FIG. 16, when an image is displayed based on the image data P61 in which such metadata M61 is stored, the public key (04FC2E8B81DD...) following "public_key" in the metadata M61 is read out. When the information processing apparatuses 1A, 1B, and 1C hold the secret key 111 linked to the read public key, they use the secret key 111 to decrypt (decode) the encoded image data in the image data P61, and the image P62 is displayed as indicated by the tip of the arrow A61.

Further, when the information processing apparatuses 1A, 1B, and 1C do not hold the secret key 111 linked to the public key read from the metadata M61, they cannot decrypt the encoded image data in the image data P61. In that case, as indicated by the tip of the arrow A62, the data P63 is displayed as it is, still encrypted.

As described above, according to the fourth modification, image data is generated that has encoded image data encrypted with a public key and metadata storing the public key, so that when the image is displayed, the encrypted encoded image data can be decrypted and displayed only by a device that holds the secret key linked to the public key in the metadata stored in the image data.

<Modification 5>
The information processing apparatuses 1A, 1B, and 1C may generate image data having encoded image data and metadata including information on objects (facilities, etc.) identified based on the shooting position, direction, and angle of view of the original image and on map information. Accordingly, the information processing apparatuses 1A, 1B, and 1C can perform image search using the object information of the metadata stored in the image data as a search key.

FIG. 17 and FIG. 18 are diagrams showing an example of use of image data having encoded image data and metadata including object information identified based on the shooting position, direction, and angle of view of the original image and on map information.

As shown in FIG. 17, in the original image data of the image data P71 and the image data P72, an image of the Tokyo Tower taken at latitude 35.65851 and longitude 139.745433 is encoded and stored as encoded image data. The APP1 (Exif) area of the image data P71 stores Exif information of latitude 35.6591, longitude 139.741969, and azimuth N 90°. The APP1 (Exif) area of the image data P72 stores Exif information of latitude 35.65851, longitude 139.745433, and azimuth N 315°.

The calculation unit 112 of the information processing apparatuses 1A, 1B, and 1C receives the image data P71, refers to the Map database 111 connected via a network (not shown), and acquires the object information related to the Exif information stored in the image data P71. The calculation unit 112 generates the metadata M71 described in JSON based on the object information acquired from the Map database 111, as indicated by the tip of the arrow A71.

The calculation unit 113 of the information processing apparatuses 1A, 1B, and 1C receives the image data P72, refers to the Map database 111 connected via a network (not shown), and acquires the object information related to the Exif information stored in the image data P72. The calculation unit 113 generates the metadata M72 described in JSON based on the object information acquired from the Map database 111, as indicated by the tip of the arrow A72.

The metadata M71 and M72 describe “"objects": [” on line 1, “{” on line 2, “"name": "Tokyo Tower",” on line 3, and so on, with “}” on line n-1 and “]” on line n. The information following "objects" indicates the object information. That is, object information related to the shooting position is described in the metadata M71 and M72.

The information processing apparatuses 1A, 1B, and 1C store the generated metadata M71 in the APP11 area of the image data P71, and store the generated metadata M72 in the APP11 area of the image data P72.

As indicated by the arrow A81 in FIG. 18, the information processing apparatuses 1A, 1B, and 1C can supply the image data P71 in which the metadata M71 is stored and the image data P72 in which the metadata M72 is stored to the object database 121 connected to them, and have the image data managed there. As a result, when the information processing apparatuses 1A, 1B, and 1C perform an image search using "Tokyo Tower" as a search key, they can find, among the plurality of pieces of image data managed by the database 121, the image data P71 and P72 whose metadata M71 and M72 include "Tokyo Tower". The information processing apparatuses 1A, 1B, and 1C can display an image list P81 including thumbnail images of the plurality of pieces of image data found, as indicated by the tip of the arrow A82.

As described above, according to the fifth modification, image data is generated that has encoded data and metadata including object information identified based on the shooting position, direction, and angle of view of the image and on map information. By having the generated image data managed in a database, image data whose metadata includes a designated search key can be found easily.
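One way to decide whether a map object falls within the view of a photograph is to compare the bearing from the shooting position to the object with the shooting azimuth. The following is a minimal sketch of that idea and not the identification method of the specification: the 60° angle of view is an assumed value, treating latitude 35.65851, longitude 139.745433 as the position of the object is an assumption, and the function names are hypothetical.

```python
import math


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(x, y)) % 360.0


def in_field_of_view(cam_lat, cam_lon, azimuth, view_angle, obj_lat, obj_lon):
    """True if the object lies within half the angle of view on either side of the azimuth."""
    bearing = bearing_deg(cam_lat, cam_lon, obj_lat, obj_lon)
    # Signed angular difference folded into the range [-180, 180).
    diff = (bearing - azimuth + 180.0) % 360.0 - 180.0
    return abs(diff) <= view_angle / 2.0


if __name__ == "__main__":
    # Exif values of image data P71: shooting position and azimuth N 90 degrees.
    camera = (35.6591, 139.741969)
    # Assumed object position (the coordinates given for the Tokyo Tower image).
    tower = (35.65851, 139.745433)
    print(round(bearing_deg(*camera, *tower), 1))
    print(in_field_of_view(*camera, 90.0, 60.0, *tower))  # 60 degrees is assumed
```

A fuller implementation would also limit the distance to the object and query the map database for all candidates near the shooting position before applying this check.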

Although embodiments of the present invention have been described above, the present invention is not limited to those embodiments, and various changes are possible without departing from the gist of the invention. For example, the metadata described in the example of image reproduction processing with trimming describes time information and area information, the metadata described in the example of audio image reproduction processing describes time information and text data, and the metadata described in the example of image reproduction processing with tampering detection describes tampering detection data; however, it is also possible to generate, for example, metadata describing time information, area information, and text information. With such a configuration, when the display timing that matches the time information described in the metadata is reached, only the predetermined area of the image data is trimmed and displayed on the basis of the area information linked to that time information. Furthermore, the text data linked to the time information can be rendered as an image, and the resulting text image can be superimposed on the image data as a telop.
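The combined use of time information, area information, and text information can be sketched as follows. The entry layout, the key names, and the sample values are assumptions made for illustration, as is the rule that an entry stays active until the time of the next entry; the description only states that the display changes when the timing matching the time information is reached.

```python
from bisect import bisect_right

# Hypothetical combined metadata: each entry pairs a start time in seconds
# with a trimming area (x, y, width, height) and telop text.
ENTRIES = [
    {"time": 0.0, "area": (0, 0, 640, 480), "text": "first line"},
    {"time": 5.0, "area": (100, 50, 320, 240), "text": "second line"},
    {"time": 9.5, "area": (200, 100, 320, 240), "text": "third line"},
]


def entry_for_elapsed(entries, elapsed):
    """Return the entry with the latest time not after elapsed, or None.

    Assumes entries are sorted by ascending time.
    """
    times = [entry["time"] for entry in entries]
    index = bisect_right(times, elapsed) - 1
    return entries[index] if index >= 0 else None


def trim(pixels, area):
    """Crop a row-major 2D list of pixels to area = (x, y, width, height)."""
    x, y, width, height = area
    return [row[x:x + width] for row in pixels[y:y + height]]


current = entry_for_elapsed(ENTRIES, 6.0)
print(current["area"], current["text"])
```

A renderer would crop the decoded image with trim and then draw the selected text over the cropped result.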

It is also possible to generate metadata describing time information, area information, and tampering detection data; metadata describing time information, text data, and tampering detection data; or metadata describing time information, area information, text information, and tampering detection data. With such a configuration, only when the tampering detection data described in the metadata indicates that the image data has not been tampered with, it is possible to trim and display only a predetermined area of the image data at a predetermined display timing, to display a telop on the image data at a predetermined timing, or to do both, trimming and displaying only a predetermined area at a predetermined timing while also displaying a telop on the image data.
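Gating the display on the tampering check can be sketched as follows. This passage does not specify the form of the tampering detection data, so the sketch assumes it is a SHA-256 digest of the encoded data stored under a hypothetical "digest" key; the function names and the render callback are likewise illustrative.

```python
import hashlib
import hmac


def is_untampered(encoded_data: bytes, stored_digest_hex: str) -> bool:
    """Compare a freshly computed digest with the one stored in the metadata.

    Assumption: the tampering detection data is a SHA-256 hex digest.
    """
    digest = hashlib.sha256(encoded_data).hexdigest()
    return hmac.compare_digest(digest, stored_digest_hex)


def display_if_untampered(encoded_data, metadata, render):
    """Call render only when the data passes the tampering check."""
    if not is_untampered(encoded_data, metadata["digest"]):
        return None  # suppress trimming and telop display
    return render(encoded_data, metadata)


data = b"encoded image bytes"
meta = {"digest": hashlib.sha256(data).hexdigest()}
print(display_if_untampered(data, meta, lambda d, m: "displayed"))
```

Here render stands in for whichever combination of trimming and telop display the metadata calls for.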

In the modified examples, object information, shooting position information, and the like are described in the metadata, but the present invention is not limited to this. For example, it is also possible to describe information indicating that Mr. Yamada's face is located at x-coordinate 300 and y-coordinate 200 in the image data and that Mr. Suzuki's face is located at x-coordinate 500 and y-coordinate 300. With such a configuration, it is possible to extract images of Mr. Yamada from among a plurality of image data and to locate Mr. Yamada's face (position) in the extracted images.
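Extraction by face position metadata can be sketched as follows. The dictionary layout, the image identifiers, and the second image with its coordinates are hypothetical; only the coordinates given for Mr. Yamada and Mr. Suzuki in the first image come from the description.

```python
# Hypothetical per-image face metadata keyed by an illustrative image ID.
FACE_METADATA = {
    "image_001": [
        {"name": "Yamada", "x": 300, "y": 200},
        {"name": "Suzuki", "x": 500, "y": 300},
    ],
    "image_002": [
        {"name": "Suzuki", "x": 120, "y": 80},  # made-up entry
    ],
}


def find_person(face_metadata, name):
    """Map each image ID containing name to that face's (x, y) position."""
    found = {}
    for image_id, faces in face_metadata.items():
        for face in faces:
            if face["name"] == name:
                found[image_id] = (face["x"], face["y"])
                break  # the first matching face per image is enough here
    return found


print(find_person(FACE_METADATA, "Yamada"))
```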

In addition, data such as the date and time, location, and situation detected by performing predetermined image recognition processing on image data captured by a drive recorder, a security camera, or the like may be described in the metadata. With such a configuration, images of dangerous situations can be extracted from among a plurality of image data on the basis of the image analysis results.

In the above description, the image data generation device 30, the image reproduction device 40, the audio image data generation device 50, the audio image reproduction device 60, and the image data tampering detection device 70 are provided in the same information processing apparatuses 1A, 1B, and 1C. However, it is also possible to provide these functions as separate devices.

The series of processes described above can be performed by hardware or by software. When the series of processes is executed by software, a program constituting the software is installed from a program storage medium onto a computer incorporated in dedicated hardware or onto, for example, a general-purpose personal computer capable of executing various functions by installing various programs.

Note that the program executed by the computer may be a program that performs processing in chronological order according to the order described in this specification, or a program that performs processing in parallel or at necessary timing, such as when a call is made.

1, 1A, 1B, 1C: information processing apparatus, 16: output unit, 17: storage unit, 30: image data generation device, 31: image encoding unit, 32: metadata generation unit, 33: image data generation unit, 34: recording control unit, 40: image reproduction device, 41: analysis unit, 42: image decoding unit, 43: image storage unit, 44: image trimming unit, 45: output control unit, 50: audio image data generation device, 51: image encoding unit, 52: metadata generation unit, 53: data generation unit, 54: recording control unit, 60: audio image reproduction device, 61: analysis unit, 62: image decoding unit, 63: text drawing unit, 64: output control unit, 70: image data tampering detection device, 71: analysis unit, 72: comparison unit, 73: tampering detection unit, 74: image decoding unit, 75: output control unit

Claims

1. An audio and video reproduction apparatus comprising an audio and video reproduction unit configured to reproduce audio and an image based on audio data in which image data, having data obtained by encoding an image and metadata relating to the data, is embedded, wherein
the metadata includes at least telop information in which text data and time information are combined, and
the audio and video reproduction unit reproduces the audio based on the audio data and, according to the telop information of the image data, superimposes and displays on the reproduced image a telop image based on the text data corresponding to the elapsed time from the start of the audio reproduction based on the audio data.
2. The audio and video reproduction apparatus according to claim 1, wherein
the telop information further includes text control information including at least one of color information of the text data, font information, information indicating the presence or absence of shading, and background color information, and
the audio and video reproduction unit displays the telop image based on the text data according to the text control information.
3. The audio and video reproduction apparatus according to claim 1, wherein the telop image is lyric data, and the image data is generated from original image data consisting of artwork.
4. An audio and video reproduction method comprising an audio and video reproduction step of reproducing audio and an image based on audio data in which image data, having data obtained by encoding an image and metadata relating to the data, is embedded, wherein
the metadata includes at least telop information in which text data and time information are combined, and
the audio and video reproduction step includes:
reproducing the audio based on the audio data; and
displaying, according to the telop information of the image data, a telop image based on the text data corresponding to the elapsed time from the start of reproduction of the audio data, superimposed on the reproduced image.
5. A data structure of image data, characterized by having:
encoded image data; and
metadata including at least telop information in which text data to be superimposed on the image data is combined with time information indicating the timing at which the text of the text data is superimposed on the image data.