WO2019039194A1 - Audio and image reproduction device, audio and image reproduction method, and data structure of image data - Google Patents

Audio and image reproduction device, audio and image reproduction method, and data structure of image data

Info

Publication number
WO2019039194A1
WO2019039194A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
image
information
image data
metadata
Prior art date
Application number
PCT/JP2018/028373
Other languages
English (en)
Japanese (ja)
Inventor
裕生 渡邉
Original Assignee
JVCKENWOOD Corporation (株式会社JVCケンウッド)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by JVCKENWOOD Corporation (株式会社JVCケンウッド)
Publication of WO2019039194A1

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/431 - Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 - Details of television systems
    • H04N 5/76 - Television signal recording
    • H04N 5/91 - Television signal processing therefor
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 - Details of television systems
    • H04N 5/76 - Television signal recording
    • H04N 5/91 - Television signal processing therefor
    • H04N 5/93 - Regeneration of the television signal or of selected parts thereof

Definitions

  • the present invention relates to an audio and video reproduction apparatus, an audio and video reproduction method, and a data structure of image data.
  • Patent Documents 1 to 3 disclose various techniques involved in the reproduction of images and audio.
  • the technique disclosed in Patent Document 1 can display moving image content and streamed AR information in a superimposed manner (Paragraphs 0062 to 0065 of Patent Document 1).
  • the timing correction system disclosed in Patent Document 2 can correct the timing of superimposing and displaying a comment on a video in accordance with the video (Paragraph 0033 of Patent Document 2).
  • the present invention solves at least one of the above-mentioned problems and is intended to provide an audio and image reproduction apparatus, an audio and image reproduction method, and a data structure of image data capable of easily displaying an image according to the reproduction elapsed time of audio data.
  • the audio and image reproduction apparatus of the present invention includes an audio and image reproduction unit that reproduces audio and an image from audio data in which image data, including data obtained by encoding an image and metadata relating to that data, is embedded.
  • the metadata includes at least telop information in which text data and time information are combined, and the audio and image reproduction unit reproduces audio based on the audio data and, according to the telop information of the image data, superimposes a telop image based on the text data corresponding to the elapsed time from the start of audio reproduction onto an image based on the image data.
  • the telop information further includes text control information including at least one of color information of text data, font information, information indicating presence or absence of shading, and background color information.
  • the audio image reproducing unit displays a telop image based on text data in accordance with the text control information.
  • the text data is lyric data, and the data obtained by encoding the image is obtained by encoding original image data consisting of artwork.
  • the audio and image reproduction method of the present invention has a data reproduction step of reproducing audio data in which image data having encoded data and metadata is embedded, the metadata including at least telop information in which text data and time information are combined.
  • the data reproduction step includes reproducing audio based on the audio data and, according to the telop information of the image data, displaying a telop image based on the text data corresponding to the elapsed time from the start of audio reproduction, superimposed on the image based on the image data.
  • the data structure of the image data according to the present invention includes, in addition to the data obtained by encoding an image, metadata having at least telop information in which text data for displaying text superimposed on the image is combined with time information indicating the timing at which the text of the text data is superimposed on the image.
  • according to the present invention, it is possible to provide an audio and image reproduction apparatus, an audio and image reproduction method, and a data structure of image data capable of easily displaying an image according to the reproduction elapsed time of audio data.
  • among the drawings, a diagram shows an example of the functional block configuration of an information processing apparatus for performing an example of image reproduction processing accompanied by tampering detection, and a flowchart shows the tampering detection processing in that image reproduction processing.
  • a further figure shows a processing example in which the original image is a map image and the metadata contains character strings, such as place names, selected and displayed according to the position on the map and the set language.
  • another figure shows a processing example in which the original image is a landscape photograph and the metadata includes object information such as position information of buildings in the photograph.
  • the audio and image reproduction apparatus, the information processing apparatus, the audio and image reproduction method, and the data structure of image data according to the present invention will be described below with reference to FIGS. 1 to 18.
  • the audio and image reproduction processing according to the embodiment of the present invention will be described with reference to FIGS. 7 to 9.
  • the image reproduction processing in FIGS. 1 to 6 and FIGS. 10 to 18 can be replaced with, or combined with, the image reproduction processing within the audio and image reproduction processing described with reference to FIGS. 7 to 9.
  • the audio and image reproduction apparatus, the information processing apparatus, the audio and image reproduction method, and the data structure of the image data according to the present invention are not limited to the embodiments exemplified herein. The description will be made in the following order: 1. Outline of the generation process of image data storing metadata; 2. Example of image reproduction processing; 3. Example of audio and image reproduction processing; 4. Another example of image reproduction processing; 5. Modified examples.
  • FIG. 1 is a diagram showing an outline of generation processing of image data storing metadata.
  • the information processing apparatus 1 of the present embodiment is, for example, an apparatus such as a notebook computer or a desktop computer.
  • the information processing apparatus 1 generates metadata and functions both as an image data generation apparatus that generates image data storing the generated metadata and as an image reproduction apparatus capable of reproducing an image from the image data storing the metadata. Therefore, a program for functioning as an image data generation apparatus and a program for functioning as an image reproduction apparatus are installed in the information processing apparatus 1 in advance. However, the information processing apparatus 1 may have only one of the two functions.
  • the information processing apparatus 1 inputs original image data captured by a camera or original image data created by image processing (including so-called artwork, that is, data created with image processing software), and also inputs reproduction control data for the image data.
  • the reproduction control data is, for example, data consisting of trimming information in which time information and area information are combined.
  • the area information is information for specifying an area in the original image data, and is, for example, information including upper left coordinates, width, and height, or information including upper left coordinates and lower right coordinates.
  • the time information is information indicating an elapsed time from the start of reproduction of the original image data.
  • the information processing apparatus 1 performs a predetermined encoding process on the input original image data, generates metadata from the input reproduction control data, and generates image data having the encoded data and the generated metadata.
  • FIG. 2 is a view showing an example of the format of image data.
  • the image data P consists of areas of SOI (Start of Image), APP1 (Application marker segment 1), ..., APP11 (Application marker segment 11), the original image data, and EOI (End of Image).
  • the image data P of the present embodiment is defined, for example, by the box file format of JPEG XT Part 3, which is an extension of the conventional JPEG (Joint Photographic Experts Group) standard and specifies an extensible box-based file format in which data can be freely described.
  • the SOI is a marker at the top of the JPEG file representing its start point; by reading this SOI, the JPEG file is identified.
  • APP1 stores attached information (Exif: Exchangeable image file format) for the image.
  • the APP11 area stores metadata defined by the box file format of JPEG XT Part 3 and described in JSON (JavaScript Object Notation). More specifically, APP11 stores the length of the application marker segment and a plurality of box data, and each box data stores a box length (Box Length), a box type (Box Type), a metadata type (Metadata type), a schema ID (Schema ID), and the metadata itself.
  • for example, in the box data of JUMBF(0), data whose metadata type is MIME, whose schema ID is APP/JSON, and whose metadata is JSON is stored; in the box data of JUMBF(1), data whose metadata type is Vender, whose schema ID is Vender/XXX, and whose metadata is XXX data is stored.
  • in the original image data area, compressed image encoded data in JPEG format is stored.
  • EOI is a marker that represents the end of the JPEG file.
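  • for illustration only (not from the patent), the marker layout described above can be inspected with a minimal Python sketch; it assumes a baseline JPEG layout and only extracts raw APP11 payloads, leaving the JUMBF box parsing to the reader:

      import struct

      def iter_app11_payloads(path):
          # Yield the payload bytes of every APP11 (0xFFEB) marker segment.
          # Each segment is 0xFF <marker> plus a 2-byte big-endian length
          # that includes the length field itself. Stops at SOS (0xFFDA),
          # where the compressed image data begins, and at EOI (0xFFD9).
          with open(path, "rb") as f:
              data = f.read()
          assert data[:2] == b"\xff\xd8", "not a JPEG (missing SOI)"
          pos = 2
          while pos + 4 <= len(data):
              if data[pos] != 0xFF:
                  break
              marker = data[pos + 1]
              if marker in (0xDA, 0xD9):  # SOS or EOI
                  break
              (seg_len,) = struct.unpack(">H", data[pos + 2:pos + 4])
              if marker == 0xEB:          # APP11
                  yield data[pos + 4:pos + 2 + seg_len]
              pos += 2 + seg_len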
  • FIG. 3 is a diagram showing an example of trimming the score according to the playback elapsed time.
  • in the original image data of the image data P1, image encoded data consisting of a musical score of 12 bars is stored.
  • the metadata M1 described in JSON is stored in the area of the APP 11 of the image data P1.
  • in the metadata M1, JSON of the following form is described:

      "clip": [
        { "time": 0,  "left": 10, "top": 60,  "width": 400, "height": 100 },
        { "time": 16, "left": 10, "top": 160, "width": 400, "height": 100 },
        ...
      ]
  • "clip" is information instructing use of the trimming function (clip function).
  • the value of "time" indicates time information, and the values of "left", "top", "width", and "height" indicate area information. That is, trimming information in which time information is combined with area information for trimming a predetermined position of the image is described in the metadata M1, and by reading out the metadata (trimming information) M1, the information processing apparatus 1 can trim and sequentially display a predetermined area based on the area information linked to the time information, according to the elapsed time from the start of reproduction of the image data P1.
  • from the display start time until 16 seconds have elapsed, an area 400 pixels wide and 100 pixels high is trimmed from the position 10 pixels from the left and 60 pixels from the top.
  • the area P2 of the first four bars is trimmed and displayed as indicated by the end of the arrow A1.
  • from 16 seconds to 32 seconds after the display start time, an area 400 pixels wide and 100 pixels high is trimmed from the position 10 pixels from the left and 160 pixels from the top.
  • the area P3 of the next four bars is trimmed and displayed.
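  • a player resolving the trimming area for a given elapsed time from metadata like M1 could look like the following Python sketch (illustrative only; each "clip" entry is treated as active from its "time" until the next entry's "time", matching the 0 s / 16 s behaviour above):

      import json

      def clip_for_elapsed(metadata_json, elapsed_sec):
          # Return the (left, top, width, height) active at elapsed_sec,
          # or None before the first entry.
          clips = sorted(json.loads(metadata_json)["clip"],
                         key=lambda e: e["time"])
          current = None
          for entry in clips:
              if entry["time"] <= elapsed_sec:
                  current = entry
              else:
                  break
          if current is None:
              return None
          return (current["left"], current["top"],
                  current["width"], current["height"])

      m1 = ('{"clip": [{"time": 0, "left": 10, "top": 60, "width": 400, "height": 100},'
            ' {"time": 16, "left": 10, "top": 160, "width": 400, "height": 100}]}')
      print(clip_for_elapsed(m1, 20))   # -> (10, 160, 400, 100)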
  • FIG. 4 is a block diagram showing an example of the hardware configuration of the information processing apparatus 1.
  • the information processing apparatus 1 includes a central processing unit (CPU) 11, a read only memory (ROM) 12, a random access memory (RAM) 13, a bus 14, an input unit 15, an output unit 16, a storage unit 17, and a communication unit 18.
  • the CPU 11, the ROM 12 and the RAM 13 are mutually connected by a bus 14.
  • An input unit 15, an output unit 16, a storage unit 17, and a communication unit 18 are also connected to the bus 14.
  • the input unit 15 includes an input device such as a keyboard and a mouse, and supplies various information to the CPU 11 via the bus 14.
  • the output unit 16 is composed of output devices such as a display and a speaker, and displays images or reproduces audio according to instructions from the CPU 11.
  • the storage unit 17 is configured of a hard disk, a non-volatile memory, and the like.
  • the storage unit 17 stores various data such as image data in which metadata is stored, in addition to the program executed by the CPU 11.
  • the communication unit 18 is configured by a network interface or the like, and communicates with an external device (not shown) via wireless or wired communication.
  • FIG. 5 shows, as an information processing apparatus 1A, an example of the functional block configuration of the information processing apparatus 1 for carrying out an example of image reproduction processing for trimming an image.
  • the information processing apparatus 1A includes an image data generation device 30 that generates metadata and generates image data storing the generated metadata, and an image reproduction device 40 that reproduces an image based on the metadata.
  • the image data generation device 30 includes an image encoding unit 31, a metadata generation unit 32, an image data generation unit 33, and a recording control unit 34.
  • the image encoding unit 31 inputs original image data captured by a camera or original image data created by image processing, and encodes the input original image data in JPEG XT format.
  • the obtained image coded data is supplied to the image data generation unit 33.
  • the metadata generation unit 32 inputs reproduction control data composed of trimming information in which time information and area information are combined, and generates metadata defined by a box file format of JPEG XT Part 3 that can be described in JSON.
  • the generated metadata is supplied to the image data generation unit 33.
  • the image data generation unit 33 generates image data (FIG. 2) in which the image coded data supplied from the image coding unit 31 and the metadata supplied from the metadata generation unit 32 are stored. The generated image data is supplied to the recording control unit 34.
  • the recording control unit 34 supplies the image encoded data and the image data having the metadata supplied from the image data generation unit 33 to the storage unit 17 and controls the recording there.
  • the image reproduction device 40 includes an analysis unit 41, an image decoding unit 42, an image storage unit 43, an image trimming unit 44, and an output control unit 45.
  • the analysis unit 41 acquires image data from the storage unit 17 based on an instruction from the input unit 15, analyzes the metadata stored in the acquired image data, and supplies the JPEG XT format encoded image data stored in the image data to the image decoding unit 42.
  • the analysis unit 41 also starts an internal timer (not shown) and, among the plurality of pieces of trimming information combining the time information and area information described in the analyzed metadata, controls the image trimming unit 44 based on the trimming information whose time information matches the time measured by the internal timer.
  • that is, the analysis unit 41 controls the image trimming unit 44 so that images of predetermined areas, within the image represented by the decoded data stored in the image storage unit 43, are sequentially trimmed at predetermined timings.
  • the image decoding unit 42 decodes the image coding data in the JPEG XT format supplied from the analysis unit 41.
  • the obtained image decoded data is supplied to the image storage unit 43 and temporarily stored there.
  • the image trimming unit 44 trims an image of a predetermined area at a predetermined timing from the decoded image data stored in the image storage unit 43, under the control of the analysis unit 41, and supplies the decoded data corresponding to the trimmed image to the output control unit 45.
  • the output control unit 45 outputs (displays) the decoded data of the image of the predetermined area supplied from the image trimming unit 44 to the display.
  • in step S1, the analysis unit 41 acquires image data from the storage unit 17 based on an instruction from the input unit 15.
  • in step S2, the analysis unit 41 analyzes the metadata stored in the image data and supplies the JPEG XT format encoded image data stored in the read image data to the image decoding unit 42.
  • in step S3, the image decoding unit 42 decodes the encoded image data supplied from the analysis unit 41 to obtain decoded image data.
  • the decoded image data is supplied to the image storage unit 43 and temporarily stored there.
  • in step S4, the analysis unit 41 activates an internal timer.
  • in step S5, the analysis unit 41 determines whether or not there is trimming information, among the plurality of pieces of trimming information described in the analyzed metadata, having time information that matches the time measured by the internal timer.
  • when the analysis unit 41 determines in step S5 that there is trimming information having time information that matches the time measured by the internal timer (step S5: YES), it controls the image trimming unit 44 based on that trimming information.
  • in step S6, under the control of the analysis unit 41, the image trimming unit 44 extracts, from the decoded image data stored in the image storage unit 43, the decoded data corresponding to the image of the predetermined area based on the area information linked to the time information, and supplies it to the output control unit 45.
  • in step S7, the output control unit 45 outputs (displays) the decoded data corresponding to the image of the predetermined area supplied from the image trimming unit 44 to the display. Thereafter, the process returns to step S5, and the above-described process is repeated until it is determined that there is no trimming information having time information that matches the time measured by the internal timer.
  • when it is determined in step S5 that there is no trimming information having time information that matches the time measured by the internal timer (step S5: NO), the image trimming display process shown in FIG. 6 ends.
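  • the loop of steps S1 to S7 could be sketched as follows in Python; Pillow is assumed for decoding and cropping, `display` is any callable that shows an image, and all names are illustrative rather than taken from the patent:

      import json, time
      from PIL import Image   # assumes the Pillow package

      def play_trimmed(image_path, metadata_json, display):
          # S1-S3: acquire the image and decode it once.
          full = Image.open(image_path)
          clips = sorted(json.loads(metadata_json)["clip"],
                         key=lambda e: e["time"])
          start = time.monotonic()                       # S4: start the timer
          for entry in clips:                            # S5: wait for a match
              wait = entry["time"] - (time.monotonic() - start)
              time.sleep(max(0.0, wait))
              box = (entry["left"], entry["top"],
                     entry["left"] + entry["width"],
                     entry["top"] + entry["height"])
              display(full.crop(box))                    # S6-S7: trim and output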
  • as described above, image data is generated that has at least data obtained by encoding an image and metadata including trimming information in which time information and area information are combined.
  • when the display timing that matches the time information described in the metadata is reached, only a predetermined area of the image can be trimmed and displayed based on the area information linked to that time information. Since the display timing and management data can be included in the image data itself, data management becomes simple. Also, the image area to be displayed and its reproduction timing can be changed simply by editing the information in the metadata, without any specific device or software, so display according to the reproduction elapsed time can be performed easily.
  • the information processing apparatus 1A may further include an audio data reproduction unit and store audio data in the storage unit 17 in association with the image data.
  • in that case, the information processing apparatus 1A can reproduce the audio data associated with the image data, for example audio data of a violin performance that serves as a guide for the musical score, so that the user can practice the piano performance according to the guide performance.
  • audio data of a violin performance based on the musical score can also be reproduced simultaneously, so that the user can enjoy a duet with the violin performance simply by playing the piano part.
  • the information processing apparatus 1A may further describe animation information in metadata including at least trimming information in which time information and area information are combined.
  • the information processing apparatus 1A can simultaneously display an image based on animation information associated with the image data. For example, when displaying a predetermined area of musical score data of a piano, it is possible to superimpose and display an image of a guiding function of the piano performance of the musical score (an animation which tells the location of the keyboard to be played next). This allows the user to practice piano playing according to the guide function.
  • FIG. 7 is a diagram showing an example in which the lyric data is displayed in telop in accordance with the reproduction elapsed time of the audio data.
  • image encoded data consisting of artwork is stored in the original image data of the image data P11.
  • metadata M11 described in JSON is stored in an area of the APP 11 of the image data P11.
  • “" Lyrics "" is information instructing to use the lyric display function.
  • the information described after "" time indicates time information, and the information described after" “text” indicates text data. That is, in the metadata M11, telop information in which time information and text data for displaying lyrics are described by the lyrics display function is described, and the information processing apparatus 1 is an image data in which the metadata M11 is stored.
  • the image data P11 embedded in the audio data is acquired, and the metadata (telop information) M11 stored in the acquired image data P11 is read out.
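  • the listing of M11 is not reproduced here; following the style of M1 and the field names just described, it could plausibly take a form such as the following (hypothetical sketch, including the times and lyric strings):

      "lyrics": [
        { "time": 0,  "text": "first line of the lyrics" },
        { "time": 16, "text": "second line of the lyrics" }
      ]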
  • FIG. 8 shows an example of the functional block configuration of the information processing apparatus 1 for carrying out the sound and image reproduction processing example as an information processing apparatus 1B.
  • the information processing apparatus 1B includes a data generation device 50 that generates metadata, generates image data storing the generated metadata, and generates audio data in which the generated image data is embedded, and an audio and image reproduction device (audiovisual player) 60 that reproduces audio from the audio data while reproducing an image from the image data based on the metadata.
  • the data generation device 50 includes an image encoding unit 51, a metadata generation unit 52, a data generation unit 53, and a recording control unit 54.
  • the image coding unit 51 inputs original image data captured by a camera or original image data created by image processing, and performs image encoding on the input original image data in JPEG XT format.
  • the encoded data is supplied to the data generation unit 53.
  • the metadata generation unit 52 inputs reproduction control data consisting of telop information in which time information and text data are combined, and generates metadata defined by a box file format of JPEG XT Part 3 that can be described in JSON.
  • the generated metadata is supplied to the data generation unit 53.
  • the data generation unit 53 generates image data (FIG. 2) storing the encoded data supplied from the image coding unit 51 and the metadata supplied from the metadata generation unit 52.
  • the data generation unit 53 inputs audio data from the outside, embeds the image data in which the metadata is stored in the input audio data, and supplies it to the recording control unit 54.
  • the recording control unit 54 supplies, to the storage unit 17, the audio data in which the image data having the encoded image data and the metadata is embedded and which is supplied from the data generation unit 53, and controls the recording there.
  • the audio and video reproduction apparatus 60 includes an analysis unit 61, an image decoding unit 62, a text drawing unit 63, and an output control unit 64.
  • the analysis unit 61 acquires audio data from the storage unit 17 based on an instruction from the input unit 15, supplies the acquired audio data to the output control unit 64, acquires the image data embedded in the acquired audio data, and analyzes the metadata stored in the acquired image data.
  • the JPEG XT format encoded image data stored in the image data is supplied to the image decoding unit 62.
  • the analysis unit 61 also activates an internal timer (not shown) and, among the plurality of pieces of telop information combining the time information and text data described in the analyzed metadata, controls the text drawing unit 63 based on the telop information whose time information matches the time measured by the internal timer. That is, the analysis unit 61 controls the text drawing unit 63 so that the text data is sequentially rendered into images at predetermined timings based on the plurality of pieces of telop information described in the metadata.
  • the image decoding unit 62 decodes the encoded image data of JPEG XT format supplied from the analysis unit 61.
  • the decoded image data is supplied to the output control unit 64.
  • the text drawing unit 63 converts the text data supplied from the analysis unit 61 into image data at a predetermined timing based on the control of the analysis unit 61, and supplies the image data to the output control unit 64.
  • the output control unit 64 outputs audio based on the audio data supplied from the analysis unit 61 to a speaker for reproduction, superimposes the image data supplied from the text drawing unit 63 on the image data supplied from the image decoding unit 62, and outputs (displays) the result on the display.
  • in step S11, the analysis unit 61 acquires audio data from the storage unit 17 based on an instruction from the input unit 15.
  • in step S12, the analysis unit 61 analyzes the metadata of the image data embedded in the audio data.
  • the acquired audio data is supplied to the output control unit 64, and the JPEG XT format encoded image data stored in the image data is supplied to the image decoding unit 62.
  • in step S13, the image decoding unit 62 decodes the JPEG XT format encoded image data supplied from the analysis unit 61 to generate decoded image data, and supplies it to the output control unit 64.
  • in step S14, the output control unit 64 outputs audio based on the audio data to the speaker for reproduction.
  • in step S15, the analysis unit 61 activates an internal timer.
  • in step S16, the analysis unit 61 determines whether or not there is telop information, among the plurality of pieces of telop information described in the analyzed metadata, having time information that matches the time measured by the internal timer.
  • when the analysis unit 61 determines in step S16 that there is telop information having time information that matches the time measured by the internal timer (step S16: YES), it controls the text drawing unit 63 based on that telop information.
  • in step S17, the text drawing unit 63 converts the text data linked to the time information into image data under the control of the analysis unit 61 and supplies it to the output control unit 64.
  • in step S18, the output control unit 64 superimposes the text image data supplied from the text drawing unit 63 on the image data supplied from the image decoding unit 62 and outputs the superimposed image. Thereafter, the process returns to step S16, and the above-described process is repeated until it is determined that there is no telop information having time information that matches the time measured by the internal timer.
  • when it is determined in step S16 that there is no telop information having time information that matches the time measured by the internal timer (step S16: NO), the telop display process shown in FIG. 9 ends.
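  • for illustration, steps S11 to S18 could be sketched as follows in Python, reusing the hypothetical "lyrics" layout above; `play_audio` and `display` are injected hooks (any audio library and any image sink will do), and Pillow is assumed for drawing:

      import json, time
      from PIL import ImageDraw   # assumes the Pillow package

      def play_with_telop(base_image, metadata_json, play_audio, display):
          telops = sorted(json.loads(metadata_json)["lyrics"],
                          key=lambda e: e["time"])
          play_audio()                                   # S14: reproduce audio
          start = time.monotonic()                       # S15: start the timer
          for entry in telops:                           # S16: wait for a match
              wait = entry["time"] - (time.monotonic() - start)
              time.sleep(max(0.0, wait))
              frame = base_image.copy()                  # S17: draw the text image
              ImageDraw.Draw(frame).text((10, 10), entry["text"], fill="white")
              display(frame)                             # S18: superimpose and output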
  • in the telop information, the information processing apparatus 1B may further describe text control information such as text color information, font information, information indicating the presence or absence of shading, and background color information. With such a configuration, when displaying a telop, the information processing apparatus 1B can display a visually enjoyable telop rather than a monotonous one.
  • FIG. 10 is a diagram illustrating an example of image data in which tampering detection data is described in metadata.
  • in the original image data of the image data P21, image encoded data whose original image is a photograph is stored.
  • in the APP11 area of the image data P21, metadata M21 described in JSON is stored.
  • the hash value A is a value obtained by executing a script using Seed data as an argument.
  • Seed data is data (parameters) embedded in advance in a predetermined area of the image data P21.
  • the hash value B is a value obtained by executing a script with the program string of the script as an argument.
  • the script is a hash function (program) for calculating a hash value. That is, data for detecting tampering is described in the metadata M21, and by reading out the metadata (tamper detection data) M21 and executing the script, the information processing apparatus 1 can detect tampering of the image data P21.
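  • the generation side of this scheme might look like the following Python sketch; SHA-256 stands in for the hash function carried by the script, and the field names ("verify", "hash_A", ...) are assumptions, since the listing of M21 is not reproduced here:

      import hashlib

      def make_tamper_metadata(seed: bytes, script_source: str) -> dict:
          # Hash value A: the script applied to the embedded Seed data.
          hash_a = hashlib.sha256(seed).hexdigest()
          # Hash value B: the script applied to its own program string.
          hash_b = hashlib.sha256(script_source.encode()).hexdigest()
          return {"verify": {"hash_A": hash_a,
                             "hash_B": hash_b,
                             "script": script_source}}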
  • FIG. 11 shows an example of functional block configuration of the information processing apparatus 1 for carrying out this example of image reproduction processing as an information processing apparatus 1C.
  • the same components as those in FIG. 5 are denoted by the same reference numerals, and the redundant description will be appropriately omitted.
  • the information processing apparatus 1C includes the image data generation device 30, which generates metadata and generates image data storing the generated metadata, and an image data tampering detection device 70, which detects whether the image data storing the metadata has been tampered with and reproduces the image data only when it has not been tampered with.
  • the metadata generation unit 32 inputs reproduction control data including a hash value A, a hash value B, and a script for detecting tampering, and generates metadata defined by the box file format of JPEG XT Part 3 that can be described in JSON. The generated metadata is supplied to the image data generation unit 33.
  • the image data tampering detection device 70 includes an analysis unit 71, a comparison unit 72, a tampering detection unit 73, an image decoding unit 74, and an output control unit 75.
  • the analysis unit 71 acquires image data from the storage unit 17 based on an instruction from the input unit 15, analyzes the metadata stored in the acquired image data, supplies the tamper detection data (hash value A, hash value B, and script) described in the metadata to the comparison unit 72, and supplies the JPEG XT format encoded image data stored in the image data to the image decoding unit 74.
  • the analysis unit 71 reads the Seed data embedded in the image data by a predetermined method, and also supplies the same to the comparison unit 72.
  • the comparison unit 72 calculates a hash value A' based on the script and the Seed data included in the tamper detection data supplied from the analysis unit 71, and compares the calculated hash value A' with the hash value A described in the metadata (tamper detection data). Further, the comparison unit 72 calculates a hash value B' based on the program character string of the script included in the tamper detection data, and compares the calculated hash value B' with the hash value B described in the metadata (tamper detection data). The comparison results are supplied to the tampering detection unit 73.
  • the tampering detection unit 73 detects, based on the two comparison results from the comparison unit 72, whether the image data has been tampered with. When it determines that the image data has not been tampered with (both the hash value A and the hash value B match), it instructs the image decoding unit 74 to execute decoding; when it detects tampering (either or both of the hash value A and the hash value B do not match), it prohibits the decoding process of the image decoding unit 74.
  • when execution of the decoding process is instructed under the control of the tampering detection unit 73, the image decoding unit 74 decodes the JPEG XT format encoded image data supplied from the analysis unit 71 and supplies the decoded image data to the output control unit 75.
  • when the decoding process is prohibited, the image decoding unit 74 supplies the JPEG XT encoded image data from the analysis unit 71 to the output control unit 75 without decoding it.
  • the output control unit 75 outputs (displays) the data supplied from the image decoding unit 74 to a display.
  • in step S21, the analysis unit 71 acquires image data from the storage unit 17 based on an instruction from the input unit 15.
  • in step S22, the analysis unit 71 analyzes the metadata stored in the image data and supplies the tamper detection data (hash value A, hash value B, and script) described in the metadata to the comparison unit 72.
  • the JPEG XT format encoded image data stored in the read image data is supplied to the image decoding unit 74.
  • the analysis unit 71 also reads the Seed data embedded in the image data by a predetermined method and supplies it to the comparison unit 72.
  • in step S23, the comparison unit 72 executes the script described in the metadata (tamper detection data) using the Seed data supplied from the analysis unit 71 as an argument, and calculates a hash value A'.
  • in step S24, the comparison unit 72 compares the hash value A described in the metadata (tamper detection data) with the calculated hash value A'.
  • in step S25, the comparison unit 72 executes the script with the program character string of the script described in the metadata (tamper detection data) as an argument, and calculates a hash value B'.
  • in step S26, the comparison unit 72 compares the hash value B described in the metadata (tamper detection data) with the calculated hash value B'. The comparison results of step S24 and step S26 are supplied to the tampering detection unit 73.
  • in step S27, the tampering detection unit 73 determines from the two comparison results whether or not the image data has been tampered with; if either or both comparison results differ, it determines that the image data has been tampered with (step S27: YES), and the decoding process of the image decoding unit 74 is prohibited in step S28. Accordingly, the image decoding unit 74 supplies the JPEG XT format encoded image data from the analysis unit 71 to the output control unit 75 without decoding it.
  • the output control unit 75 outputs (displays) the data supplied from the image decoding unit 74 to the display.
  • if the two comparison results both match, the tampering detection unit 73 determines in step S27 that the image data has not been tampered with (step S27: NO), and the decoding process of the image decoding unit 74 is executed in step S29.
  • the image decoding unit 74 decodes the JPEG XT format encoded image data supplied from the analysis unit 71 and supplies the decoded image data to the output control unit 75.
  • the output control unit 75 outputs (displays) the decoded image data supplied from the image decoding unit 74 to the display.
  • the Seed data is assumed to be embedded in a predetermined area of the image data P21 in advance.
  • however, the present invention is not limited to this, and the Seed data may be stored in another manner.
  • also, although the hash value B' calculated in step S25 is obtained by executing the script with the script's program character string as an argument, it may instead be obtained by executing the script with both the program character string of the script and the Seed data as arguments.
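  • the verification side (steps S23 to S27) could then be sketched as follows, with the same SHA-256 stand-in as the generation sketch above; a real implementation would execute the embedded script rather than a fixed hash function:

      import hashlib

      def is_tampered(metadata: dict, seed: bytes) -> bool:
          td = metadata["verify"]          # hypothetical field names as above
          hash_a_prime = hashlib.sha256(seed).hexdigest()                  # S23
          hash_b_prime = hashlib.sha256(td["script"].encode()).hexdigest() # S25
          return not (hash_a_prime == td["hash_A"]                        # S24
                      and hash_b_prime == td["hash_B"])                   # S26/S27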
  • the information processing apparatuses 1A, 1B, and 1C may generate image data having image encoded data and metadata including a character string such as a location name to be selectively displayed according to position information on a map or a setting language.
  • with this configuration, when displaying the image, the information processing apparatuses 1A, 1B, and 1C can acquire, from the metadata stored in the image data, the character string associated with the language set in the apparatus, and display the acquired character string superimposed at a predetermined position.
  • FIG. 13 is a diagram showing an example of use of image data having metadata including a character string such as a place name to be selectively displayed according to a position on a map and a set language, in addition to image coded data.
  • image encoded data in which an original image of a Japanese map is encoded is stored in the original image data.
  • metadata M31 described in JSON is stored in an area of the APP 11 of the image data P31.
  • "point" is information instructing use of a function for pointing to a specific position on the screen.
  • the information described for "Sapporo", "Tokyo", and "Naha" under "x" and "y" indicates the coordinate information of each place name (position) on the map.
  • "name" indicates the language: the information described after "en-US" indicates the place name to be displayed when that language is set, and the information described after "JP" indicates the place name (character string) to be displayed when Japanese is set.
  • that is, place name information combining coordinate information for displaying a place name, a set language, and the place name itself is described in the metadata M31 using the function for pointing to a specific position on the screen, and by reading out the metadata (place name information) when displaying the image data, the information processing apparatuses 1A, 1B, and 1C can superimpose and display the place name corresponding to the language set in the terminal at the predetermined position.
  • for example, when the set language is English, the place names (Sapporo, Tokyo, Naha) following "en-US" in the metadata M31 are read out.
  • as indicated by the end of the arrow A32, the information processing apparatuses 1A, 1B, and 1C then superimpose the place names in English at the predetermined positions on the Japanese map display P33.
  • in this way, by generating image data having encoded image data and metadata including character strings such as place names to be selectively displayed according to the position on the map and the set language, the place name linked to the language set in the information processing apparatuses 1A, 1B, and 1C can be superimposed and displayed at the predetermined position based on the place name information described in the metadata when the image is displayed.
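  • a plausible reading of M31 and of the selection logic, as a Python sketch (the exact JSON layout and coordinates are assumptions; `draw_text` is any text-drawing hook):

      import json

      def draw_place_names(metadata_json, lang, draw_text):
          # Superimpose, at each point, the name matching the set language.
          for point in json.loads(metadata_json)["point"]:
              name = point["name"].get(lang)
              if name:
                  draw_text(point["x"], point["y"], name)

      m31 = '''{"point": [
        {"x": 140, "y": 40,  "name": {"en-US": "Sapporo", "ja-JP": "札幌"}},
        {"x": 300, "y": 250, "name": {"en-US": "Tokyo",   "ja-JP": "東京"}},
        {"x": 60,  "y": 430, "name": {"en-US": "Naha",    "ja-JP": "那覇"}}]}'''
      draw_place_names(m31, "en-US", lambda x, y, s: print(x, y, s))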
  • the information processing apparatuses 1A, 1B, and 1C may generate image data including encoded image data and metadata including a character string such as an address of a shooting location of the image and a facility name. As a result, when displaying an image, the information processing apparatuses 1A, 1B, and 1C can acquire the character string of the metadata stored in the image data, and superimpose the acquired character string on the image. The information processing apparatuses 1A, 1B, and 1C can also perform image search using a character string of metadata stored in image data as a search key.
  • FIG. 14 is a diagram showing a usage example of image data having metadata including a character string such as an address of a photographing place of an image and a facility name in addition to image coded data.
  • a picture taken in Okinawa is encoded and stored as image encoded data.
  • metadata M41 described in JSON is stored in an area of the APP 11 of the image data P41.
  • in the metadata M41, the following JSON is described:

      "location": {
        "address": "Shuri Kinjocho 1-chome 2, Naha, Okinawa Prefecture"
      }
  • "location" is information instructing use of a function that can specify the current location and cooperate with services.
  • the value of "address" indicates the address of the shooting location. That is, information indicating the address of the shooting location is described in the metadata M41, and by reading out the metadata when displaying the image, the information processing apparatuses 1A, 1B, and 1C can superimpose and display the information indicating the address of the shooting location described in the metadata.
  • as indicated by the end of the arrow A42, the information processing apparatuses 1A, 1B, and 1C can supply the image data P41 storing such metadata M41 to a database (DB) 101 connected via a network (not shown) and have it managed there. Accordingly, when the information processing apparatuses 1A, 1B, and 1C perform an image search using "Okinawa" as a search key, they can find, among the plurality of image data managed by the database 101, the image data whose metadata M41 includes "Okinawa". Then, as indicated by the end of the arrow A43, the information processing apparatuses 1A, 1B, and 1C can display an image list P43 including thumbnail images of the plurality of searched image data.
  • in this way, by generating image data having encoded image data and metadata including character strings such as the address of the shooting location and facility names, the address of the shooting location and the facility name stored in the image data can be superimposed when the image is displayed.
  • in addition, when a search key is designated, image data whose metadata includes the search key can easily be searched for among the generated image data.
  • the information processing apparatuses 1A, 1B, and 1C may generate image data having metadata including text data indicating the content of the image coded data in addition to the image coded data.
  • with this configuration, when displaying the image, the information processing apparatuses 1A, 1B, and 1C can acquire the text data of the metadata stored in the image data, convert the acquired text data into speech with a text-to-speech function, and play it back.
  • FIG. 15 is a diagram showing an example of use of image data having metadata including text data indicating contents of the image coded data in addition to the image coded data.
  • Tts "" is information instructing to use a text-to-speech function called a tts (text-to speech) system.
  • the information described after "" lang “” indicates the language specified when using the text-to-speech function.
  • the information described after "" text "indicates text data read out when using the tts system. That is, text data for reading out in Japanese by the text-to-speech function is described in the metadata M51, and the information processing apparatuses 1A, 1B, and 1C read this metadata when displaying the image data. The voice based on the text data described in the metadata can be reproduced.
  • in this way, by generating image data having encoded image data and metadata including text data indicating the content of the encoded image data, speech based on the text data stored in the image data can be reproduced when the image is displayed based on the image data.
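  • as a sketch of how a player might hand the "text" field of metadata like M51 to a text-to-speech engine (the pyttsx3 package is assumed to be installed; selecting a voice for the "lang" value is engine-dependent and omitted, and the sample text is invented):

      import json
      import pyttsx3   # assumed third-party text-to-speech package

      def speak_metadata(metadata_json):
          tts = json.loads(metadata_json)["tts"]
          engine = pyttsx3.init()
          engine.say(tts["text"])      # read out the stored description
          engine.runAndWait()

      speak_metadata('{"tts": {"lang": "en-US", "text": "A photograph taken in Okinawa."}}')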
  • the information processing apparatuses 1A, 1B, and 1C may generate image data having encoded image data encrypted with a public key and metadata storing the public key. With this configuration, when displaying the image, the information processing apparatuses 1A, 1B, and 1C acquire the public key from the metadata stored in the image data and can decrypt and display the encoded image data only when they hold the private key linked to the acquired public key.
  • FIG. 16 is a diagram showing an example of use of image data including image encoded data encrypted by a public key and metadata storing the public key.
  • image encoded data encrypted with a public key is stored in the original image data of the image data P61.
  • the metadata M61 described in JSON is stored in the area of the APP 11 of the image data P61.
  • a plaintext thumbnail image P61a is also stored in the APP1 (Exif) area of the image data P61.
  • in the metadata M61, the following JSON is described:

      "encrypt": {
        "OID": "1.2.840.10045.2.1",
        "public_key": "04FC2E8B81DD..."
      }
  • "encrypt" is information instructing use of the encryption function.
  • the value of "OID" is information identifying the object, and the value of "public_key" indicates the public key. That is, the public key used for encrypting the encoded image data is described in the metadata M61, and the information processing apparatuses 1A, 1B, and 1C read this metadata when displaying the image.
  • the encoded image data in the image data P61 can be decrypted and displayed only when the private key linked to the public key described in the metadata is present.
  • when the information processing apparatuses 1A, 1B, and 1C do not have the private key 111 linked to the public key read from the metadata M61, they cannot decrypt the encoded image data in the image data P61, and the still-encrypted data P63 is displayed, as indicated by the tip of the arrow A62.
  • in this way, in the fourth modification, by generating image data having encoded image data encrypted with a public key and metadata storing the public key, the encrypted encoded image data can be decrypted and displayed when the image is displayed only if the private key linked to the public key in the stored metadata is held.
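  • the key check could be sketched as follows with the Python cryptography package; the OID above denotes an EC public key, so the stored hex string is compared against the X9.62 uncompressed point of our own key, and `decrypt_payload` is a hypothetical stand-in for whatever hybrid decryption scheme is actually used:

      from cryptography.hazmat.primitives import serialization
      # assumes the "cryptography" package is installed

      def try_decode(encrypted_payload, metadata, pem_private_key):
          # Load our private key and derive its public point.
          priv = serialization.load_pem_private_key(pem_private_key,
                                                    password=None)
          our_pub = priv.public_key().public_bytes(
              serialization.Encoding.X962,
              serialization.PublicFormat.UncompressedPoint)
          if our_pub.hex().upper() != metadata["encrypt"]["public_key"]:
              return None      # no matching key: show the data as-is (P63)
          return decrypt_payload(encrypted_payload, priv)  # hypothetical helper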
  • the information processing apparatuses 1A, 1B, and 1C may also generate image data having encoded image data and metadata including object (facility, etc.) information identified based on the shooting position, direction, and angle of view of the original image together with map information. Accordingly, the information processing apparatuses 1A, 1B, and 1C can perform image searches using the object information of the metadata stored in the image data as a search key.
  • FIG. 17 and FIG. 18 are diagrams showing an example of use of image data having encoded image data and metadata including object information identified based on the shooting position, direction, and angle of view of the original image together with map information.
  • an image of Tokyo Tower taken at latitude 35.65851 and longitude 139.745433 is encoded and stored as the image encoded data.
  • in the APP1 (Exif) area of the image data P71, Exif information of latitude 35.6591, longitude 139.741969, and azimuth N90° is stored.
  • in the APP1 (Exif) area of the image data P72, Exif information of latitude 35.65851, longitude 139.745433, and azimuth N315° is stored.
  • the calculation unit 112 of the information processing apparatuses 1A, 1B, and 1C inputs the image data P71, refers to the Map database 111 connected via a network (not shown), and acquires object information related to the Exif information stored in the image data P71.
  • as indicated by the tip of the arrow A71, the calculation unit 112 generates metadata M71 described in JSON based on the object information acquired from the Map database 111.
  • similarly, the calculation unit 113 of the information processing apparatuses 1A, 1B, and 1C inputs the image data P72, refers to the Map database 111, and acquires object information related to the Exif information stored in the image data P72.
  • as indicated by the tip of the arrow A72, the calculation unit 113 generates metadata M72 described in JSON based on the object information acquired from the Map database 111.
  • the information processing apparatuses 1A, 1B, and 1C store the generated metadata M71 in the area of the APP 11 of the image data P71, and store the generated metadata M72 in the area of the APP 11 of the image data P72.
  • as indicated by the arrow A81 in FIG. 18, the information processing apparatuses 1A, 1B, and 1C can supply the image data P71 storing the metadata M71 and the image data P72 storing the metadata M72 to the object database 121 connected via the network, and have them managed there.
  • when the information processing apparatuses 1A, 1B, and 1C perform an image search using "Tokyo Tower" as a search key, the image data P71 and P72 whose metadata M71 and M72 include "Tokyo Tower" can be found among the plurality of image data managed by the object database 121.
  • the information processing apparatuses 1A, 1B, and 1C can display an image list P81 including thumbnail images of a plurality of searched image data, as indicated by the end of the arrow A82.
  • in this way, by generating image data having encoded image data and metadata including object information identified based on the shooting position, direction, and angle of view of the image together with map information, image searches using the object information as a search key become possible.
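  • the search over the object database could be as simple as the following sketch (the "object" field name and the record layout are assumptions, since the listings of M71 and M72 are not reproduced here):

      def search_by_object(image_records, keyword):
          # image_records: iterable of (image_id, metadata_dict) pairs,
          # standing in for the object database 121.
          return [image_id for image_id, meta in image_records
                  if keyword in meta.get("object", [])]

      records = [("P71", {"object": ["Tokyo Tower"]}),
                 ("P72", {"object": ["Tokyo Tower"]}),
                 ("P99", {"object": ["Sapporo Clock Tower"]})]
      print(search_by_object(records, "Tokyo Tower"))   # -> ['P71', 'P72']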
  • in the image reproduction processing example for trimming an image, the metadata describes time information and area information; in the audio and image reproduction processing example, the metadata describes time information and text data; and in the image reproduction processing example accompanied by tamper detection, the metadata describes tamper detection data.
  • it is also possible to generate metadata describing time information, area information, and tamper detection data; metadata describing time information, text data, and tamper detection data; or metadata describing time information, area information, text information, and tamper detection data. With such a configuration, only when it is detected, according to the tamper detection data described in the metadata, that the image data has not been tampered with, a predetermined area of the image data can be trimmed and displayed at a predetermined display timing, a telop can be displayed on the image data at a predetermined timing, or both can be performed together.
  • in the above description, object information, shooting position information, and the like are described in the metadata, but the present invention is not limited to this.
  • for example, it is also possible to describe information indicating that Mr. Yamada's face is located at x-coordinate 300 and y-coordinate 200 in the image data and that Mr. Suzuki's face is located at x-coordinate 500 and y-coordinate 300. With such a configuration, it is possible to extract images of Mr. Yamada from among a plurality of image data and to locate Mr. Yamada's face (position) in the extracted images.
  • data such as date and time, location, and status detected by performing predetermined image recognition processing on image data captured by a drive recorder, security camera, or the like may also be described in the metadata. With such a configuration, it is possible to extract, from among a plurality of image data, images of dangerous situations by image analysis.
  • in the above embodiments, the image data generation device 30, the image reproduction device 40, the audio and image data generation device 50, the audio and image reproduction device 60, and the image data tampering detection device 70 are provided in the same information processing apparatuses 1A, 1B, and 1C, but these functions may also be provided as separate devices.
  • the series of processes described above can be performed by hardware or software.
  • when the series of processes is executed by software, the program constituting the software is installed from a program storage medium onto a computer incorporated in dedicated hardware or, for example, onto a general-purpose personal computer capable of executing various functions by installing various programs.
  • the program executed by the computer may be a program in which processing is performed in chronological order according to the order described in this specification, or a program in which processing is performed in parallel or at necessary timing, such as when a call is made.
  • 1, 1A, 1B, 1C information processing apparatus
  • 16 output unit
  • 17 storage unit
  • 30 image data generation apparatus

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Television Signal Processing For Recording (AREA)
  • Controls And Circuits For Display Device (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

The invention concerns an audio and image reproduction device, an audio and image reproduction method, and a data structure of image data with which an image display corresponding to the elapsed reproduction time of audio data can easily be performed. The present invention comprises an audio and image reproduction unit that reproduces audio and an image from audio data in which image data, comprising data obtained by encoding the image and metadata relating to the data, is embedded. The metadata includes at least telop (caption) information in which text data and time information are paired with each other. The audio and image reproduction unit reproduces the audio according to the audio data and, according to the telop information of the image data, displays a telop image based on the text data corresponding to the elapsed time from the start of audio reproduction.
PCT/JP2018/028373 2017-08-23 2018-07-30 Audio and image reproduction device, audio and image reproduction method, and data structure of image data WO2019039194A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2017160605A JP6874593B2 (ja) 2017-08-23 Data reproduction device, data reproduction method, and data structure of image data
JP2017-160605 2017-08-23

Publications (1)

Publication Number Publication Date
WO2019039194A1 true WO2019039194A1 (fr) 2019-02-28

Family

ID=65438779

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/028373 WO2019039194A1 (fr) Audio and image reproduction device, audio and image reproduction method, and data structure of image data

Country Status (2)

Country Link
JP (1) JP6874593B2 (fr)
WO (1) WO2019039194A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11606531B2 (en) * 2020-02-19 2023-03-14 Beijing Xiaomi Mobile Software Co., Ltd. Image capturing method, apparatus, and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023075188A1 (fr) * 2021-10-28 2023-05-04 세종대학교산학협력단 Procédé de configuration de contenu multimédia basé sur un objet pour un contenu de forme courte et dispositif l'utilisant

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04326398A (ja) * 1991-04-26 1992-11-16 Casio Comput Co Ltd Automatic musical performance device
JPH10254435A (ja) * 1997-01-09 1998-09-25 Yamaha Corp Display control method, display control device, and recording medium storing a display control program
JP2007142728A (ja) * 2005-11-17 2007-06-07 Sharp Corp Portable terminal, information processing method, program, and computer-readable recording medium storing the program

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04326398A (ja) * 1991-04-26 1992-11-16 Casio Comput Co Ltd Automatic musical performance device
JPH10254435A (ja) * 1997-01-09 1998-09-25 Yamaha Corp Display control method, display control device, and recording medium storing a display control program
JP2007142728A (ja) * 2005-11-17 2007-06-07 Sharp Corp Portable terminal, information processing method, program, and computer-readable recording medium storing the program

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11606531B2 (en) * 2020-02-19 2023-03-14 Beijing Xiaomi Mobile Software Co., Ltd. Image capturing method, apparatus, and storage medium

Also Published As

Publication number Publication date
JP6874593B2 (ja) 2021-05-19
JP2019041191A (ja) 2019-03-14

Similar Documents

Publication Publication Date Title
WO2019039196A1 (fr) Image data tampering detection device, image data tampering detection method, and data structure of image data
US8521007B2 (en) Information processing method, information processing device, scene metadata extraction device, loss recovery information generation device, and programs
KR20160044981A (ko) Video processing apparatus and method
WO2019039194A1 (fr) Audio and image reproduction device, audio and image reproduction method, and data structure of image data
WO2019039192A1 (fr) Image reproduction device, information processing apparatus, image reproduction method, and data structure of image data
CN104065908A (zh) Apparatus and method for creating and reproducing vivid picture files
JP2011030224A (ja) Multimedia subtitle display system and multimedia subtitle display method
JP5910379B2 (ja) Information processing apparatus, information processing method, display control apparatus, and display control method
JP4070742B2 (ja) Method and apparatus for embedding/detecting a synchronization signal for synchronizing an audio file and text
JP2004153764A (ja) Metadata production apparatus and search apparatus
US20130073934A1 (en) Image display apparatus, image display method, and computer readable medium
JP5371574B2 (ja) Karaoke device that displays lyric subtitles so as to avoid face images in the background video
BRPI0616365A2 (pt) System and method for watermark generation in a digital cinema projector
JP5711242B2 (ja) Method for adding audio content to video content, and device implementing the method
JP2009283020A (ja) Recording apparatus, reproduction apparatus, and program
WO2019043871A1 (fr) Display timing determination device, display timing determination method, and program
CN112151048B (zh) Method for generating and processing audio, video, and image data
KR101934393B1 (ko) Lecture video content production system through automatic image conversion of electronic documents
KR100577558B1 (ko) Method and apparatus for inserting/detecting a synchronization signal for synchronizing audio content and text
JP7197688B2 (ja) Reproduction control device, program, and reproduction control method
CN108124147B (zh) Method and system for synthesizing PNG images with sound
JP7102826B2 (ja) Information processing method and information processing apparatus
JP5779279B2 (ja) Content information processing apparatus and content information processing method
CN113436591A (zh) Pitch information generation method, apparatus, computer device, and storage medium
JP6413828B2 (ja) Information processing method, information processing apparatus, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18847927

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18847927

Country of ref document: EP

Kind code of ref document: A1