WO2013115203A1 - Information processing system, information processing method, information processing device, and control method and control program therefor, and communication terminal, and control method and control program therefor - Google Patents


Info

Publication number: WO2013115203A1
Authority: WO (WIPO (PCT))
Application number: PCT/JP2013/051954
Other languages: French (fr), Japanese (ja)
Prior art keywords: feature, local feature, local, learning object, information processing
Inventors: 野村 俊之, 山田 昭雄, 岩元 浩太, 亮太 間瀬
Original Assignee: 日本電気株式会社 (NEC Corporation)
Application filed by 日本電気株式会社
Publication of WO2013115203A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/255 Detecting or recognising potential candidate objects based on visual cues, e.g. shapes

Definitions

  • The present invention relates to a technique for identifying a learning object in a captured video using local feature amounts.
  • Patent Document 1 describes a technique for obtaining the name of an object (plant, insect, etc.) based on a video from a camera-equipped mobile phone and an inquiry mail.
  • Japanese Patent Application Laid-Open No. 2004-228561 describes a technique that improves the recognition speed by clustering feature amounts when a query image is recognized using a model dictionary generated in advance from a model image.
  • An object of the present invention is to provide a technique for solving the above-described problems.
  • According to the present invention, a system comprises: first local feature storage means for storing, in association with a learning object, m first local features, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions that include the m feature points of an image of the learning object; second local feature generation means for extracting n feature points from an image captured by an imaging means and generating, for each of n local regions including the n feature points, n second local features each consisting of a feature vector of 1 to j dimensions; and learning object recognition means for selecting the smaller of the dimension number i of the feature vectors of the first local features and the dimension number j of the feature vectors of the second local features, and for recognizing that the learning object exists in the image in the video when a prescribed proportion or more of the m first local features, limited to the selected number of dimensions, correspond to the n second local features limited to the selected number of dimensions.
  • According to the present invention, an information processing method is performed in an information processing system including first local feature storage means for storing, in association with a learning object, m first local features, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions that include the m feature points of an image of the learning object. The method comprises: a second local feature generation step of extracting n feature points from an image in a captured video and generating, for each of n local regions including the n feature points, n second local features each consisting of a feature vector of 1 to j dimensions; and a recognition step of selecting the smaller of the dimension number i of the feature vectors of the first local features and the dimension number j of the feature vectors of the second local features, and recognizing that the learning object exists in the image in the video when a prescribed proportion or more of the m first local features, limited to the selected number of dimensions, correspond to the n second local features limited to the selected number of dimensions.
  • According to the present invention, a communication terminal comprises: second local feature generation means for extracting n feature points from an image captured by an imaging means and generating, for each of n local regions including the n feature points, n second local features each consisting of a feature vector of 1 to j dimensions; first transmission means for transmitting the n second local features to an information processing apparatus that recognizes, based on a comparison of local features, a learning object contained in the captured image; and first reception means for receiving, from the information processing apparatus, information indicating the learning object contained in the captured image.
  • According to the present invention, a control method for a communication terminal comprises: a second local feature generation step of extracting n feature points from an image captured by an imaging means and generating, for each of n local regions including the n feature points, n second local features each consisting of a feature vector of 1 to j dimensions; and a first transmission step of transmitting the n second local features to an information processing apparatus that recognizes, based on a comparison of local features, a learning object contained in the captured image.
  • According to the present invention, a control program for a communication terminal causes a computer to execute: a second local feature generation step of extracting n feature points from an image captured by an imaging means and generating, for each of n local regions including the n feature points, n second local features each consisting of a feature vector of 1 to j dimensions; and a first transmission step of transmitting the n second local features to an information processing apparatus that recognizes, based on a comparison of local features, a learning object contained in the captured image.
  • According to the present invention, an information processing apparatus comprises: first local feature storage means for storing, in association with a learning object, m first local features, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions that include the m feature points of an image of the learning object; second reception means for receiving, from a communication terminal, n second local features, each consisting of a feature vector of 1 to j dimensions, generated for n local regions including n feature points extracted from an image in a video captured by the communication terminal; and learning object recognition means for selecting the smaller of the dimension number i of the feature vectors of the first local features and the dimension number j of the feature vectors of the second local features, and for recognizing that the learning object exists in the image in the video when a prescribed proportion or more of the m first local features, limited to the selected number of dimensions, correspond to the n second local features limited to the selected number of dimensions.
  • According to the present invention, a control method is provided for an information processing apparatus including first local feature storage means for storing, in association with a learning object, m first local features, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions that include the m feature points of an image of the learning object. The method comprises: a second reception step of receiving, from a communication terminal, n second local features, each consisting of a feature vector of 1 to j dimensions, generated for n local regions including n feature points extracted from an image in a video captured by the communication terminal; and a recognition step of selecting the smaller of the dimension number i of the feature vectors of the first local features and the dimension number j of the feature vectors of the second local features, and recognizing that the learning object exists in the image in the video when a prescribed proportion or more of the m first local features, limited to the selected number of dimensions, correspond to the n second local features limited to the selected number of dimensions.
  • According to the present invention, a control program is provided for an information processing apparatus including a first local feature storage unit that stores, in association with a learning object, m first local features, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions that include the m feature points of an image of the learning object. The program causes a computer to execute: a second reception step of receiving, from a communication terminal, n second local features, each consisting of a feature vector of 1 to j dimensions, generated for n local regions including n feature points extracted from an image in a video captured by the communication terminal; and a recognition step of selecting the smaller of the dimension number i of the feature vectors of the first local features and the dimension number j of the feature vectors of the second local features, and recognizing that the learning object exists in the image in the video when a prescribed proportion or more of the m first local features, limited to the selected number of dimensions, correspond to the n second local features limited to the selected number of dimensions.
  • the learning object in the image in the video can be recognized in real time.
  • the information processing system 100 is a system that recognizes a learning object in real time.
  • The information processing system 100 includes a first local feature storage unit 110, an imaging unit 120, a second local feature generation unit 130, and a learning object recognition unit 140.
  • The first local feature storage unit 110 stores, in association with the learning object 111, the m first local features 112, each consisting of a feature vector of 1 to i dimensions, generated for the m local regions that include the m feature points of the image of the learning object 111.
  • the second local feature quantity generation unit 130 extracts n feature points 131 from the image 101 in the video captured by the imaging unit 120.
  • The second local feature generation unit 130 generates, for the n local regions 132 each including one of the n feature points 131, n second local features 133 each consisting of a feature vector of 1 to j dimensions.
  • The learning object recognition unit 140 selects the smaller of the dimension number i of the feature vectors of the first local features 112 and the dimension number j of the feature vectors of the second local features 133. The learning object recognition unit 140 then recognizes that the learning object 111 exists in the image 101 when a prescribed proportion or more of the m first local features 112, limited to the selected number of dimensions, correspond to the n second local features 133 limited to the selected number of dimensions.
  • the learning target in the image in the video can be recognized in real time.
  • In this embodiment, the learning object in the video is recognized by collating the local features generated from the video captured by the communication terminal with the local features stored in the local feature DB of the learning object recognition server. The recognized learning object is then notified together with its name, related information, and/or link information.
  • the name, related information, and / or link information can be notified in association with the learning object in the image in the video in real time.
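As an editorial illustration (not part of the disclosure), the collation scheme above can be sketched in Python. Only the truncation to the smaller of the two dimension counts follows the text; the function name, distance threshold, and "prescribed proportion" value are assumptions.

```python
def recognize(first_feats, second_feats, i, j, dist_thresh=0.25, ratio_thresh=0.5):
    """Collate stored (first) and query (second) local features after
    truncating both sides to the smaller of the dimension counts i and j.
    The distance and ratio thresholds are illustrative values."""
    d = min(i, j)                              # shared leading dimensions
    stored = [v[:d] for v in first_feats]
    query = [v[:d] for v in second_feats]
    matched = 0
    for sv in stored:
        # squared Euclidean distance to the nearest query feature
        best = min(sum((a - b) ** 2 for a, b in zip(sv, qv)) for qv in query)
        if best < dist_thresh:
            matched += 1
    # the learning object is deemed present when a prescribed proportion
    # of the stored features find a corresponding feature in the image
    return matched / len(stored) >= ratio_thresh
```

Because both feature vectors are hierarchical, truncating to the shared leading dimensions lets features generated with different dimension counts be compared directly.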
  • FIG. 2 is a block diagram illustrating a configuration of the information processing system 200 according to the present embodiment.
  • The information processing system 200 in FIG. 2 includes a communication terminal 210 having an imaging function, a learning object recognition server 220 that recognizes a learning object from an image captured by the communication terminal 210, and a related information providing server 230 that provides related information to the communication terminal 210, all connected via a network 240.
  • The communication terminal 210 displays the captured video on its display unit, together with the name of each learning object recognized by the learning object recognition server 220 on the basis of the local features generated from the video.
  • The learning object recognition server 220 includes a local feature DB 221 that stores each learning object in association with its local features, a related information DB 222 that stores related information corresponding to each learning object, and a link information DB 223 that stores link information corresponding to each learning object.
  • The learning object recognition server 220 collates the local features of the video received from the communication terminal 210 with the local features in the local feature DB 221, and returns the name of the learning object recognized thereby.
  • Related information, such as an introduction corresponding to the recognized learning object, is retrieved from the related information DB 222 and returned to the communication terminal 210.
  • Link information to the related information providing server 230 corresponding to the recognized learning object is retrieved from the link information DB 223 and returned to the communication terminal 210.
  • the name of the learning object, the related information corresponding to the learning object, and the link information for the learning object may be provided separately or may be provided at the same time.
  • The related information providing server 230 has a related information DB 231 that stores related information corresponding to each learning object. It is accessed based on the link information provided for the learning object recognized by the learning object recognition server 220; the related information corresponding to the recognized learning object is then retrieved from the related information DB 231 and returned to the communication terminal 210 that transmitted the local features of the video containing the learning object. Although FIG. 2 shows a single related information providing server 230, as many such servers as there are link destinations may be connected. In that case, either the learning object recognition server 220 selects an appropriate link destination, or a plurality of link destinations are displayed on the communication terminal 210 for the user to choose from.
  • FIG. 2 shows an example in which the name is superimposed on the learning object in the captured video.
  • the display of the related information corresponding to the learning object and the link information for the learning object will be described with reference to FIG.
  • FIG. 3 is a diagram illustrating a display screen example of the communication terminal 210 in the information processing system 200 according to the present embodiment.
  • the display screen 310 in FIG. 3 includes a captured image 311 and operation buttons 312.
  • the learning target is recognized by collating the local feature generated from the video in the upper left diagram with the local feature DB 221 of the learning target recognition server 220.
  • a video 321 is displayed in which the captured video and the learning object name and related information 322 to 325 are superimposed.
  • the related information may be output by voice through the speaker 340.
  • the lower part of FIG. 3 is an example of a display screen that displays link information corresponding to the learning object.
  • the learning target is recognized by collating the local feature generated from the video in the lower left figure with the local feature DB 221 of the learning target recognition server 220.
  • a video 331 is displayed in which the captured video is superimposed with the learning object name and link information 332 to 335.
  • When link information is selected, the linked related information providing server 230 is accessed, and the related information retrieved from the related information DB 231 is displayed on the communication terminal 210 or output as audio.
  • FIG. 4 is a sequence diagram showing an operation procedure of related information notification in the information processing system 200 according to the present embodiment.
  • In step S400, an application and/or data is downloaded from the learning object recognition server 220 to the communication terminal 210.
  • In step S401, the application is activated and initialized to perform the processing of this embodiment.
  • In step S403, the communication terminal captures an image with its imaging unit and acquires a video.
  • In step S405, local features are generated from the video.
  • In step S407, the local features are encoded together with the feature point coordinates.
  • the encoded local feature amount is transmitted from the communication terminal to the learning object recognition server 220 in step S409.
  • In step S411, the learning object recognition server 220 refers to the local feature DB 221, generated and stored in advance for images of learning objects, and recognizes the learning object in the video.
  • In step S413, related information is acquired by referring to the related information DB 222 for the recognized learning object.
  • In step S415, the learning object name and related information are transmitted from the learning object recognition server 220 to the communication terminal 210.
  • In step S417, the communication terminal 210 notifies the user of the received learning object name and related information (see the upper part of FIG. 3). The learning object name is preferably displayed, while the related information may be displayed or output by voice.
  • FIG. 5 is a sequence diagram showing an operation procedure of link information notification in the information processing system 200 according to the present embodiment.
  • In FIG. 5, steps similar to those in FIG. 4 are given the same step numbers, and duplicated description is omitted.
  • In steps S400 and S401, although the application and data may differ, downloading, activation, and initialization are performed as in FIG. 4.
  • The learning object recognition server 220, having recognized the learning object in the scene from the local features of the video received from the communication terminal 210 in step S411, refers to the link information DB 223 in step S513 and acquires the link information corresponding to the recognized learning object. In step S515, the learning object name and link information are transmitted from the learning object recognition server 220 to the communication terminal 210.
  • In step S517, the communication terminal 210 displays the received learning object name and link information superimposed on the video (see the lower part of FIG. 3).
  • In step S519, the terminal waits for the user to select link information. When the user designates a link destination, in step S521 the linked related information providing server 230 is accessed with the learning object ID.
  • In step S523, the related information providing server 230 acquires related information (including document data and audio data) from the related information DB 231 using the received learning object ID.
  • In step S525, the related information is returned to the accessing communication terminal 210.
  • In step S527, the communication terminal 210 that has received the related information displays it or outputs it by voice.
  • FIG. 6 is a block diagram illustrating a functional configuration of the communication terminal 210 according to the present embodiment.
  • the imaging unit 601 inputs a video as a query image.
  • the local feature value generation unit 602 generates a local feature value from the video from the imaging unit 601.
  • the local feature amount transmission unit 603 encodes the generated local feature amount together with the feature point coordinates by the encoding unit 603a and transmits the encoded local feature amount to the learning object recognition server 220 via the communication control unit 604.
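The text does not specify a wire format for the encoded local features, so the following is only a plausible sketch of what the encoding unit 603a might produce: a feature count, then per feature the point coordinates and a byte-quantized descriptor.

```python
import struct

def encode_features(points):
    """Hypothetical wire format for (x, y, descriptor) tuples, in the spirit
    of the encoding unit 603a: a 16-bit feature count, then per feature the
    coordinates as 16-bit integers, a descriptor length byte, and each
    descriptor element quantised to one byte in [0, 1]."""
    out = bytearray(struct.pack(">H", len(points)))
    for x, y, desc in points:
        out += struct.pack(">HHB", x, y, len(desc))
        out += bytes(min(255, max(0, round(v * 255))) for v in desc)
    return bytes(out)

def decode_features(data):
    """Inverse of encode_features (as the server's decoding unit 702a might do)."""
    n = struct.unpack_from(">H", data)[0]
    off, points = 2, []
    for _ in range(n):
        x, y, d = struct.unpack_from(">HHB", data, off)
        off += 5
        desc = [b / 255 for b in data[off:off + d]]
        off += d
        points.append((x, y, desc))
    return points
```

Quantizing each element to one byte keeps the transmitted payload small, which matches the document's concern with reducing communication traffic.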
  • the learning object recognition result receiving unit 605 receives the learning object recognition result from the learning object recognition server 220 via the communication control unit 604.
  • the display screen generation unit 606 generates a display screen of the received learning object recognition result and notifies the user.
  • The related information receiving unit 607 receives related information via the communication control unit 604. The display screen generation unit 606 and the sound generation unit 608 then generate a display screen and sound data for the received related information and notify the user. The related information received by the related information receiving unit 607 may come from the learning object recognition server 220 or from the related information providing server 230.
  • The link information receiving unit 609 receives link information from the learning object recognition server 220 via the communication control unit 604. The display screen generation unit 606 then generates a display screen of the received link information and notifies the user.
  • the link destination access unit 610 accesses the link destination related information providing server 230 based on the click of link information by an operation unit (not shown).
  • Instead of providing the learning object recognition result receiving unit 605, the related information receiving unit 607, and the link information receiving unit 609 separately, a single information receiving unit that receives all information arriving via the communication control unit 604 may be provided.
  • FIG. 7 is a block diagram showing a functional configuration of the learning object recognition server 220 according to the present embodiment.
  • the local feature receiving unit 702 decodes the local feature received from the communication terminal 210 via the communication control unit 701 by the decoding unit 702a.
  • the learning object recognition unit 703 recognizes the learning object by comparing the received local feature quantity with the local feature quantity of the local feature quantity DB 221 that stores the local feature quantity corresponding to the learning object.
  • the learning object recognition result transmission unit 704 transmits the learning object recognition result (learning object name) to the communication terminal 210.
  • the related information acquisition unit 705 refers to the related information DB 222 and acquires related information corresponding to the recognized learning object.
  • The related information transmission unit 706 transmits the acquired related information to the communication terminal 210.
  • When the learning object recognition server 220 transmits related information, it is desirable, as shown in FIG. 4, to transmit the learning object recognition result and the related information as a single piece of transmission data, since this reduces communication traffic.
  • the link information acquisition unit 707 refers to the link information DB 223 and acquires link information corresponding to the recognized learning object.
  • the link information transmission unit 708 transmits the acquired link information to the communication terminal 210.
  • Similarly, when the learning object recognition server 220 transmits link information, it is desirable to transmit the learning object recognition result and the link information as a single piece of transmission data, since this reduces communication traffic.
  • When the learning object recognition server 220 transmits the recognition result, the related information, and the link information, it is likewise desirable to acquire all the information first and then transmit it as a single piece of transmission data, in order to reduce communication traffic.
  • Since the related information providing server 230 may be any of various linkable providers, a description of its configuration is omitted.
  • FIG. 8 is a diagram illustrating a configuration of the local feature DB 221 according to the present embodiment. Note that the present invention is not limited to such a configuration.
  • the local feature DB 221 stores a first local feature 803, a second local feature 804, ..., an mth local feature 805 in association with the learning object ID 801 and the name 802.
  • Each local feature entry stores a feature vector consisting of 1-dimensional to 150-dimensional elements, hierarchized in units of 25 dimensions corresponding to the 5×5 sub-regions (see FIG. 11F).
  • m is a positive integer and may be a different number corresponding to the learning object ID.
  • the feature point coordinates used for the matching process are stored together with the respective local feature amounts.
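A minimal in-memory analogue of the local feature DB 221 of FIG. 8 might look as follows. The field layout follows the description (learning object ID, name, per-feature vectors hierarchized in 25-dimension units, feature point coordinates stored alongside), while the container types and example values are assumptions.

```python
# learning-object ID -> (name, list of (feature point coords, 150-dim vector));
# a plain dict stands in for the DB table of FIG. 8.
local_feature_db = {
    "ID001": ("example object",
              [((12, 34), [0.1] * 150),
               ((56, 78), [0.2] * 150)]),
}

def features_at_level(object_id, level):
    """Return the stored vectors truncated to the first `level` blocks of
    25 dimensions (level 1 -> 25 dims, ..., level 6 -> all 150 dims),
    reflecting the 25-dimension hierarchy of the stored feature vectors."""
    name, feats = local_feature_db[object_id]
    return name, [(pt, vec[:25 * level]) for pt, vec in feats]
```

Storing the vectors hierarchically means a lower-dimensional comparison is just a prefix slice, with no recomputation.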
  • FIG. 9 is a diagram showing a configuration of the related information DB 222 according to the present embodiment. Note that the present invention is not limited to such a configuration.
  • the related information DB 222 stores related display data 903 and related audio data 904 that are related information in association with the learning object ID 901 and the learning object name 902.
  • the related information DB 222 may be provided integrally with the local feature DB 221.
  • FIG. 10 is a diagram showing a configuration of the link information DB 223 according to the present embodiment. Note that the present invention is not limited to such a configuration.
  • The link information DB 223 stores link information, for example a URL (Uniform Resource Locator) 1003 and display data 1004 for the display screen, in association with the learning object ID 1001 and the learning object name 1002.
  • the link information DB 223 may be provided integrally with the local feature amount DB 221 and the related information DB 222.
  • the related information DB 231 of the related information providing server 230 is the same as the related information DB 222 of the learning object recognition server 220, and a description thereof is omitted to avoid duplication.
  • FIG. 11A is a block diagram illustrating the configuration of the local feature generation unit 602 according to the present embodiment.
  • The local feature generation unit 602 includes a feature point detection unit 1111, a local region acquisition unit 1112, a sub-region division unit 1113, a sub-region feature vector generation unit 1114, and a dimension selection unit 1115.
  • the feature point detection unit 1111 detects a large number of characteristic points (feature points) from the image data, and outputs the coordinate position, scale (size), and angle of each feature point.
  • the local region acquisition unit 1112 acquires a local region where feature amount extraction is performed from the coordinate value, scale, and angle of each detected feature point.
  • the sub area dividing unit 1113 divides the local area into sub areas.
  • the sub-region dividing unit 1113 can divide the local region into 16 blocks (4 ⁇ 4 blocks) or divide the local region into 25 blocks (5 ⁇ 5 blocks).
  • the number of divisions is not limited. In the present embodiment, the case where the local area is divided into 25 blocks (5 ⁇ 5 blocks) will be described below as a representative.
  • the sub-region feature vector generation unit 1114 generates a feature vector for each sub-region of the local region.
  • a gradient direction histogram can be used as the feature vector of the sub-region.
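A gradient direction histogram for one sub-region, with six directions at 60-degree intervals as in FIG. 11B, can be computed roughly as follows. This is a sketch: simple frequency counting, with the central-difference gradient and skipped border pixels as implementation assumptions.

```python
import math

def gradient_histogram(gray, bins=6):
    """Gradient-direction histogram for one sub-region: `bins` directions
    (6 directions = 60-degree intervals), counting simple frequencies.
    `gray` is a 2-D list of intensities; border pixels are skipped."""
    hist = [0] * bins
    h, w = len(gray), len(gray[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = gray[y][x + 1] - gray[y][x - 1]   # horizontal gradient
            dy = gray[y + 1][x] - gray[y - 1][x]   # vertical gradient
            angle = math.atan2(dy, dx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += 1
    return hist
```

As the text notes below, gradient magnitudes may be accumulated instead of simple frequencies, and the quantization number is not limited to six.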
  • The dimension selection unit 1115 selects the dimensions to be output as the local feature (for example, by thinning out) so that the correlation between the feature vectors of adjacent sub-regions becomes low, based on the positional relationship of the sub-regions.
  • the dimension selection unit 1115 can not only select a dimension but also determine a selection priority. That is, the dimension selection unit 1115 can select dimensions with priorities so that, for example, dimensions in the same gradient direction are not selected between adjacent sub-regions. Then, the dimension selection unit 1115 outputs a feature vector composed of the selected dimensions as a local feature amount.
  • The dimension selection unit 1115 can also output the local feature with its dimensions rearranged according to the priority.
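One plausible reading of this checkerboard-style thinning (adjacent blocks keep complementary direction sets, so the same gradient direction is not selected in neighbouring sub-regions) is sketched below. The exact selection pattern of the embodiment is the one shown in FIG. 11C; this code only approximates it.

```python
def select_dimensions(feature, blocks=25, dirs=6, keep=75):
    """Thin a 150-dimensional vector (25 sub-region blocks x 6 gradient
    directions) down to `keep` dimensions so that horizontally and
    vertically adjacent blocks keep disjoint direction sets.  The
    checkerboard offset used here is one plausible pattern, not the
    exact selection of FIG. 11C."""
    grid = 5                                    # 5x5 sub-region blocks
    per_block = keep // blocks                  # e.g. 3 of the 6 directions
    selected = []
    for b in range(blocks):
        row, col = divmod(b, grid)
        offset = (row + col) % 2 * (dirs // 2)  # alternate {0,1,2} / {3,4,5}
        for k in range(per_block):
            selected.append(feature[b * dirs + (offset + k) % dirs])
    return selected
```

Because adjacent blocks have opposite checkerboard parity, their kept direction sets are disjoint, which is the low-correlation property the selection aims for.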
  • FIGS. 11B to 11F are diagrams showing the processing of the local feature generation unit 602 according to the present embodiment.
  • FIG. 11B is a diagram showing a series of processing of feature point detection / local region acquisition / sub-region division / feature vector generation in the local feature quantity generation unit 602.
  • Such a series of processes is described in U.S. Pat. No. 6,711,293 and in David G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", International Journal of Computer Vision, 60(2), 2004, pp. 91-110.
  • An image 1121 in FIG. 11B illustrates the state in which feature points have been detected from an image in the video by the feature point detection unit 1111 of FIG. 11A. The starting point of each arrow in the feature point data 1121a indicates the coordinate position of the feature point, the length of the arrow indicates the scale (size), and the direction of the arrow indicates the angle.
  • As for the scale (size) and direction, brightness, saturation, hue, or the like can be selected according to the target image.
  • In FIG. 11B, the case of six directions at intervals of 60 degrees is described, but the present invention is not limited to this.
  • the local region acquisition unit 1112 in FIG. 11A generates a Gaussian window 1122a around the starting point of the feature point data 1121a, and generates a local region 1122 that substantially includes the Gaussian window 1122a.
  • the local region acquisition unit 1112 generates a square local region 1122, but the local region may be circular or have another shape. This local region is acquired for each feature point. If the local area is circular, there is an effect that the robustness is improved with respect to the imaging direction.
  • FIG. 11B also shows the state in which the sub-region division unit 1113 has divided the scale and angle of each pixel included in the local region 1122 of the feature point data 1121a into sub-regions 1123.
  • the gradient direction is not limited to 6 directions, but may be quantized to an arbitrary quantization number such as 4 directions, 8 directions, and 10 directions.
  • the sub-region feature vector generation unit 1114 may add up the magnitudes of the gradients instead of adding up the simple frequencies.
  • When aggregating the gradient histograms, the sub-region feature vector generation unit 1114 may add weight values not only to the sub-region to which a pixel belongs but also to nearby sub-regions (such as adjacent blocks), according to the distance between the sub-regions. It may also add weight values to the gradient directions before and after the quantized gradient direction. The feature vector of a sub-region is not limited to a gradient direction histogram and may be anything that has a plurality of dimensions (elements), such as color information. In the present embodiment, a gradient direction histogram is used as the feature vector of the sub-region.
  • the dimension selection unit 1115 selects (decimates) a dimension (element) to be output as a local feature amount based on the positional relationship between the sub-regions so that the correlation between feature vectors of adjacent sub-regions becomes low. More specifically, the dimension selection unit 1115 selects dimensions such that at least one gradient direction differs between adjacent sub-regions, for example.
  • the dimension selection unit 1115 mainly uses adjacent sub-regions as nearby sub-regions. However, nearby sub-regions are not limited to adjacent ones; a sub-region within a predetermined distance may also be treated as a nearby sub-region.
  • FIG. 11C shows an example in which dimensions are selected from a feature vector 1131 of a 150-dimensional gradient histogram, generated by dividing a local region into 5 × 5 block sub-regions and quantizing gradient directions into six directions 1131a.
  • FIG. 11C is a diagram showing a state of feature vector dimension number selection processing in the local feature value generation unit 602.
  • the dimension selection unit 1115 selects a feature vector 1132 of a half 75-dimensional gradient histogram from a feature vector 1131 of a 150-dimensional gradient histogram.
  • dimensions can be selected so that dimensions in the same gradient direction are not selected in adjacent left and right and upper and lower sub-region blocks.
  • the dimension selection unit 1115 selects the feature vector 1133 of the 50-dimensional gradient histogram from the feature vector 1132 of the 75-dimensional gradient histogram.
  • the dimensions can be selected so that only one gradient direction is the same (the remaining direction is different) between sub-region blocks positioned at an angle of 45 degrees.
  • when the dimension selection unit 1115 selects the feature vector 1134 of the 25-dimensional gradient histogram from the feature vector 1133 of the 50-dimensional gradient histogram, the dimensions can be selected so that the gradient directions selected between sub-region blocks located at an angle of 45 degrees do not match.
  • the dimension selection unit 1115 selects one gradient direction from each sub-region for the 1st to 25th dimensions, two gradient directions for the 26th to 50th dimensions, and three gradient directions for the 51st to 75th dimensions.
  • it is desirable that the gradient directions do not overlap between adjacent sub-region blocks and that all gradient directions are selected uniformly.
  • it is also desirable that dimensions be selected uniformly from the entire local region. Note that the dimension selection method illustrated in FIG. 11C is an example, and selection is not limited to this method.
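One concrete way to satisfy the FIG. 11C constraint — no gradient direction shared between adjacent sub-region blocks — is a checkerboard rule over the 5 × 5 blocks. The specific even/odd rule below is an illustrative assumption; the embodiment only requires that adjacent blocks not share selected directions.

```python
def select_75_of_150(feature_vector_150):
    """From a 5x5-block, 6-direction (150-dim) feature vector, keep 75 dims
    so that left-right and up-down adjacent blocks never share a selected
    gradient direction: blocks with even (row + col) keep even direction
    indices {0, 2, 4}, odd blocks keep {1, 3, 5} -- a checkerboard."""
    selected = []
    for row in range(5):
        for col in range(5):
            parity = (row + col) % 2
            for d in range(parity, 6, 2):  # 3 of the 6 directions per block
                selected.append(feature_vector_150[(row * 5 + col) * 6 + d])
    return selected
```

Because adjacent blocks have opposite parity, the sets of kept directions are disjoint between any two horizontally or vertically adjacent blocks, matching the selection shown for the 75-dimensional feature vector 1132.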
  • FIG. 11D is a diagram illustrating an example of the selection order of feature vectors from sub-regions in the local feature value generation unit 602.
  • the dimension selection unit 1115 can determine the priority of selection so that dimensions contributing to the features of the feature points are selected in order. That is, the dimension selection unit 1115 can select dimensions with priorities so that, for example, dimensions in the same gradient direction are not selected between adjacent sub-region blocks. The dimension selection unit 1115 then outputs a feature vector composed of the selected dimensions as a local feature amount. In addition, the dimension selection unit 1115 can output the local feature amount with its dimensions rearranged based on the priority.
  • the dimension selection unit 1115 may select dimensions in the order of the sub-region blocks shown in, for example, the matrix 1141 in FIG. 11D, within each of the ranges of the 1st to 25th, the 26th to 50th, and the 51st to 75th dimensions.
  • the dimension selection unit 1115 can select the gradient direction by increasing the priority order of the sub-region blocks close to the center.
  • FIG. 11E is a diagram illustrating an example of the element numbers of the 150-dimensional feature vector in accordance with the selection order of FIG. 11D.
  • the element number of the feature vector is 6 ⁇ p + q.
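If p and q in "6 × p + q" are interpreted as the sub-region block index (0 to 24) and the quantized gradient direction (0 to 5) — an assumed reading, since the excerpt does not define them — the numbering and its inverse can be sketched as:

```python
def element_number(p, q, n_directions=6):
    """Element number of the 150-dim feature vector, assuming p is the
    sub-region block index (0..24) and q the quantized gradient
    direction (0..5), per the formula 6 * p + q."""
    assert 0 <= q < n_directions and 0 <= p
    return n_directions * p + q

def block_and_direction(element, n_directions=6):
    """Inverse mapping: recover (p, q) from an element number."""
    return divmod(element, n_directions)
```

Under this reading, element 76 — the first element in the FIG. 11F selection order — would correspond to block 12 (the center of the 5 × 5 grid) with direction 4.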
  • the matrix 1161 in FIG. 11F shows the 150 dimensions, ordered according to the selection order of FIG. 11E, hierarchized in units of 25 dimensions.
  • that is, the matrix 1161 in FIG. 11F illustrates a configuration example of local feature amounts obtained by selecting the elements shown in FIG. 11E according to the priority order shown in the matrix 1141 in FIG. 11D.
  • the dimension selection unit 1115 can output dimension elements in the order shown in FIG. 11F. Specifically, for example, when outputting a 150-dimensional local feature amount, the dimension selection unit 1115 can output all 150-dimensional elements in the order shown in FIG. 11F.
  • when the dimension selection unit 1115 outputs, for example, a 25-dimensional local feature amount, it can output the elements 1171 in the first row shown in FIG. 11F (the 76th, 45th, 83rd, ..., 120th elements) in the order shown in FIG. 11F (from left to right). When outputting, for example, a 50-dimensional local feature amount, it can additionally output the elements 1172 in the second row shown in FIG. 11F in the same order (from left to right).
  • the local feature amount thus has a hierarchically structured arrangement. That is, for example, between the 25-dimensional local feature amount and the 150-dimensional local feature amount, the arrangement of the elements 1171 of the leading 25 dimensions is the same.
  • by selecting dimensions hierarchically (progressively), the dimension selection unit 1115 can extract and output local feature amounts of a dimensionality suited to the application, the communication capacity, the terminal specifications, and the like.
  • because the dimension selection unit 1115 selects dimensions hierarchically and outputs them sorted by priority order, images can be collated using local feature amounts of different dimensionalities. For example, when images are collated using a 75-dimensional local feature amount and a 50-dimensional local feature amount, the distance between the local feature amounts can be calculated using only the first 50 dimensions.
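Thanks to this prefix property, a distance over only the shared leading dimensions suffices when the two features differ in dimensionality. A minimal sketch (Euclidean distance is an assumed measure; the embodiment does not fix one):

```python
import numpy as np

def match_distance(feat_a, feat_b):
    """Distance between two hierarchically arranged local feature amounts
    of possibly different dimensionality. Because the leading dimensions
    are shared (the 25-dim feature equals the 25-dim prefix of the
    150-dim feature), only the common prefix is compared."""
    n = min(len(feat_a), len(feat_b))
    a = np.asarray(feat_a[:n], dtype=float)
    b = np.asarray(feat_b[:n], dtype=float)
    return float(np.linalg.norm(a - b))
```

For the example in the text, a 75-dimensional and a 50-dimensional feature would be compared over their first 50 dimensions.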
  • the priorities shown in the matrix 1141 in FIG. 11D to FIG. 11F are merely examples, and the order of selecting dimensions is not limited to this.
  • the order of blocks may be the order shown in the matrix 1142 in FIG. 11D or the matrix 1143 in FIG. 11D in addition to the example of the matrix 1141 in FIG. 11D.
  • the priority order may be determined so that dimensions are selected from all the sub-regions.
  • the vicinity of the center of the local region may be important, and the priority order may be determined so that the selection frequency of the sub-region near the center is increased.
  • the information indicating the dimension selection order may be defined in the program, for example, or may be stored in a table or the like (selection order storage unit) referred to when the program is executed.
  • the dimension selection unit 1115 may select a dimension by selecting one sub-region block. That is, 6 dimensions are selected in a certain sub-region, and 0 dimensions are selected in other sub-regions close to the sub-region. Even in such a case, it can be said that the dimension is selected for each sub-region so that the correlation between adjacent sub-regions becomes low.
  • the shape of the local region and sub-region is not limited to a square, and can be any shape.
  • the local region acquisition unit 1112 may acquire a circular local region.
  • the sub-region dividing unit 1113 can divide a circular local region into, for example, nine or seventeen concentric sub-regions.
  • the dimension selection unit 1115 can select a dimension in each sub-region.
  • in this way, the dimensions of the generated feature vector are selected hierarchically while the information content of the local feature amount is maintained.
  • this processing enables real-time learning object recognition and recognition result display while maintaining recognition accuracy.
  • the configuration and processing of the local feature value generation unit 602 are not limited to this example. Naturally, other processes that enable real-time object recognition and recognition result display while maintaining recognition accuracy can be applied.
  • FIG. 11G is a block diagram showing the encoding unit 603a according to the present embodiment. Note that the encoding unit is not limited to this example, and other encoding processes can be applied.
  • the encoding unit 603a has a coordinate value scanning unit 1181 that inputs the coordinates of feature points from the feature point detection unit 1111 of the local feature quantity generation unit 602 and scans the coordinate values.
  • the coordinate value scanning unit 1181 scans the image according to a specific scanning method, and converts the two-dimensional coordinate values (X coordinate value and Y coordinate value) of the feature points into one-dimensional index values.
  • This index value is a scanning distance from the origin according to scanning. There is no restriction on the scanning direction.
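A raster scan is one possible instance of such a scanning method (the embodiment places no restriction on the scanning direction). The sketch below assumes a known image width; under a raster scan, the 1-D index equals the scanning distance from the origin.

```python
def raster_index(x, y, width):
    """Convert a 2-D feature point coordinate to a 1-D index by raster
    scanning: left to right, then top to bottom, from the origin."""
    return y * width + x

def from_raster_index(idx, width):
    """Inverse scan: recover the 2-D coordinate from the 1-D index."""
    y, x = divmod(idx, width)
    return x, y
```

Any other fixed scanning order (e.g. column-major or zigzag) would serve equally, as long as the decoder uses the same one.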
  • the encoding unit 603a also has a sorting unit 1182 that sorts the index values of the feature points and outputs permutation information after sorting.
  • the sorting unit 1182 sorts, for example, in ascending order; it may also sort in descending order.
  • a difference calculation unit 1183 that calculates a difference value between two adjacent index values in the sorted index value and outputs a series of difference values is provided.
  • the encoding unit 603a further has a differential encoding unit 1184 that encodes the series of difference values in series order.
  • the series of difference values may be encoded with a fixed bit length, for example.
  • if the bit length is specified in advance, it must be the number of bits necessary to express the largest possible difference value, so the encoding size does not become small. Therefore, when encoding with a fixed bit length, the differential encoding unit 1184 can determine the bit length based on the input series of difference values.
  • for example, the differential encoding unit 1184 can obtain the maximum value of the input series of difference values, obtain the number of bits necessary to express that maximum value (the number of expression bits), and encode the series of difference values with the obtained number of expression bits.
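The determination of the number of expression bits from the maximum difference value can be sketched as follows; returning the encoded series as a bit string is an illustrative simplification of the fixed-bit-length encoding.

```python
def representation_bits(diffs):
    """Number of bits needed to express the largest value in a series of
    non-negative difference values (at least 1 bit)."""
    return max(max(diffs), 1).bit_length()

def encode_fixed(diffs):
    """Encode each difference value with that fixed bit length.

    Returns the chosen bit length and one concatenated bit string,
    so the decoder can split the string into equal-width fields."""
    n = representation_bits(diffs)
    return n, "".join(format(d, "0{}b".format(n)) for d in diffs)
```

Because the bit length is derived from the actual data, a series of small differences is encoded compactly instead of with a worst-case width fixed in advance.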
  • the encoding unit 603a also has a local feature amount encoding unit 1185 that encodes the local feature amounts of the corresponding feature points in the same permutation as the sorted index values of the feature points.
  • the local feature amount encoding unit 1185 can encode a local feature amount whose dimensions were selected from the 150-dimensional local feature amount of one feature point using, for example, one byte per dimension, that is, with as many bytes as the number of dimensions.
  • FIG. 11H is a diagram illustrating processing of the learning object recognition unit 703 according to the present embodiment.
  • FIG. 11H shows how the local feature amounts generated from the video 311, captured by the communication terminal 210 and shown on the display screen 310, are collated with the local feature amount DB 221.
  • from the video 311, local feature amounts are generated according to the present embodiment. It is then verified whether the local feature amounts 1191 to 1194 stored in the local feature amount DB 221 for each learning object are contained in the local feature amounts generated from the video 311.
  • the learning object recognition unit 703 associates, as indicated by the thin lines, each feature point whose local feature amount stored in the local feature amount DB 221 matches a local feature amount generated from the video.
  • the learning object recognition unit 703 determines that feature points match when a predetermined ratio or more of their local feature amounts match. The learning object recognition unit 703 then recognizes the target learning object if the positional relationship between the sets of associated feature points is a linear relationship. With such recognition, a learning object can be recognized despite differences in size, differences in orientation (differences in viewpoint), or inversion. In addition, since sufficient recognition accuracy is obtained when there are a predetermined number or more of associated feature points, the learning object can be recognized even if part of it is hidden from view.
  • in FIG. 11H, the four different learning objects in the landscape that match the local feature amounts 1191 to 1194 of the four learning objects in the local feature amount DB 221 are recognized with a precision corresponding to the accuracy of the local feature amounts.
  • FIG. 12A is a block diagram illustrating a hardware configuration of the communication terminal 210 according to the present embodiment.
  • a CPU 1210 is a processor for arithmetic control, and implements each functional component of the communication terminal 210 by executing a program.
  • the ROM 1220 stores fixed data such as initial data, and programs.
  • the communication control unit 604 communicates, in the present embodiment, with the learning object recognition server 220 and the related information providing server 230 via a network. Note that the number of CPUs 1210 is not limited to one; there may be a plurality of CPUs, and a GPU (Graphics Processing Unit) for image processing may be included.
  • the RAM 1240 is a random access memory that the CPU 1210 uses as a work area for temporary storage.
  • the RAM 1240 has an area for storing data necessary for realizing the present embodiment.
  • An input video 1241 indicates an input video imaged and input by the imaging unit 601.
  • the feature point data 1242 indicates feature point data including the feature point coordinates, scale, and angle detected from the input video 1241.
  • the local feature value generation table 1243 is a local feature value generation table that holds data until a local feature value is generated (see FIG. 12B).
  • the local feature value 1244 is generated using the local feature value generation table 1243 and indicates a local feature value to be sent to the learning object recognition server 220 via the communication control unit 604.
  • a learning object recognition result 1245 indicates a learning object recognition result returned from the learning object recognition server 220 via the communication control unit 604.
  • the related information / link information 1246 indicates related information and link information returned from the learning object recognition server 220 or related information returned from the related information providing server 230.
  • the display screen data 1247 indicates display screen data for notifying the user of information including the learning object recognition result 1245 and related information / link information 1246. In the case of outputting audio, audio data may be included.
  • Input / output data 1248 indicates input / output data input / output via the input / output interface 1260.
  • Transmission / reception data 1249 indicates transmission / reception data transmitted / received via the communication control unit 604.
  • the storage 1250 stores databases, various parameters, and the following data and programs necessary for realizing the present embodiment.
  • a display format 1251 indicates a display format for displaying information including the learning object recognition result 1245 and related information / link information 1246.
  • the storage 1250 stores the following programs.
  • the communication terminal control program 1252 indicates a communication terminal control program that controls the entire communication terminal 210.
  • the communication terminal control program 1252 includes the following modules.
  • the local feature value generation module 1253 indicates a module that generates a local feature value from the input video according to FIGS. 11B to 11F in the communication terminal control program 1252.
  • the local feature quantity generation module 1253 is composed of the illustrated module group, but detailed description thereof is omitted here.
  • the encoding module 1254 indicates a module that encodes the local feature generated by the local feature generating module 1253 for transmission.
  • the information reception notification module 1255 is a module for receiving the learning object recognition result 1245 and the related information / link information 1246 and notifying the user by display or voice.
  • the link destination access module 1256 is a module that accesses a link destination based on a user instruction to the link information received and notified.
  • the input / output interface 1260 interfaces input / output data with input / output devices.
  • the input / output interface 1260 is connected to a display unit 1261, a touch panel or keyboard as the operation unit 1262, a speaker 1263, a microphone 1264, and an imaging unit 601.
  • the input / output device is not limited to the above example.
  • a GPS (Global Positioning System) position generation unit 1265 is mounted, and acquires the current position based on a signal from a GPS satellite.
  • in FIG. 12A, only data and programs essential to the present embodiment are shown; data and programs not related to the present embodiment are omitted.
  • FIG. 12B is a diagram showing a local feature generation table 1243 in the communication terminal 210 according to the present embodiment.
  • a plurality of detected feature points 1202, feature point coordinates 1203, and local region information 1204 corresponding to the feature points are stored in association with the input image ID 1201.
  • a local feature quantity 1209 is generated for each detected feature point 1202 from the above data.
  • the data obtained by combining these with the feature point coordinates constitutes the local feature amount 1244, generated from the captured landscape, that is transmitted to the learning object recognition server 220.
  • FIG. 13 is a flowchart illustrating a processing procedure of the communication terminal 210 according to the present embodiment. This flowchart is executed by the CPU 1210 of FIG. 12A using the RAM 1240, and implements each functional component of FIG.
  • in step S1311, it is determined whether or not there is a video input for recognizing a learning object.
  • in step S1321, data reception is determined.
  • in step S1331, it is determined whether there is a link destination instruction by the user. Otherwise, other processing is performed in step S1341. Note that description of normal transmission processing is omitted.
  • if there is a video input, the process proceeds to step S1313, and local feature generation processing is executed based on the input video (see FIG. 14A).
  • in step S1315, the local feature amounts and feature point coordinates are encoded (see FIGS. 14B and 14C).
  • in step S1317, the encoded data is transmitted to the learning object recognition server 220.
  • in step S1323, it is determined whether the learning object recognition result and related information have been received from the learning object recognition server 220, or related information has been received from the related information providing server 230. If it is reception from the learning object recognition server 220, the process proceeds to step S1325 to notify the user of the received learning object recognition result and related information.
  • FIG. 14A is a flowchart illustrating a processing procedure of local feature generation processing S1313 according to the present embodiment.
  • in step S1411, the position coordinates, scale, and angle of the feature points are detected from the input video.
  • in step S1413, a local region is acquired for one of the feature points detected in step S1411.
  • in step S1415, the local region is divided into sub-regions.
  • in step S1417, a feature vector for each sub-region is generated, producing the feature vector of the local region. The processing of steps S1411 to S1417 is illustrated in FIG. 11B.
  • in step S1419, dimension selection is performed on the feature vector of the local region generated in step S1417.
  • the dimension selection is illustrated in FIGS. 11D to 11F.
  • in step S1421, it is determined whether the generation of local feature amounts and dimension selection have been completed for all feature points detected in step S1411. If not, the process returns to step S1413 to repeat the processing for the next feature point.
  • FIG. 14B is a flowchart illustrating a processing procedure of the encoding processing S1315 according to the present embodiment.
  • in step S1431, the coordinate values of the feature points are scanned in a specific order.
  • in step S1433, the scanned coordinate values are sorted.
  • in step S1435, difference values of the coordinate values are calculated in the sorted order.
  • in step S1437, the difference values are encoded (see FIG. 14C).
  • in step S1439, the local feature amounts are encoded in the coordinate value sorting order. The difference value encoding and the local feature amount encoding may be performed in parallel.
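Steps S1431 to S1435 can be sketched as follows, assuming a raster scan and assuming the first sorted index is stored as a difference from zero (the embodiment does not specify how the first value is handled).

```python
def encode_feature_points(points, width):
    """Sketch of steps S1431-S1435: scan 2-D feature point coordinates
    into 1-D raster indices, sort them in ascending order, and take the
    differences between adjacent index values. The first entry is the
    first index itself, i.e. its difference from zero (an assumption)."""
    indices = sorted(y * width + x for x, y in points)
    diffs = [indices[0]] + [b - a for a, b in zip(indices, indices[1:])]
    return indices, diffs
```

Because the indices are sorted, all differences are non-negative and typically small, which is what makes the fixed-bit-length and escape-code encodings of step S1437 compact.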
  • FIG. 14C is a flowchart illustrating a processing procedure of difference value encoding processing S1437 according to the present embodiment.
  • in step S1441, it is determined whether or not the difference value is within the range that can be encoded. If it is, the process proceeds to step S1447 to encode the difference value, and then to step S1449. If it is not within the encodable range, the process proceeds to step S1443 to encode an escape code.
  • in step S1445, the difference value is encoded by an encoding method different from that of step S1447. The process then proceeds to step S1449.
  • in step S1449, it is determined whether the processed difference value is the last element in the series of difference values. If it is, the process ends. If not, the process returns to step S1441 to process the next difference value in the series.
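The escape-code branch of steps S1441 to S1447 can be sketched as follows. The one-byte in-range encoding and the 4-byte big-endian fallback are illustrative assumptions, since the embodiment leaves the alternative encoding method open.

```python
ESCAPE = 255  # assumed escape code: the top of the one-byte range

def encode_difference(diff):
    """Sketch of steps S1441-S1447: values below the escape code are
    encoded directly in one byte; out-of-range values are encoded as the
    escape code followed by a wider (4-byte big-endian) representation."""
    if diff < ESCAPE:                        # within the encodable range
        return bytes([diff])
    return bytes([ESCAPE]) + diff.to_bytes(4, "big")

def decode_differences(data):
    """Inverse of encode_difference over a concatenated byte string."""
    out, i = [], 0
    while i < len(data):
        if data[i] != ESCAPE:
            out.append(data[i]); i += 1
        else:
            out.append(int.from_bytes(data[i + 1:i + 5], "big")); i += 5
    return out
```

Most differences between sorted indices are small, so the common case costs one byte and only rare large gaps pay the escape-code overhead.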
  • FIG. 15 is a block diagram illustrating a hardware configuration of the learning object recognition server 220 according to the present embodiment.
  • a CPU 1510 is a processor for arithmetic control, and implements each functional component of the learning object recognition server 220 in FIG. 7 by executing a program.
  • the ROM 1520 stores fixed data such as initial data, and programs.
  • the communication control unit 701 communicates, in this embodiment, with the communication terminal 210 and the related information providing server 230 via a network. Note that the number of CPUs 1510 is not limited to one; there may be a plurality of CPUs, and a GPU for image processing may be included.
  • the RAM 1540 is a random access memory that the CPU 1510 uses as a work area for temporary storage.
  • the RAM 1540 has an area for storing data necessary for realizing the present embodiment.
  • the received local feature value 1541 indicates a local feature value including the feature point coordinates received from the communication terminal 210.
  • the read local feature value 1542 indicates a local feature value, including the feature point coordinates, read from the local feature value DB 221.
  • the learning target object recognition result 1543 indicates a learning target object recognition result recognized from collation between the received local feature value and the local feature value stored in the local feature value DB 221.
  • the related information 1544 indicates related information retrieved from the related information DB 222 in correspondence with the learning object of the learning object recognition result 1543.
  • the link information 1545 indicates link information searched from the link information DB 223 corresponding to the learning object of the learning object recognition result 1543.
  • Transmission / reception data 1546 indicates transmission / reception data transmitted / received via the communication control unit 701.
  • the storage 1550 stores databases, various parameters, and the following data and programs necessary for realizing the present embodiment.
  • the local feature DB 221 is a local feature DB similar to that shown in FIG.
  • the related information DB 222 is a related information DB similar to that shown in FIG.
  • the link information DB 223 is a link information DB similar to that shown in FIG.
  • the storage 1550 stores the following programs.
  • the learning target object recognition server control program 1551 indicates a learning target object recognition server control program that controls the entire learning target object recognition server 220.
  • the local feature DB creation module 1552 indicates a module that generates a local feature from a learning target image and stores it in the local feature DB 221 in the learning target recognition server control program 1551.
  • the learning target object recognition module 1553 is a module that recognizes the learning target object in the learning target object recognition server control program 1551 by comparing the received local feature value with the local feature value stored in the local feature value DB 221.
  • the related information / link information acquisition module 1554 indicates a module that acquires related information and link information from the related information DB 222 and the link information DB 223 corresponding to the recognized learning object.
  • the recognition result / information transmission module 1555 indicates a module that transmits a recognized learning object name, acquired related information, and link information.
  • FIG. 15 shows only data and programs essential to the present embodiment, and does not illustrate data and programs not related to the present embodiment.
  • FIG. 16 is a flowchart showing a processing procedure of the learning object recognition server 220 according to the present embodiment. This flowchart is executed by the CPU 1510 of FIG. 15 using the RAM 1540, and implements each functional component of the learning object recognition server 220 of FIG.
  • in step S1611, it is determined whether or not to generate the local feature DB.
  • in step S1621, it is determined whether a local feature amount has been received from the communication terminal. Otherwise, other processing is performed in step S1641.
  • if the local feature DB is to be generated, the process advances to step S1613 to execute local feature DB generation processing (see FIG. 17). If a local feature amount has been received, the process proceeds to step S1623 to perform learning object recognition processing (see FIGS. 18A and 18B).
  • in step S1625, related information and link information corresponding to the recognized learning object are acquired. The recognized learning object name, related information, and link information are then transmitted to the communication terminal 210.
  • FIG. 17 is a flowchart showing a processing procedure of local feature DB generation processing S1613 according to the present embodiment.
  • in step S1701, an image of a learning object is acquired.
  • in step S1703, the position coordinates, scale, and angle of the feature points are detected.
  • in step S1705, a local region is acquired for one of the feature points detected in step S1703.
  • in step S1707, the local region is divided into sub-regions.
  • in step S1709, a feature vector for each sub-region is generated, producing the feature vector of the local region. The processing of steps S1705 to S1709 is illustrated in FIG. 11B.
  • in step S1711, dimension selection is performed on the feature vector of the local region generated in step S1709.
  • the dimension selection is illustrated in FIGS. 11D to 11F.
  • hierarchization is performed in the dimension selection, but it is desirable to store all generated feature vectors.
  • in step S1713, it is determined whether the generation of local feature amounts and dimension selection have been completed for all feature points detected in step S1703. If not, the process returns to step S1705 to repeat the processing for the next feature point. When all feature points have been processed, the process advances to step S1715 to register the local feature amounts and feature point coordinates in the local feature amount DB 221 in association with the learning object.
  • in step S1717, it is determined whether there is an image of another learning object. If so, the process returns to step S1701 to acquire that image and repeat the processing.
  • FIG. 18A is a flowchart showing a processing procedure of learning object recognition processing S1623 according to the present embodiment.
  • in step S1811, the local feature amounts of one learning object are acquired from the local feature amount DB 221.
  • in step S1813, the local feature amounts of that learning object are collated with the local feature amounts received from the communication terminal 210 (see FIG. 18B).
  • in step S1815, it is determined whether or not they match. If they match, the process proceeds to step S1821, and the matched learning object is stored as being present in the video captured by the communication terminal 210.
  • in step S1817, it is determined whether all learning objects registered in the local feature amount DB 221 have been collated; if any remain, the process returns to step S1811 to collate the next learning object.
  • the field may be limited in advance in order to increase the processing speed and reduce the load of the learning-time or drill-time processing on the learning object recognition server.
  • FIG. 18B is a flowchart showing a processing procedure of collation processing S1813 according to the present embodiment.
  • in step S1833, the smaller number of dimensions is selected between the number of dimensions i of the local feature amounts in the local feature amount DB 221 and the number of dimensions j of the received local feature amounts.
  • in step S1835, data of the selected number of dimensions is acquired for the p-th local feature amount of the learning object stored in the local feature amount DB 221. That is, the selected number of dimensions is acquired starting from the first dimension.
  • in step S1837, the p-th local feature amount acquired in step S1835 is sequentially collated with the local feature amounts of all feature points generated from the input video to determine whether or not they are similar.
  • in step S1839, it is determined from the result of collation between the local feature amounts whether or not the similarity exceeds the threshold value α.
  • if the similarity exceeds the threshold value α, the combination of the positional relationships of the feature points whose local feature amounts match between the input video and the learning object is stored in step S1841, and q, a parameter counting the number of matched feature points, is incremented by one. The feature point of the learning object is then advanced to the next feature point (p ← p + 1), and if all feature points of the learning object have not yet been collated (p < m), the process returns to step S1835.
  • note that the threshold value α can be changed according to the recognition accuracy required for the learning object.
  • if the learning object has a low correlation with other learning objects, accurate recognition is possible even if the recognition accuracy is lowered.
  • in step S1845, it is determined whether the ratio of the number q of feature points whose local feature amounts match feature points of the input video to the number p of feature points of the learning object exceeds the threshold value β. If it does, the process proceeds to step S1849, where it is further determined, for the learning object candidate, whether the positional relationship between the feature points of the input video and the feature points of the learning object is a relationship that allows a linear transformation.
  • that is, it is determined whether the positional relationship between the feature points of the input video and the feature points of the learning object stored in step S1841 as having matching local feature amounts is a positional relationship obtainable by a change such as rotation, inversion, or a change of viewpoint position, or a positional relationship that cannot be so obtained. Since such determination methods are geometrically known, detailed description is omitted. If it is determined in step S1851 that a linear transformation is possible, the process proceeds to step S1853, and it is determined that the collated learning object exists in the input video. Note that the threshold value β can be changed according to the recognition accuracy required for the learning object.
  • the learning object has a low correlation with other learning objects or a feature can be determined even from a part, accurate recognition is possible even if there are few matching feature points. That is, even if a part is hidden and cannot be seen, or if a characteristic part is visible, the learning object can be recognized.
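As one non-limiting illustration, the collation procedure of steps S1835 through S1853 (per-feature-point comparison, the match-ratio test, and the linear-transformation check) may be sketched as follows. The descriptor distance metric, all threshold values, and the use of a least-squares affine fit for the geometric check are assumptions for this sketch, not values or methods prescribed by the embodiment:

```python
import numpy as np

def match_learning_object(input_feats, obj_feats, ratio_threshold=0.5,
                          dist_threshold=0.3, residual_tol=5.0):
    """Return True if the learning object is judged present in the input video.

    input_feats / obj_feats: lists of (xy_position, descriptor) pairs.
    All thresholds are illustrative assumptions only.
    """
    matched = []          # (input xy, object xy) pairs whose descriptors match
    q = 0                 # number of matched feature points (parameter q)
    m = len(obj_feats)    # total feature points of the learning object
    for p in range(m):    # p advances over the learning object's feature points
        obj_xy, obj_desc = obj_feats[p]
        # nearest input-video feature point in descriptor space
        dists = [np.linalg.norm(desc - obj_desc) for _, desc in input_feats]
        best = int(np.argmin(dists))
        if dists[best] < dist_threshold:          # local feature amounts match
            matched.append((input_feats[best][0], obj_xy))
            q += 1                                 # q <- q + 1
    if q / m <= ratio_threshold:                   # cf. step S1845: ratio test
        return False
    # cf. steps S1849/S1851: do the matched positions admit a linear map?
    src = np.array([xy for xy, _ in matched], dtype=float)
    dst = np.array([xy for _, xy in matched], dtype=float)
    A = np.hstack([src, np.ones((len(src), 1))])   # solve dst ~= A @ M
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)
    residual = np.linalg.norm(A @ M - dst, axis=1).max()
    return bool(residual < residual_tol)
```

A plain least-squares affine fit tolerates rotation, inversion, and first-order viewpoint change; a robust estimator such as RANSAC would be preferable when some matches are outliers.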
  • Storing all learning objects in the local feature amount DB 221 and collating against all of them is a very heavy process. Therefore, for example, before recognizing the learning object from the input video, the user may select a range of learning objects from a menu, and only that range is searched in and collated against the local feature amount DB 221. The load can also be reduced by storing in the local feature amount DB 221 only the local feature amounts of the range used by the user.
  • the information processing system according to the present embodiment is different from the second embodiment in that related information is automatically accessed from a link destination even if the user does not perform a link destination access operation. Since other configurations and operations are the same as those of the second embodiment, the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted.
  • FIG. 19 is a sequence diagram showing an operation procedure of the information processing system according to the present embodiment.
  • operations similar to those in FIG. 5 of the second embodiment are denoted by the same step numbers, and description thereof is omitted.
  • In steps S400 and S401, although the applications and data may differ, download, activation, and initialization are performed in the same manner as in FIGS.
  • The learning object recognition server 220, having recognized the learning object in the video from the local feature amounts received from the communication terminal 210 in step S411, refers to the link information DB 223 in step S513 and acquires the link information corresponding to the recognized learning object.
  • a link destination is selected in step S1915.
  • the selection of the link destination may be performed based on, for example, an instruction from a user using the communication terminal 210 or user recognition by the learning object recognition server 220, but detailed description thereof is omitted here.
  • In step S1917, the related information providing server 230 of the link destination is accessed, based on the link information, with the ID of the recognized learning object.
  • At this time, the ID of the communication terminal that transmitted the local feature amounts of the video is also transmitted with the link destination access.
  • the related information providing server 230 acquires learning object related information (including document data and voice data) corresponding to the learning object ID accompanying the access from the related information DB 231.
  • the related information is returned to the access source communication terminal 210.
  • For this reply, the transmitted communication terminal ID is used.
  • the communication terminal 210 that has received the reply of the related information displays or outputs the received related information in step S527.
  • the learning object recognition server 220 may be configured to receive a reply from the link destination and relay it to the communication terminal 210.
  • the communication terminal 210 may be configured such that when link information is received, automatic access to the link destination is performed and a reply from the link destination is notified.
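The automatic link-access flow described above (recognize the object, look up its link information, access the related information providing server with the learning object ID and the requesting terminal ID, and reply directly to the terminal) may be sketched as below. All class and member names are hypothetical; the embodiment does not prescribe a particular implementation:

```python
from dataclasses import dataclass

@dataclass
class LinkInfo:
    object_id: str
    link_destination: str   # address of a related information providing server

class RelatedInfoServer:
    """Stand-in for the related information providing server 230."""
    def __init__(self, related_info_db):
        self.related_info_db = related_info_db   # object_id -> related info
    def handle_access(self, object_id, terminal_id, reply):
        # acquire related information (document/voice data) for the object ID
        info = self.related_info_db.get(object_id, "no related information")
        reply(terminal_id, info)                 # reply to the access source

class RecognitionServer:
    """Stand-in for the learning object recognition server 220."""
    def __init__(self, link_info_db, servers):
        self.link_info_db = link_info_db         # object_id -> LinkInfo
        self.servers = servers                   # address -> RelatedInfoServer
    def on_recognized(self, object_id, terminal_id, reply):
        link = self.link_info_db[object_id]      # cf. step S513: link lookup
        server = self.servers[link.link_destination]
        # cf. step S1917: access the destination with object and terminal IDs
        server.handle_access(object_id, terminal_id, reply)
```

Because the terminal ID travels with the access, the related information providing server can address its reply to the terminal without relaying through the recognition server, matching the direct-reply variant described above.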
  • The information processing system according to the present embodiment applies the second and third embodiments to learning objects involving language. Since the other configurations and operations are the same as those in the second or third embodiment, the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted.
  • FIG. 20A is a sequence diagram illustrating an operation procedure of book recognition in the information processing system according to the present embodiment.
  • The same step numbers are attached to similar operations, and description thereof is omitted.
  • In steps S400 and S401, although the applications and data may differ, download, activation, and initialization are performed as in FIG.
  • In step S2013, the back cover of a book or an advertisement image of the book is photographed by the communication terminal 210.
  • Here the video is represented by a back cover or a promotional image, but it may also be a book case, a cover, an imprint, a table of contents, or another book-related image.
  • The learning object recognition server 220 receives the local feature amounts from the communication terminal 210 and recognizes the book with reference to the local feature amount DB 221 in step S2021. If the response is simply the book name or the like, the recognition result is transmitted from the learning object recognition server 220 to the communication terminal 210 in step S2023. When the contents of the book are introduced by display or voice, the content introduction DB 2022 is referred to in step S2025, and content introduction data corresponding to the recognized book is acquired. In step S2027, the learning object recognition server 220 transmits the recognition result and the content introduction data to the communication terminal 210.
  • the communication terminal 210 receives the recognition result or the content introduction from the learning object recognition server 220, and notifies the user of the recognition result or the content by display and / or voice in step S2029.
  • When the content introduction is acquired from the related information providing server 230 of the link destination, the link information DB 223 is referred to in step S2031, and the link destination corresponding to the recognized book is acquired.
  • step S2033 the learning object recognition server 220 accesses the link destination.
  • step S2035 the linked related information providing server 230 refers to the content introduction DB 2023 and acquires content introduction data corresponding to the book.
  • step S2037 the related information providing server 230 transmits the recognition result and the content introduction data to the communication terminal 210.
  • the communication terminal 210 receives the recognition result or the content introduction from the related information providing server 230, and notifies the user of the recognition result or the content by display and / or voice in step S2039.
  • the address of the communication terminal 210 is obtained from the learning object recognition server 220 by the access in step S2033.
  • Here, the learning object recognition server 220 automatically accesses the link destination; however, it may instead be configured to return the link destination to the communication terminal 210 and wait for a link instruction from the communication terminal 210.
  • FIG. 20B is a sequence diagram illustrating an operation procedure of page recognition in the information processing system according to the present embodiment.
  • The same step numbers are attached to similar operations, and description thereof is omitted.
  • In steps S400 and S401, although the applications and data may differ, download, activation, and initialization are performed as in FIG.
  • The book is opened and a page is photographed by the communication terminal 210.
  • The page may be a two-page spread, a single page, a part of a page, or a photograph, diagram, table, or the like within the page.
  • the learning target object recognition server 220 receives the local feature value from the communication terminal 210, and recognizes the page with reference to the local feature value DB 221 in step S2051.
  • Next, the page information DB 2024 is referred to, and page information including reading-aloud voice corresponding to the recognized page is acquired.
  • page data of page reading voice is transmitted from the learning object recognition server 220 to the communication terminal 210.
  • the communication terminal 210 receives the page data from the learning object recognition server 220, and notifies the user of the page content by reproducing the page reading voice in step S2057.
  • When page information is acquired from the related information providing server 230 of the link destination, the link information corresponding to the recognized page is acquired with reference to the link information DB 223 in step S2061. In step S2063, the link destination is accessed from the learning object recognition server 220.
  • step S2065 the link related information providing server 230 refers to the page information DB 2025 and acquires page information corresponding to the page.
  • step S2067 the page information of the page reading voice is transmitted from the related information providing server 230 to the communication terminal 210.
  • the communication terminal 210 receives the page data from the related information providing server 230, and notifies the user of the page content by reproducing the page reading voice in step S2069.
  • the address of the communication terminal 210 is obtained from the learning object recognition server 220 by the access in step S2063.
  • Also in this case, as in FIG. 20A, the learning object recognition server 220 may return the link destination to the communication terminal 210 and wait for a link instruction from the communication terminal 210.
  • FIG. 20C is a sequence diagram illustrating an operation procedure for kanji recognition in the information processing system according to the present embodiment.
  • The same step numbers are attached to similar operations, and description thereof is omitted.
  • In steps S400 and S401, although the applications and data may differ, download, activation, and initialization are performed as in FIG.
  • In step S2073, the communication terminal 210 photographs a kanji, an idiom, or a sentence in the book.
  • the cover may be taken as shown in FIG. 20A or the page may be taken as shown in FIG. 20B.
  • the learning object recognition server 220 receives the local feature amount from the communication terminal 210, and recognizes kanji, idioms, and sentences with reference to the local feature amount DB 221 in step S2081.
  • Next, the dictionary DB 2026 is referred to, and the reading and meaning, in display or voice form, corresponding to the recognized kanji, idiom, or sentence are acquired.
  • the learning object recognition server 220 transmits display / audio data indicating the reading and meaning to the communication terminal 210.
  • the communication terminal 210 receives display / audio data indicating how to read and the meaning from the learning object recognition server 220, and notifies the user of the reading and the meaning by displaying and reproducing the sound in step S2087.
  • When the reading and meaning are acquired from the related information providing server 230 of the link destination, the link information corresponding to the recognized kanji, idiom, or sentence is acquired with reference to the link information DB 223 in step S2091. In step S2093, the learning object recognition server 220 accesses the link destination.
  • step S2095 the linked related information providing server 230 refers to the dictionary DB 2027, and acquires readings and meanings corresponding to kanji, idioms, and sentences.
  • step S2097 the related information providing server 230 transmits display / audio data indicating the reading and meaning to the communication terminal 210.
  • The communication terminal 210 receives the data from the related information providing server 230, and notifies the user in step S2099 by displaying the reading and meaning of the kanji, idiom, or sentence and reproducing the voice data.
  • the address of the communication terminal 210 is obtained from the learning object recognition server 220 by the access in step S2093.
  • As for the display and voice notification in FIG. 20C, display is preferable for an image containing a plurality of kanji, idioms, or sentences, while voice notification is desirable for a video of a single kanji, idiom, or sentence, or a part thereof.
  • the learning target object recognition server 220 may return the link destination to the communication terminal 210 and wait for a link instruction in the communication terminal 210.
  • FIG. 21A is a diagram showing a configuration of the content introduction DB 2022 or 2023 according to the present embodiment.
  • The content introduction DB 2022 and the content introduction DB 2023 have basically the same configuration; however, considering storage capacity, the content introduction DB 2023 can provide more detailed content or more items than the content introduction DB 2022.
  • the content introduction DB 2022 or 2023 stores a work name 2112, a writer 2113, a publisher 2114, an issue date 2115, and content introduction information 2116 including display data and audio data in association with the book ID 2111. All may be included in the content introduction information.
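The record layout of FIG. 21A may, for instance, be modeled as a small relational table whose columns mirror the numbered fields (book ID 2111 through content introduction information 2116). This is only an illustrative schema with invented example values, not a required implementation:

```python
import sqlite3

# In-memory stand-in for content introduction DB 2022 or 2023.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE content_introduction (
        book_id       TEXT PRIMARY KEY,  -- book ID 2111
        work_name     TEXT,              -- work name 2112
        writer        TEXT,              -- writer 2113
        publisher     TEXT,              -- publisher 2114
        issue_date    TEXT,              -- issue date 2115
        intro_display BLOB,              -- content introduction 2116 (display)
        intro_audio   BLOB               -- content introduction 2116 (audio)
    )
""")
conn.execute(
    "INSERT INTO content_introduction VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("B001", "Example Work", "Example Writer", "Example Press",
     "2012-01-30", b"<display data>", b"<audio data>"),
)
# Retrieval keyed by the recognized book ID, as in step S2025.
row = conn.execute(
    "SELECT work_name, writer FROM content_introduction WHERE book_id = ?",
    ("B001",),
).fetchone()
```

Folding all fields into a single content-introduction blob, as the text permits, would simply collapse the last five columns into one.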
  • FIG. 21B is a diagram showing a configuration of the page information DB 2024 or 2025 according to the present embodiment. Note that the page information DB 2024 and the page information DB 2025 have basically the same configuration; however, considering storage capacity, the page information DB 2025 can provide more detailed contents or more items than the page information DB 2024.
  • The page information DB 2024 or 2025 stores, in association with the book ID 2121, the page number 2122, the chapter/part information 2123, first reading data/speaker 2124, and second reading data/speaker 2125.
  • FIG. 21C is a diagram showing a configuration of the dictionary DB 2026 or 2027 according to the present embodiment.
  • The dictionary DB 2026 and the dictionary DB 2027 have basically the same configuration; however, considering storage capacity, the dictionary DB 2027 can provide more detailed contents or more items than the dictionary DB 2026.
  • the dictionary DB 2026 or 2027 has, for example, three parts, a kanji DB 2130, an idiom DB 2140, and a sentence DB 2150. All may be integrated.
  • The kanji DB 2130 stores, in association with the kanji ID 2131, kanji data 2132 composed of display and voice, reading data 2133, and explanation data (meaning/usage) 2134.
  • the idiom DB 2140 stores reading data 2142 and comment data (meaning / usage) 2143 composed of display and voice in association with the idiom ID 2141.
  • the text DB 2150 stores reading data 2152 and comment data (meaning / usage) 2153 composed of display and voice in association with the text ID 2151.
  • the sentence DB 2150 may include proverbs, haiku, waka, and the like.
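The three-part dictionary DB of FIG. 21C (kanji DB 2130, idiom DB 2140, sentence DB 2150) amounts to a keyed lookup from a recognized entry ID to reading and explanation data. A minimal in-memory sketch follows; the entry contents and ID formats are invented examples:

```python
# Illustrative in-memory stand-in for dictionary DB 2026/2027.
dictionary_db = {
    "kanji": {   # kanji DB 2130: kanji ID -> display data, reading, explanation
        "K0001": {"kanji": "山", "reading": "yama", "meaning": "mountain"},
    },
    "idiom": {   # idiom DB 2140: idiom ID -> reading, explanation
        "I0001": {"reading": "isseki nichou",
                  "meaning": "two results from one action"},
    },
    "sentence": {  # sentence DB 2150 (may also hold proverbs, haiku, waka)
        "S0001": {"reading": "example reading", "meaning": "example meaning"},
    },
}

def look_up(kind, entry_id):
    """Return reading/meaning data for a recognized kanji, idiom, or sentence,
    or None if the entry is not registered."""
    return dictionary_db[kind].get(entry_id)
```

Integrating the three parts, as the text allows, would mean merging the three inner tables into one keyed by a common entry ID.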
  • FIG. 21D is a diagram showing a configuration of the translation dictionary DB 2100 according to the present embodiment.
  • FIG. 21D illustrates the configuration of a translation dictionary from Japanese to a foreign language, but the same applies to other translation dictionaries.
  • the translation dictionary DB 2100 includes, for example, three parts, a word DB 2160, a phrase DB 2170, and a sentence DB 2180. All may be integrated.
  • the word DB 2160 stores, in association with the Japanese ID 2161, English word data 2162 composed of notation and voice, other language data 2163, and explanation data (meaning / usage) 2164.
  • the phrase DB 2170 stores English phrase data 2172 composed of notation and speech, other language phrase data 2173, and explanation data (meaning / usage) 2174 in association with the Japanese phrase ID 2171.
  • the sentence DB 2180 stores, in association with the Japanese sentence ID 2181, English sentence data 2182 composed of notation and speech, other language sentence data 2183, and explanation data (meaning / usage) 2184.
  • the phrase DB 2170 and the sentence DB 2180 may include proverbs, haiku, waka, poetry, and the like.
  • The information processing system according to the present embodiment applies the second and third embodiments to learning objects involving sound. Since the other configurations and operations are the same as those in the second or third embodiment, the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted.
  • FIG. 22A is a sequence diagram showing an operation procedure of music recognition in the information processing system according to the present embodiment.
  • The same step numbers are attached to similar operations, and description thereof is omitted.
  • In steps S400 and S401, although the applications and data may differ, download, activation, and initialization are performed as in FIG.
  • In step S2213, the communication terminal 210 photographs a music jacket, a CD, or a concert promotional image.
  • the image is represented by a jacket or a promotional image, but other music-related images may be used.
  • the learning object recognition server 220 receives the local feature amount from the communication terminal 210, and performs music recognition with reference to the local feature amount DB 221 in step S2221. If the response is an album name, a performer, concert information, etc., the recognition result is transmitted from the learning object recognition server 220 to the communication terminal 210 in step S2223.
  • the contents introduction DB 2222 is referred to, and contents introduction data corresponding to the recognized music is acquired.
  • the learning object recognition server 220 transmits the recognition result and the content introduction data to the communication terminal 210.
  • the communication terminal 210 receives the recognition result or the content introduction from the learning object recognition server 220, and notifies the user of the recognition result or the content by display and / or voice in step S2229.
  • When the content introduction is acquired from the related information providing server 230 of the link destination, the link information DB 223 is referred to in step S2231, and the link destination corresponding to the recognized music is acquired. In step S2233, the learning object recognition server 220 accesses the link destination.
  • In step S2235, the related information providing server 230 of the link destination refers to the content introduction DB 2223 and acquires content introduction data corresponding to the music.
  • step S2237 the related information providing server 230 transmits the recognition result and the content introduction data to the communication terminal 210.
  • the communication terminal 210 receives the recognition result or the content introduction from the related information providing server 230, and notifies the user of the recognition result or the content with display and / or voice in step S2239.
  • the address of the communication terminal 210 is obtained from the learning object recognition server 220 by the access in step S2233.
  • Here, the learning object recognition server 220 automatically accesses the link destination; however, it may instead be configured to return the link destination to the communication terminal 210 and wait for a link instruction from the communication terminal 210.
  • FIG. 22B is a sequence diagram showing an operation procedure of music recognition in the information processing system according to the present embodiment.
  • The same step numbers are attached to similar operations, and description thereof is omitted.
  • In steps S400 and S401, although the applications and data may differ, download, activation, and initialization are performed as in FIG.
  • In step S2243, the cover or a page of the score is photographed by the communication terminal 210.
  • The page may be a two-page spread, a single page, or a part of a page.
  • the learning object recognition server 220 receives the local feature amount from the communication terminal 210, and in step S2251, refers to the local feature amount DB 221 to perform song recognition.
  • step S2253 the performance information which is music performance data corresponding to the recognized music is acquired with reference to the performance information DB 2224.
  • In step S2255, the song audio data is transmitted from the learning object recognition server 220 to the communication terminal 210.
  • the communication terminal 210 receives the music performance data from the learning object recognition server 220, reproduces the music, and notifies the user in step S2257.
  • When the performance information is acquired from the related information providing server 230 of the link destination, the link destination corresponding to the recognized song is acquired with reference to the link information DB 223 in step S2261. In step S2263, the link destination is accessed from the learning object recognition server 220.
  • the linked related information providing server 230 refers to the performance information DB 2225 in step S2265 and acquires performance information corresponding to the song.
  • the music performance data is transmitted from the related information providing server 230 to the communication terminal 210.
  • the communication terminal 210 receives the music performance data from the related information providing server 230, and reproduces the music and notifies the user in step S2269.
  • the address of the communication terminal 210 is obtained from the learning object recognition server 220 by the access in step S2263.
  • Also in this case, as in FIG. 22A, the learning object recognition server 220 may return the link destination to the communication terminal 210 and wait for a link instruction from the communication terminal 210.
  • FIG. 22C is a sequence diagram showing an operation procedure of sound recognition in the information processing system according to the present embodiment.
  • The same step numbers are attached to similar operations, and description thereof is omitted.
  • In steps S400 and S401, although the applications and data may differ, download, activation, and initialization are performed as in FIG.
  • In step S2273, the communication terminal 210 photographs notes or measures in the score.
  • the learning object recognition server 220 receives the local feature amount from the communication terminal 210, and recognizes a note or a measure with reference to the local feature amount DB 221 in step S2281. Next, in step S2283, the sound information DB 2226 is referred to, and a sound or a sound string corresponding to the recognized note or measure is acquired. In step S2285, the learning object recognition server 220 transmits sound and sound string data to the communication terminal 210.
  • the communication terminal 210 receives the sound data from the learning object recognition server 220 and notifies the user by reproducing the sound in step S2287.
  • When the sound information is acquired from the related information providing server 230 of the link destination, the link information corresponding to the recognized sound or sound string is acquired with reference to the link information DB 223 in step S2291. In step S2293, the learning object recognition server 220 accesses the link destination.
  • step S2295 the linked related information providing server 230 refers to the sound information DB 2227 and acquires sound data corresponding to the sound.
  • step S2297 the related information providing server 230 transmits sound data to the communication terminal 210.
  • the communication terminal 210 receives the sound data from the related information providing server 230, and notifies the user by reproducing the sound data in step S2299.
  • the address of the communication terminal 210 is obtained from the learning object recognition server 220 by the access in step S2293.
  • Also in this case, as in FIG. 22B, the learning object recognition server 220 may return the link destination to the communication terminal 210 and wait for a link instruction from the communication terminal 210.
  • FIG. 23A is a diagram showing a configuration of the content introduction DB 2222 or 2223 according to the present embodiment.
  • The content introduction DB 2222 and the content introduction DB 2223 have basically the same configuration; however, considering storage capacity, the content introduction DB 2223 can provide more detailed content or more items than the content introduction DB 2222.
  • The content introduction DB 2222 or 2223 stores, in association with the CD/DVD/record jacket ID 2311, a performer/singer 2312, a recording location 2313, a recording date/release date 2314, and content introduction information 2315 including display data and audio data.
  • the CD / DVD / record jacket ID 2311 may be a concert ID.
  • Each CD/DVD/record jacket ID 2311 is associated with a plurality of song IDs 2316 and song introductions 2317; all of these may be stored as content introduction information.
  • FIG. 23B is a diagram showing a configuration of the performance information DB 2224 or 2225 according to the present embodiment.
  • The performance information DB 2224 and the performance information DB 2225 have basically the same configuration; however, considering storage capacity, the performance information DB 2225 can provide more detailed contents or more items than the performance information DB 2224.
  • the performance information DB 2224 or 2225 stores a song name 2322, a first song reproduction data 2323 by the first player, and second song reproduction data 2324 by the second player in association with the song ID 2321.
  • the performer can be replaced by a conductor or singer.
  • FIG. 23C is a diagram showing a configuration of the sound information DB 2226 or 2227 according to the present embodiment.
  • The sound information DB 2226 and the sound information DB 2227 have basically the same configuration; however, considering storage capacity, the sound information DB 2227 can provide more detailed contents or more items than the sound information DB 2226.
  • the sound information DB 2226 or 2227 includes a measure DB 2330 that stores reproduction data in units of measures and a sound DB 2340 that stores reproduction data in units of sounds.
  • the measure DB 2330 stores a song name (or song ID) 2332 including the measure and measure reproduction data 2333 in association with the measure ID 2331.
  • The sound DB 2340 stores, in association with the sound ID 2341, a sound name/scale name 2342, first sound reproduction data 2343 by piano, second sound reproduction data 2344 by violin, and third sound reproduction data 2345 by flute.
  • the type of musical instrument is not limited to this example. It may be the voice of a singer.
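The per-instrument layout of the sound DB 2340 suggests a lookup keyed by sound ID and instrument name. A minimal sketch follows; the IDs, instrument keys, and data payloads are invented examples:

```python
# Illustrative stand-in for sound DB 2340: sound ID -> sound/scale name plus
# one reproduction-data entry per instrument (piano, violin, flute, ...).
sound_db = {
    "N0001": {
        "name": "C4 / do",
        "piano": b"piano-c4.pcm",
        "violin": b"violin-c4.pcm",
        "flute": b"flute-c4.pcm",
    },
}

def reproduction_data(sound_id, instrument="piano"):
    """Pick the reproduction data for one recognized sound on one instrument,
    falling back to the piano data when the instrument is not registered."""
    entry = sound_db[sound_id]
    return entry.get(instrument, entry["piano"])
```

Adding further instruments, or a singer's voice as the text allows, only requires adding further keys per sound ID.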
  • The information processing system according to the present embodiment applies the second and third embodiments to exhibits. Since the other configurations and operations are the same as those in the second or third embodiment, the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted. Exhibits include materials in museums and folk museums, paintings and sculptures in art museums, and items shown at exhibitions and trade fairs.
  • According to the present embodiment, an exhibit can be recognized and its related information can be learned.
  • FIG. 24A is a sequence diagram illustrating an operation procedure for exhibit recognition in the information processing system according to the present embodiment.
  • The same step numbers are attached to similar operations, and description thereof is omitted.
  • In steps S400 and S401, although the applications and data may differ, download, activation, and initialization are performed as in FIG.
  • In step S2413, the exhibit is photographed by the communication terminal 210.
  • The learning object recognition server 220 receives the local feature amounts from the communication terminal 210, and recognizes the exhibit with reference to the local feature amount DB 221 in step S2421. If the response is simply the exhibit name or the like, the recognition result is transmitted from the learning object recognition server 220 to the communication terminal 210 in step S2423.
  • When the contents of the exhibit are introduced by display or voice, the content introduction DB 2422 is referred to and content introduction data corresponding to the recognized exhibit is acquired.
  • the learning object recognition server 220 transmits the recognition result and the content introduction data to the communication terminal 210.
  • the communication terminal 210 receives the recognition result or the content introduction from the learning object recognition server 220, and notifies the user of the recognition result or the content by display and / or voice in step S2229.
  • When the content introduction is acquired from the related information providing server 230 of the link destination, the link information DB 223 is referred to in step S2231, and the link destination corresponding to the recognized exhibit is acquired. In step S2233, the learning object recognition server 220 accesses the link destination.
  • In step S2235, the related information providing server 230 of the link destination refers to the content introduction DB 2423 and acquires content introduction data corresponding to the exhibit.
  • step S2237 the related information providing server 230 transmits the recognition result and the content introduction data to the communication terminal 210.
  • the communication terminal 210 receives the recognition result or the content introduction from the related information providing server 230, and notifies the user of the recognition result or the content with display and / or voice in step S2239.
  • the address of the communication terminal 210 is obtained from the learning object recognition server 220 by the access in step S2233.
  • Here, the learning object recognition server 220 automatically accesses the link destination; however, it may instead be configured to return the link destination to the communication terminal 210 and wait for a link instruction from the communication terminal 210.
  • FIG. 24B is a diagram showing a configuration of the content introduction DB 2422 or 2423 according to the present embodiment.
  • The content introduction DB 2422 and the content introduction DB 2423 have basically the same configuration; however, considering storage capacity, the content introduction DB 2423 can provide more detailed contents or more items than the content introduction DB 2422.
  • The content introduction DB 2422 or 2423 is described here separately from the local feature amount DB 221, but it may be provided integrally with the local feature amount DB 221.
  • The content introduction DB 2422 or 2423 stores, in association with the exhibit ID 2401, a name (author, age) 2402, related display data 2403, and related audio data 2404.
  • FIG. 25 is a sequence diagram showing an operation procedure of mathematical expression recognition in the information processing system according to the present embodiment.
  • The same step numbers are attached to similar operations, and description thereof is omitted.
  • In steps S400 and S401, although the applications and data may differ, download, activation, and initialization are performed as in FIG.
  • In step S2503, the communication terminal 210 captures a mathematical expression.
  • Instead of a mathematical expression, for example, a graph image or a straight-line/curve image may be photographed.
  • The learning object recognition server 220 receives the local feature amounts from the communication terminal 210, and recognizes the mathematical expression or the like with reference to the local feature amount DB 221 in step S2511. Next, in step S2513, the formula DB 2522 is referred to, and formula-related data, including the formula with its variables and calculation examples, corresponding to the recognized formula is acquired. In step S2517, the learning object recognition server 220 transmits the formula and calculation example data to the communication terminal 210.
  • The communication terminal 210 receives the formula and calculation example data from the learning object recognition server 220 and, in step S2519, notifies the user of the formula and calculation example data.
  • In step S2519, the communication terminal 210 determines whether the user has input a value for a variable in the mathematical expression. If no variable is input, the process ends.
  • If a variable has been input, the process proceeds to step S2521, where the variable is substituted into the received mathematical expression and the expression is calculated.
  • In step S2523, the calculation result is displayed. If necessary, the calculation result is transmitted to the learning object recognition server 220 in step S2525.
  • In the learning object recognition server 220, the calculation results from the communication terminal 210 can be accumulated in step S2527 and used for information collection and the like.
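The variable-substitution steps above (receive formula data, accept user-entered variable values, compute the result locally) can be sketched as follows. The expression string format, the use of `eval`, and the sample formula are assumptions for illustration, not part of the disclosure.

```python
# Minimal sketch of steps S2519-S2523: the terminal receives formula data,
# the user enters variable values, and the result is computed locally.
import math

FORMULA = {
    "formula_id": "F001",                       # hypothetical formula ID
    "expression": "4 / 3 * math.pi * r ** 3",   # example: volume of a sphere
    "variables": ["r"],
}

def calculate(formula, variable_values):
    """Substitute user-entered variable values and evaluate the expression.

    Returns None when a required variable is missing (the "no input" branch,
    where processing simply ends).
    """
    missing = [v for v in formula["variables"] if v not in variable_values]
    if missing:
        return None
    env = {"math": math, **variable_values}
    # Restrict builtins so only the provided names are visible to the expression.
    return eval(formula["expression"], {"__builtins__": {}}, env)
```

The restricted-builtins `eval` is a simplification; a production implementation would use a proper expression parser.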
  • In step S2531, the link destination corresponding to the recognized mathematical formula or the like is acquired with reference to the link information DB 223.
  • The link destination is then accessed from the learning object recognition server 220 (step S2533).
  • In step S2535, the linked related information providing server 230 refers to the formula DB 2523 and acquires formula-related data.
  • In step S2537, the formula-related data is transmitted from the related information providing server 230 to the communication terminal 210.
  • The communication terminal 210 receives the formula-related data from the related information providing server 230 and, in step S2539, notifies the user of the formula and calculation example data.
  • Note that the related information providing server 230 obtains the address of the communication terminal 210 from the learning object recognition server 220 through the access in step S2533.
  • In the above description, the learning object recognition server 220 automatically accesses the link destination. However, the learning object recognition server 220 may instead be configured to return the link destination to the communication terminal 210 and wait for a link instruction from the communication terminal 210.
  • FIG. 26A is a diagram showing a configuration of the formula DB 2522 or 2523 according to the present embodiment. Note that the formula DB 2522 and the formula DB 2523 have basically the same configuration, but considering the storage capacity, the formula DB 2523 can provide more detailed contents or more items than the formula DB 2522.
  • The formula DB 2522 will be described separately from the local feature amount DB 221, but may be provided integrally with the local feature amount DB 221.
  • the formula DB 2522 or 2523 stores a formula name 2612, formula data 2613 that represents a formula with a symbol, a variable 2614 used in the formula, and a constant 2615 in the formula in association with the formula ID 2611.
  • FIG. 26B is a diagram showing a configuration of a calculation parameter table 2600 according to the present embodiment.
  • the calculation parameter table 2600 is a table created in the RAM of the communication terminal or the server when a calculation is executed by substituting variables or constants into mathematical expressions.
  • In the calculation parameter table 2600, each variable value 2622 used in the formula, each constant value 2623 used in the formula, and the calculation result value 2624 obtained using those variable and constant values are stored in association with the formula ID 2621.
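An illustrative sketch of one row of the in-RAM calculation parameter table 2600 follows. The row-building function, the example formula (distance fallen under gravity), and all names are hypothetical; only the associated items (formula ID 2621, variable values 2622, constant values 2623, calculation result 2624) come from the disclosure.

```python
# One row of the calculation parameter table 2600, created in RAM when a
# calculation is executed by substituting variables and constants into a formula.

def make_calc_row(formula_id, variable_values, constant_values, calc):
    """Build one table row: inputs plus the computed result."""
    result = calc(variable_values, constant_values)
    return {
        "formula_id": formula_id,      # formula ID 2621
        "variables": variable_values,  # variable values 2622
        "constants": constant_values,  # constant values 2623
        "result": result,              # calculation result value 2624
    }

# Example (assumed formula): distance fallen under gravity, s = 1/2 * g * t^2.
row = make_calc_row(
    "F002",
    {"t": 3.0},
    {"g": 9.8},
    lambda v, c: 0.5 * c["g"] * v["t"] ** 2,
)
```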
  • The information processing system according to the present embodiment searches for a learning object and notifies the user, provided that the learning object to be searched for has been registered.
  • the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted.
  • the user can search for a desired learning object in real time.
  • FIGS. 27A and 27B are diagrams illustrating examples of display screens of the communication terminal in the information processing system according to the present embodiment.
  • The upper part of FIG. 27A is a diagram showing an example of searching for a work created by a child at its exhibition or placement location.
  • the communication terminal 2710 takes a picture of a work made by a child and obtains a video 2720.
  • a local feature amount is generated based on the video 2720 of this work, and is registered in the local feature amount DB of the communication terminal 2710.
  • the communication terminal 2710 captures an image 2730 of the exhibition place and place of the work.
  • the communication terminal 2710 generates a local feature amount based on the video 2730.
  • The position of the work created by the child is recognized by comparing the local feature amount of the previously registered work with the local feature amount of the image 2730 of the exhibition and placement location of the work.
  • a comment 2731 such as “I am here” is superimposed on the work at the recognized position as a search result.
  • In this way, the position of the child's work can be located in real time using the dimension-selected local feature amounts.
  • The lower part of FIG. 27A is a diagram showing an example of finding where a child participating in a school play or concert is.
  • a picture of a child is taken by the communication terminal 2710 to obtain a video 2740.
  • a local feature value is generated based on the video 2740 and registered in the local feature value DB of the communication terminal 2710.
  • Next, the communication terminal 2710 captures an image 2750 of the school play or concert.
  • the communication terminal 2710 generates a local feature amount based on the video 2750.
  • the position of the child is recognized by comparing the local feature amount from the previously registered child's photograph with the local feature amount of the video 2750 of the school performance or performance.
  • a comment 2751 such as “I am here” is superimposed and displayed as a search result for the child at the recognized position.
  • In this way, the position of the child can be located in real time using the dimension-selected local feature amounts.
  • FIG. 27B shows a process in which the process in the lower part of FIG. 27A is further improved.
  • the left diagram and the central diagram in FIG. 27B are the same as the left diagram and the right diagram in the lower part of FIG. 27A.
  • the communication terminal 2710 zooms in on the position of the child, so that an enlarged image of the child can be acquired.
  • FIG. 28 is a block diagram showing a functional configuration of a communication terminal 2710 according to this embodiment.
  • the same functional components as those in FIG. 6 of the second embodiment are denoted by the same reference numerals, and description thereof is omitted.
  • The registration/search determination unit 2801 determines whether the video captured by the imaging unit 601 of the communication terminal 2710 is a video for registering a search object in the local feature amount DB 2821 or a video for searching for a registered search object.
  • The determination by the registration/search determination unit 2801 may be made by a user operation, or may be made automatically based on the area ratio that an object occupies on the image screen. For example, when a search object is registered, it is imaged so as to fill the entire screen, so a screen in which the area ratio is equal to or greater than a predetermined threshold is treated as a registration screen.
  • If it is determined that the video is for registering a search object, the local feature amount registration unit 2802 registers the local feature amount generated by the local feature amount generation unit 602 in the local feature amount DB 2821. On the other hand, if it is determined that the video is for searching for a registered search object, the search object recognition unit 2803 collates the local feature amount generated by the local feature amount generation unit 602 with the local feature amounts of the search objects registered in the local feature amount DB 2821.
  • The notification unit 2804 notifies the user of the search result, for example by superimposing a comment at the position of the recognized search object.
  • the zoom control unit 2805 controls the imaging unit 601 to zoom in on the position of the search object in order to enlarge the image of the search object.
  • The configuration of the local feature amount DB 2821 is the same as the configuration shown in FIG. 8 of the second embodiment, except that new local feature amounts of search objects are registered, and thus description thereof is omitted.
  • The search object DB 2822 stores information input by the user regarding the search object; it is not essential, and may instead be provided within the local feature amount DB 2821.
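The automatic registration/search determination described above (a frame in which the object occupies at least a predetermined area ratio of the screen is treated as a registration shot) can be sketched as follows. The concrete threshold value and the function names are assumptions for illustration; the disclosure specifies only the threshold-on-area-ratio idea.

```python
# Hedged sketch of the heuristic attributed to the registration/search
# determination unit 2801: an object filling the screen implies registration.

REGISTRATION_AREA_RATIO_THRESHOLD = 0.6  # assumed value of the "predetermined threshold"

def classify_frame(object_area, screen_area,
                   threshold=REGISTRATION_AREA_RATIO_THRESHOLD):
    """Return 'register' when the object dominates the screen, else 'search'."""
    ratio = object_area / screen_area
    return "register" if ratio >= threshold else "search"
```

In the described system a user operation may override this automatic determination.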
  • FIG. 29 is a flowchart showing a processing procedure of the communication terminal according to the present embodiment. This flowchart is also executed by the CPU 1210 of FIG. 12A using the RAM 1240, and implements the functional configuration units of FIG. 28. In FIG. 29, the local feature amount generation processing is the same as that in FIG. 14, and is therefore given the same step number S1313; its description is omitted.
  • In step S2911, it is determined whether the search object (the work or the child in FIG. 27A) is to be registered. Next, in step S2921, it is determined whether search processing for the search object is to be performed. If neither, other processing is executed in step S2941.
  • If it is a registration process, the procedure proceeds to step S2913, where an image of the search object is acquired.
  • In step S1313, a local feature amount of the acquired search object image is generated.
  • In step S2917, the generated local feature amount is associated with the search object and registered in the local feature amount DB 2821. At the same time, necessary search object information is registered in the search object DB 2822.
  • If it is a search process, the procedure proceeds to step S2923, where a video is acquired.
  • In step S1313, a local feature amount of the acquired video is generated.
  • In step S2927, whether the local feature amount of the search object matches at least a part of the local feature amount of the video is collated, and the search object is thereby recognized. If the search object is not found, the process returns from step S2929 to step S2923 to acquire another video (in practice, by changing the imaging direction or area of the communication terminal 2710), and the search for the search object is repeated.
  • If the search object is found, it is determined in step S2931 whether to perform zoom processing. This determination may be set by the user.
  • In step S2933, an enlarged image obtained by zooming in on the search object is acquired.
  • In step S2935, a comment indicating the presence of the search object is displayed at the position of the search object (see FIG. 27A).
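The register-then-scan flow of FIG. 29 can be sketched compactly as follows. Here `extract_features` is only a stand-in for the local feature amount generation of step S1313, and set-overlap matching is a deliberate simplification of the local feature collation; all names and the match-ratio value are assumptions.

```python
# Compact sketch of the FIG. 29 flow: register a search object's features
# (steps S2913-S2917), then scan successive frames until they match
# (steps S2923-S2929).

def extract_features(image):
    """Stand-in for step S1313: here, just the set of characters in a string."""
    return set(image)

def register(db, name, image):
    """Register the search object's features under its name."""
    db[name] = extract_features(image)

def search(db, name, frames, min_match_ratio=0.8):
    """Return the index of the first frame matching the registered object,
    or None when no frame matches (in practice the terminal would keep
    scanning with a changed imaging direction)."""
    target = db[name]
    for i, frame in enumerate(frames):
        matched = len(target & extract_features(frame)) / len(target)
        if matched >= min_match_ratio:
            return i  # found: a comment would be displayed here (step S2935)
    return None

db = {}
register(db, "work", "abcde")
found = search(db, "work", ["xyz", "qabcdew"])
```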
  • the information processing system according to the present embodiment is different from the first to eighth embodiments in that the communication terminal performs all processes including learning object recognition. Since other configurations and operations are the same as those of the second embodiment, the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted.
  • According to the present embodiment, all processing can be performed by the communication terminal alone, based on the local feature amounts of the image in the video.
  • FIG. 30 is a block diagram illustrating a functional configuration of the communication terminal 3010 according to the present embodiment.
  • The same reference numerals are assigned to the same functional components as those in FIG. 6, and description thereof is omitted.
  • The learning object recognition unit 3003 recognizes the learning object by collating the local feature amount generated by the local feature amount generation unit 602 with the local feature amounts stored in the local feature amount DB 3021, and the recognition result is notified by the learning object recognition result notification unit 3004. Note that the learning object recognition unit 3003 and the local feature amount DB 3021 are obtained by arranging, in the communication terminal 3010, the functional components included in the learning object recognition server 220 described above; since their functions are the same, description thereof is omitted.
  • The learning object recognition result notification unit 3004 includes processing by the display screen generation unit 606 and the voice generation unit 608 of FIG. 6 based on the notification information; since the processing is the same, description thereof is omitted.
  • the related information acquisition unit 3005 acquires related information from the related information DB 3022 corresponding to the recognized learning object. Also, the related information notification unit 3006 notifies the user of related information.
  • The link information acquisition unit 3007 acquires, from the link information DB 3023, link information corresponding to the recognized learning object, and the acquired link information is notified to the user.
  • These functional components are also configured by arranging, in the communication terminal 3010, the functional components included in the learning object recognition server 220; since their functions are the same, description thereof is omitted.
  • the link destination access unit 3009 accesses the link destination related information providing server 230 using the acquired link information.
  • Note that the present invention may be applied to a system composed of a plurality of devices, or to a single device. Furthermore, the present invention is also applicable to a case where a control program that realizes the functions of the embodiments is supplied to a system or apparatus directly or remotely. Therefore, a control program installed in a computer to realize the functions of the present invention with the computer, a medium storing the control program, and a WWW (World Wide Web) server from which the control program is downloaded are also included in the scope of the present invention.
  • (Appendix 1) An information processing system comprising: first local feature quantity storage means for storing, in association with a learning object, m first local feature quantities, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions including each of m feature points of an image of the learning object; second local feature quantity generation means for extracting n feature points from an image captured by imaging means and generating n second local feature quantities, each consisting of a feature vector of 1 to j dimensions, for the n local regions including each of the n feature points; and learning object recognition means for selecting the smaller number of dimensions between the dimension number i of the feature vectors of the first local feature quantities and the dimension number j of the feature vectors of the second local feature quantities, and recognizing that the learning object exists in the image in the video when it is determined that a predetermined proportion or more of the m first local feature quantities, consisting of feature vectors up to the selected number of dimensions, correspond to the n second local feature quantities consisting of feature vectors up to the selected number of dimensions.
  • (Appendix 2) The information processing system according to appendix 1, wherein the first local feature quantity storage means stores the m first local feature quantities generated from images of a plurality of learning objects in association with each of the plurality of learning objects, and the learning object recognition means recognizes the plurality of learning objects included in the image captured by the imaging means.
  • (Appendix 3) The information processing system according to appendix 1 or 2, further comprising notification means for notifying a recognition result of the learning object recognition means.
  • (Appendix 4) The information processing system according to appendix 3, wherein the notification means further notifies information related to the recognition result.
  • (Appendix 5) The information processing system according to appendix 3 or 4, wherein the notification means further notifies link information for acquiring information related to the recognition result.
  • (Appendix 6) The information processing system according to appendix 3, wherein the notification means includes related information acquisition means for acquiring information related to the recognition result according to link information, and notifies the related information acquired according to the link information.
  • (Appendix 7) The information processing system according to any one of appendices 3 to 6, further comprising registration means for registering, in the first local feature quantity storage means, a local feature quantity of a learning object to be searched for, wherein the notification means notifies the learning object recognized by the learning object recognition means as a search result.
  • (Appendix 8) The information processing system according to any one of appendices 3 to 6, wherein the learning object is a learning object including characters, and the notification means notifies the contents of the learning object.
  • (Appendix 9) The information processing system according to any one of appendices 3 to 6, wherein the learning object is a learning object related to sound, and the notification means notifies the contents of the learning object by playing a sound.
  • (Appendix 10) The information processing system according to any one of appendices 3 to 6, wherein the learning object is an exhibit, and the notification means notifies a description of the learning object.
  • (Appendix 11) The information processing system according to any one of appendices 3 to 6, wherein the learning object is a learning object including a mathematical formula, and the notification means calculates the mathematical formula of the learning object and notifies a calculation result.
  • (Appendix 12) The information processing system according to any one of appendices 1 to 11, wherein the first local feature quantities and the second local feature quantities are generated by dividing a local region including a feature point extracted from an image into a plurality of sub-regions and generating a feature vector of a plurality of dimensions consisting of histograms of gradient directions in the plurality of sub-regions.
  • (Appendix 13) The information processing system according to appendix 12, wherein the first local feature quantities and the second local feature quantities are generated by selecting, from the generated feature vector of a plurality of dimensions, dimensions having a larger correlation between adjacent sub-regions.
  • (Appendix 14) The information processing system according to appendix 12 or 13, wherein the plurality of dimensions of the feature vector are arranged in a predetermined order so that dimensions can be selected, starting from the first dimension and in order from the dimension that contributes more to the feature of the feature point, in accordance with the accuracy required for the local feature quantity.
  • (Appendix 15) The second local feature quantity generation means generates, in accordance with the correlation between learning objects, second local feature quantities having a smaller number of dimensions for a learning object having lower correlation with other learning objects.
  • (Appendix 16) The first local feature quantity storage means stores, in accordance with the correlation between learning objects, first local feature quantities having a smaller number of dimensions for a learning object having lower correlation with other learning objects.
  • (Appendix 17) An information processing method in an information processing system including first local feature quantity storage means for storing, in association with a learning object, m first local feature quantities, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions including each of m feature points of an image of the learning object, the method comprising: a second local feature quantity generation step of extracting n feature points from an image in a captured video and generating n second local feature quantities, each consisting of a feature vector of 1 to j dimensions, for the n local regions including each of the n feature points; and a recognition step of selecting the smaller number of dimensions between the dimension number i of the feature vectors of the first local feature quantities and the dimension number j of the feature vectors of the second local feature quantities, and recognizing that the learning object exists in the image in the video when it is determined that a predetermined proportion or more of the m first local feature quantities, consisting of feature vectors up to the selected number of dimensions, correspond to the n second local feature quantities consisting of feature vectors up to the selected number of dimensions.
  • (Appendix 18) A communication terminal comprising: second local feature quantity generation means for extracting n feature points from an image captured by imaging means and generating n second local feature quantities, each consisting of a feature vector of 1 to j dimensions, for the n local regions including each of the n feature points; first transmission means for transmitting the m second local feature quantities to an information processing apparatus that recognizes, based on collation of local feature quantities, a learning object included in the captured image; and first reception means for receiving, from the information processing apparatus, information indicating the learning object included in the captured image.
  • (Appendix 19) A control method for a communication terminal, comprising: a second local feature quantity generation step of extracting n feature points from an image captured by imaging means and generating n second local feature quantities, each consisting of a feature vector of 1 to j dimensions, for the n local regions including each of the n feature points; a first transmission step of transmitting the m second local feature quantities to an information processing apparatus that recognizes, based on collation of local feature quantities, a learning object included in the captured image; and a first reception step of receiving, from the information processing apparatus, information indicating the learning object included in the captured image.
  • (Appendix 20) A control program for a communication terminal, causing a computer to execute: a second local feature quantity generation step of extracting n feature points from an image captured by imaging means and generating n second local feature quantities, each consisting of a feature vector of 1 to j dimensions, for the n local regions including each of the n feature points; a first transmission step of transmitting the m second local feature quantities to an information processing apparatus that recognizes, based on collation of local feature quantities, a learning object included in the captured image; and a first reception step of receiving, from the information processing apparatus, information indicating the learning object included in the captured image.
  • (Appendix 21) An information processing apparatus comprising: first local feature quantity storage means for storing, in association with a learning object, m first local feature quantities, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions including each of m feature points of an image of the learning object; second reception means for receiving, from a communication terminal, n second local feature quantities, each consisting of a feature vector of 1 to j dimensions, generated for n local regions including each of n feature points extracted from an image in a video captured by the communication terminal; and learning object recognition means for selecting the smaller number of dimensions between the dimension number i of the feature vectors of the first local feature quantities and the dimension number j of the feature vectors of the second local feature quantities, and recognizing that the learning object exists in the image in the video when it is determined that a predetermined proportion or more of the m first local feature quantities, consisting of feature vectors up to the selected number of dimensions, correspond to the n second local feature quantities consisting of feature vectors up to the selected number of dimensions.
  • (Appendix 22) A control method for an information processing apparatus including first local feature quantity storage means for storing, in association with a learning object, m first local feature quantities, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions including each of m feature points of an image of the learning object, the method comprising: a second reception step of receiving, from a communication terminal, n second local feature quantities, each consisting of a feature vector of 1 to j dimensions, generated for n local regions including each of n feature points extracted from an image in a video captured by the communication terminal; and a recognition step of selecting the smaller number of dimensions between the dimension number i of the feature vectors of the first local feature quantities and the dimension number j of the feature vectors of the second local feature quantities, and recognizing that the learning object exists in the image in the video when it is determined that a predetermined proportion or more of the m first local feature quantities, consisting of feature vectors up to the selected number of dimensions, correspond to the n second local feature quantities consisting of feature vectors up to the selected number of dimensions.
  • (Appendix 23) A computer-readable storage medium storing a control program for an information processing apparatus including first local feature quantity storage means for storing, in association with a learning object, m first local feature quantities, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions including each of m feature points of an image of the learning object, the control program causing a computer to execute: a second reception step of receiving, from a communication terminal, n second local feature quantities, each consisting of a feature vector of 1 to j dimensions, generated for n local regions including each of n feature points extracted from an image in a video captured by the communication terminal; and a recognition step of selecting the smaller number of dimensions between the dimension number i of the feature vectors of the first local feature quantities and the dimension number j of the feature vectors of the second local feature quantities, and recognizing that the learning object exists in the image in the video when it is determined that a predetermined proportion or more of the m first local feature quantities, consisting of feature vectors up to the selected number of dimensions, correspond to the n second local feature quantities consisting of feature vectors up to the selected number of dimensions.
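Appendices 12 to 14 above describe building the feature vector from gradient-direction histograms over sub-regions of a local region, with the dimensions arranged so that a prefix of the vector can be selected when fewer dimensions suffice. A simplified, non-limiting sketch follows; the 2 sub-regions, 4 orientation bins, and all function names are assumptions (a real descriptor of this family would use more sub-regions and bins).

```python
# Hedged sketch of the descriptor idea: per-sub-region gradient-direction
# histograms concatenated into one vector, usable at a reduced dimension count
# by taking a prefix.
import math

def gradient_histogram(angles, bins=4):
    """Histogram of gradient directions (in radians) over `bins` orientation bins."""
    hist = [0] * bins
    for a in angles:
        hist[int((a % (2 * math.pi)) / (2 * math.pi) * bins) % bins] += 1
    return hist

def descriptor(subregion_angles, bins=4):
    """Concatenate per-sub-region histograms into one multi-dimensional vector."""
    vec = []
    for angles in subregion_angles:
        vec.extend(gradient_histogram(angles, bins))
    return vec

def select_dimensions(vec, k):
    """Use only the first k dimensions, as in the prefix-selectable ordering."""
    return vec[:k]
```

The prefix selection stands in for the described ordering in which dimensions contributing more to the feature come first, so accuracy can be traded for size by truncation.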

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

Provided is a technology for recognizing an object for study in an image in a video in real time. An object for study, and m number of first local features, which comprise feature vectors having from 1 to i dimensions for each of m number of local regions including m number of feature points in an image of the object for study, are associated and stored. Next, n number of feature points are extracted from an image in a captured video, and n number of second local features, which comprise feature vectors having from 1 to j dimensions for each of n number of local regions including n number of feature points, are generated. The number of dimensions (i) of the first local features or the number of dimensions (j) of the second local features, whichever is the smaller number of dimensions, is selected. The object for study is recognized to be present in the image from the video when a prescribed proportion or more of the m number of first local features up to the selected number of dimensions is determined to correspond to the n number of second local features up to the selected number of dimensions.
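The recognition rule summarized above (select the smaller of the dimension counts i and j, truncate both sides' feature vectors to that count, and report the object present when a prescribed proportion of the m stored first local features correspond to the n second local features) can be sketched as follows. The distance measure, the match threshold, and the proportion value are illustrative assumptions only.

```python
# Minimal sketch of the abstract's matching rule. Features are plain tuples;
# "correspondence" is approximated by squared Euclidean distance under a threshold.

def _dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def recognize(first_feats, second_feats, match_dist=0.1, min_ratio=0.5):
    """Return True when at least min_ratio of first_feats find a close match
    among second_feats, after truncating both to the smaller dimension count."""
    if not first_feats:
        return False
    i = len(first_feats[0])                       # dimension count of stored features
    j = len(second_feats[0]) if second_feats else i  # dimension count of frame features
    d = min(i, j)                                 # select the smaller number of dimensions
    firsts = [f[:d] for f in first_feats]
    seconds = [s[:d] for s in second_feats]
    matched = sum(
        1 for f in firsts if any(_dist2(f, s) <= match_dist for s in seconds)
    )
    return matched / len(firsts) >= min_ratio
```

Real implementations of this family of techniques use nearest-neighbor search and geometric verification rather than this brute-force comparison; the sketch shows only the dimension-selection and proportion test.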

Description

Information processing system, information processing method, information processing apparatus and control method and control program thereof, and communication terminal and control method and control program thereof
 The present invention relates to a technique for identifying a learning object in a captured video using local feature amounts.
 In the above technical field, Patent Document 1 describes a technique for obtaining the name of an object (plant, insect, etc.) based on a video from a camera-equipped mobile phone and an inquiry mail. Patent Document 2 describes a technique that improves recognition speed by clustering feature amounts when a query image is recognized using a model dictionary generated in advance from model images.
JP 2003-132062 A (Patent Document 1); JP 2011-221688 A (Patent Document 2)
 However, with the techniques described in the above documents, a learning object in an image in a video could not be recognized in real time.
 An object of the present invention is to provide a technique for solving the above-described problem.
 In order to achieve the above object, a system according to the present invention comprises:
 first local feature quantity storage means for storing, in association with a learning object, m first local feature quantities, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions including each of m feature points of an image of the learning object;
 second local feature quantity generation means for extracting n feature points from an image in a video captured by imaging means and generating n second local feature quantities, each consisting of a feature vector of 1 to j dimensions, for the n local regions including each of the n feature points; and
 learning object recognition means for selecting the smaller number of dimensions between the dimension number i of the feature vectors of the first local feature quantities and the dimension number j of the feature vectors of the second local feature quantities, and recognizing that the learning object exists in the image in the video when it is determined that a predetermined proportion or more of the m first local feature quantities, consisting of feature vectors up to the selected number of dimensions, correspond to the n second local feature quantities consisting of feature vectors up to the selected number of dimensions.
 In order to achieve the above object, a method according to the present invention is an information processing method in an information processing system including first local feature quantity storage means for storing, in association with a learning object, m first local feature quantities, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions including each of m feature points of an image of the learning object, the method comprising:
 a second local feature quantity generation step of extracting n feature points from an image in a captured video and generating n second local feature quantities, each consisting of a feature vector of 1 to j dimensions, for the n local regions including each of the n feature points; and
 a recognition step of selecting the smaller number of dimensions between the dimension number i of the feature vectors of the first local feature quantities and the dimension number j of the feature vectors of the second local feature quantities, and recognizing that the learning object exists in the image in the video when it is determined that a predetermined proportion or more of the m first local feature quantities, consisting of feature vectors up to the selected number of dimensions, correspond to the n second local feature quantities consisting of feature vectors up to the selected number of dimensions.
To achieve the above object, an apparatus according to the present invention comprises:
second local feature generation means for extracting n feature points from an image in a video captured by imaging means, and generating, for n local regions containing each of the n feature points, n second local features each consisting of feature vectors from one dimension to j dimensions;
first transmission means for transmitting the n second local features to an information processing apparatus that recognizes, based on matching of local features, a learning object contained in the captured image; and
first reception means for receiving, from the information processing apparatus, information indicating the learning object contained in the captured image.
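The three means of the terminal-side apparatus above can be sketched as follows; this is a minimal illustration under assumptions, where `extract_points` and `describe_region` are hypothetical stand-ins for the feature point detector and local-region descriptor, and the JSON payload is an assumed serialization (the embodiments encode local features together with feature point coordinates, but do not fix this wire format).

```python
import json

def generate_second_local_features(image, extract_points, describe_region, j_dims):
    """Second local feature generation means: extract n feature points and
    build, for the local region around each, a feature vector truncated to
    at most j_dims dimensions. extract_points and describe_region are
    injected hypothetical stand-ins for the detector/descriptor."""
    points = extract_points(image)                         # n feature points
    feats = [describe_region(image, p)[:j_dims] for p in points]
    return points, feats

def encode_for_transmission(points, feats):
    """First transmission means would send this payload to the information
    processing apparatus; JSON is an assumed format, chosen only so the
    sketch is concrete."""
    return json.dumps({"points": points, "features": feats})
```

The first reception means would then simply decode the apparatus's reply (e.g. the recognized learning object name) from the same transport.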
To achieve the above object, a method according to the present invention comprises:
a second local feature generation step of extracting n feature points from an image in a video captured by imaging means, and generating, for n local regions containing each of the n feature points, n second local features each consisting of feature vectors from one dimension to j dimensions;
a first transmission step of transmitting the n second local features to an information processing apparatus that recognizes, based on matching of local features, a learning object contained in the captured image; and
a first reception step of receiving, from the information processing apparatus, information indicating the learning object contained in the captured image.
To achieve the above object, a program according to the present invention causes a computer to execute:
a second local feature generation step of extracting n feature points from an image in a video captured by imaging means, and generating, for n local regions containing each of the n feature points, n second local features each consisting of feature vectors from one dimension to j dimensions;
a first transmission step of transmitting the n second local features to an information processing apparatus that recognizes, based on matching of local features, a learning object contained in the captured image; and
a first reception step of receiving, from the information processing apparatus, information indicating the learning object contained in the captured image.
To achieve the above object, an apparatus according to the present invention comprises:
first local feature storage means for storing, in association with each other, a learning object and m first local features, each consisting of feature vectors from one dimension to i dimensions, generated for each of m local regions containing each of m feature points of an image of the learning object;
second reception means for receiving, from a communication terminal, n second local features, each consisting of feature vectors from one dimension to j dimensions, generated for n local regions containing each of n feature points extracted from an image in a video captured by the communication terminal;
recognition means for selecting the smaller of the dimension number i of the feature vectors of the first local features and the dimension number j of the feature vectors of the second local features, and recognizing that the learning object is present in the image in the video when it is determined that a predetermined proportion or more of the m first local features, each consisting of feature vectors up to the selected dimension number, correspond to the n second local features, each consisting of feature vectors up to the selected dimension number; and
second transmission means for transmitting information indicating the recognized learning object to the communication terminal.
To achieve the above object, a method according to the present invention is a control method for an information processing apparatus comprising first local feature storage means for storing, in association with each other, a learning object and m first local features, each consisting of feature vectors from one dimension to i dimensions, generated for each of m local regions containing each of m feature points of an image of the learning object, the method comprising:
a second reception step of receiving, from a communication terminal, n second local features, each consisting of feature vectors from one dimension to j dimensions, generated for n local regions containing each of n feature points extracted from an image in a video captured by the communication terminal;
a recognition step of selecting the smaller of the dimension number i of the feature vectors of the first local features and the dimension number j of the feature vectors of the second local features, and recognizing that the learning object is present in the image in the video when it is determined that a predetermined proportion or more of the m first local features, each consisting of feature vectors up to the selected dimension number, correspond to the n second local features, each consisting of feature vectors up to the selected dimension number; and
a second transmission step of transmitting information indicating the recognized learning object to the communication terminal.
To achieve the above object, a program according to the present invention is a control program for an information processing apparatus comprising first local feature storage means for storing, in association with each other, a learning object and m first local features, each consisting of feature vectors from one dimension to i dimensions, generated for each of m local regions containing each of m feature points of an image of the learning object, the program causing a computer to execute:
a second reception step of receiving, from a communication terminal, n second local features, each consisting of feature vectors from one dimension to j dimensions, generated for n local regions containing each of n feature points extracted from an image in a video captured by the communication terminal;
a recognition step of selecting the smaller of the dimension number i of the feature vectors of the first local features and the dimension number j of the feature vectors of the second local features, and recognizing that the learning object is present in the image in the video when it is determined that a predetermined proportion or more of the m first local features, each consisting of feature vectors up to the selected dimension number, correspond to the n second local features, each consisting of feature vectors up to the selected dimension number; and
a second transmission step of transmitting information indicating the recognized learning object to the communication terminal.
According to the present invention, a learning object in an image in a video can be recognized in real time.
A block diagram showing the configuration of the information processing system according to the first embodiment of the present invention.
A block diagram showing the configuration of the information processing system according to the second embodiment of the present invention.
A diagram showing a display screen example of the communication terminal in the information processing system according to the second embodiment of the present invention.
A sequence diagram showing the operation procedure for related information notification in the information processing system according to the second embodiment of the present invention.
A sequence diagram showing the operation procedure for link information notification in the information processing system according to the second embodiment of the present invention.
A block diagram showing the functional configuration of the communication terminal according to the second embodiment of the present invention.
A block diagram showing the functional configuration of the learning object recognition server according to the second embodiment of the present invention.
A diagram showing the configuration of the local feature DB according to the second embodiment of the present invention.
A diagram showing the configuration of the related information DB according to the second embodiment of the present invention.
A diagram showing the configuration of the link information DB according to the second embodiment of the present invention.
A block diagram showing the functional configuration of the local feature generation unit according to the second embodiment of the present invention.
A diagram explaining the procedure of local feature generation according to the second embodiment of the present invention.
A diagram explaining the procedure of local feature generation according to the second embodiment of the present invention.
A diagram showing the selection order of sub-regions in the local feature generation unit according to the second embodiment of the present invention.
A diagram showing the selection order of feature vectors in the local feature generation unit according to the second embodiment of the present invention.
A diagram showing the hierarchization of feature vectors in the local feature generation unit according to the second embodiment of the present invention.
A diagram showing the configuration of the encoding unit according to the second embodiment of the present invention.
A diagram showing the processing of the learning object recognition unit according to the second embodiment of the present invention.
A block diagram showing the hardware configuration of the communication terminal according to the second embodiment of the present invention.
A diagram showing the local feature generation table in the communication terminal according to the second embodiment of the present invention.
A flowchart showing the processing procedure of the communication terminal according to the second embodiment of the present invention.
A flowchart showing the local feature generation processing according to the second embodiment of the present invention.
A flowchart showing the encoding processing according to the second embodiment of the present invention.
A flowchart showing the difference value encoding processing according to the second embodiment of the present invention.
A block diagram showing the hardware configuration of the learning object recognition server according to the second embodiment of the present invention.
A flowchart showing the processing procedure of the learning object recognition server according to the second embodiment of the present invention.
A flowchart showing the local feature DB generation processing according to the second embodiment of the present invention.
A flowchart showing the learning object recognition processing according to the second embodiment of the present invention.
A flowchart showing the matching processing according to the second embodiment of the present invention.
A sequence diagram showing the operation procedure of the information processing system according to the third embodiment of the present invention.
A sequence diagram showing the operation procedure for book recognition in the information processing system according to the fourth embodiment of the present invention.
A sequence diagram showing the operation procedure for page recognition in the information processing system according to the fourth embodiment of the present invention.
A sequence diagram showing the operation procedure for kanji recognition in the information processing system according to the fourth embodiment of the present invention.
A diagram showing the configuration of the content introduction DB according to the fourth embodiment of the present invention.
A diagram showing the configuration of the page information DB according to the fourth embodiment of the present invention.
A diagram showing the configuration of the dictionary DB according to the fourth embodiment of the present invention.
A diagram showing the configuration of the dictionary DB for translation according to the fourth embodiment of the present invention.
A sequence diagram showing the operation procedure for music recognition in the information processing system according to the fifth embodiment of the present invention.
A sequence diagram showing the operation procedure for song recognition in the information processing system according to the fifth embodiment of the present invention.
A sequence diagram showing the operation procedure for sound recognition in the information processing system according to the fifth embodiment of the present invention.
A diagram showing the configuration of the content introduction DB according to the fifth embodiment of the present invention.
A diagram showing the configuration of the performance information DB according to the fifth embodiment of the present invention.
A diagram showing the configuration of the sound information DB according to the fifth embodiment of the present invention.
A sequence diagram showing the operation procedure for exhibit recognition in the information processing system according to the sixth embodiment of the present invention.
A diagram showing the configuration of the content introduction DB according to the sixth embodiment of the present invention.
A sequence diagram showing the operation procedure for mathematical formula recognition in the information processing system according to the seventh embodiment of the present invention.
A diagram showing the configuration of the formula DB according to the seventh embodiment of the present invention.
A diagram showing the configuration of the calculation parameter table according to the seventh embodiment of the present invention.
A diagram showing a display screen example of the communication terminal in the information processing system according to the eighth embodiment of the present invention.
A diagram showing a display screen example of the communication terminal in the information processing system according to the eighth embodiment of the present invention.
A block diagram showing the functional configuration of the communication terminal according to the eighth embodiment of the present invention.
A flowchart showing the processing procedure of the communication terminal according to the eighth embodiment of the present invention.
A block diagram showing the functional configuration of the communication terminal according to the ninth embodiment of the present invention.
Hereinafter, embodiments of the present invention will be described in detail by way of example with reference to the drawings. However, the constituent elements described in the following embodiments are merely examples, and are not intended to limit the technical scope of the present invention to them alone. As used in this specification, the term "learning object" broadly encompasses any teaching material or work used in the field of education.
[First Embodiment]
An information processing system 100 as a first embodiment of the present invention will be described with reference to Fig. 1. The information processing system 100 recognizes a learning object in real time.
As shown in Fig. 1, the information processing system 100 includes a first local feature storage unit 110, an imaging unit 120, a second local feature generation unit 130, and a learning object recognition unit 140. The first local feature storage unit 110 stores, in association with each other, a learning object 111 and m first local features 112, each consisting of feature vectors from one dimension to i dimensions, generated for each of m local regions containing each of m feature points of an image of the learning object 111. The second local feature generation unit 130 extracts n feature points 131 from an image 101 in the video captured by the imaging unit 120, and generates, for n local regions 132 containing each of the n feature points 131, n second local features 133 each consisting of feature vectors from one dimension to j dimensions. The learning object recognition unit 140 selects the smaller of the dimension number i of the feature vectors of the first local features 112 and the dimension number j of the feature vectors of the second local features 133. When it determines that a predetermined proportion or more of the m first local features 112, each consisting of feature vectors up to the selected dimension number, correspond to the n second local features 133, each consisting of feature vectors up to the selected dimension number, it recognizes that the learning object 111 is present in the image 101 in the video.
According to this embodiment, a learning object in an image in a video can be recognized in real time.
[Second Embodiment]
Next, an information processing system according to a second embodiment of the present invention will be described. In this embodiment, a learning object in a video is recognized by matching the local features generated from the video captured by a communication terminal against the local features stored in the local feature DB of a learning object recognition server. The recognized learning object is then annotated with its name, related information, and/or link information.
According to this embodiment, the name, related information, and/or link information can be presented in real time in association with a learning object in an image in the video.
<<Configuration of the Information Processing System>>
Fig. 2 is a block diagram showing the configuration of the information processing system 200 according to this embodiment.
The information processing system 200 in Fig. 2 includes, connected via a network 240, a communication terminal 210 having an imaging function, a learning object recognition server 220 that recognizes a learning object from video captured by the communication terminal 210, and a related information providing server 230 that provides related information to the communication terminal 210.
The communication terminal 210 displays the captured video on its display unit. As on the display screen 211 in Fig. 2, the names of the learning objects recognized by the learning object recognition server 220, based on the local features generated by the local feature generation unit from the captured video, are superimposed on the display. As illustrated, the communication terminal 210 is representative of a plurality of communication terminals, including mobile phones with imaging functions and other communication terminals.
The learning object recognition server 220 has a local feature DB 221 that stores learning objects in association with their local features, a related information DB 222 that stores related information for each learning object, and a link information DB 223 that stores link information for each learning object. From the local features of the video received from the communication terminal 210, the learning object recognition server 220 returns the name of the learning object recognized by matching against the local features in the local feature DB 221. It also retrieves related information, such as an introduction, corresponding to the recognized learning object from the related information DB 222 and returns it to the communication terminal 210. It further retrieves, from the link information DB 223, link information to the related information providing server 230 corresponding to the recognized learning object and returns it to the communication terminal 210. The name of the learning object, the related information corresponding to it, and the link information for it may each be provided separately, or several of them may be provided at the same time.
The related information providing server 230 has a related information DB 231 that stores related information corresponding to learning objects. It is accessed based on the link information provided for the learning object recognized by the learning object recognition server 220. It then retrieves the related information corresponding to the recognized learning object from the related information DB 231 and returns it to the communication terminal 210 that transmitted the local features of the video containing the learning object. Accordingly, although Fig. 2 shows a single related information providing server 230, as many related information providing servers 230 as there are link destinations are connected. In that case, either the learning object recognition server 220 selects an appropriate link destination, or a plurality of link destinations are displayed on the communication terminal 210 for selection by the user.
Fig. 2 illustrates an example in which the name is superimposed on the learning object in the captured video. The display of related information corresponding to the learning object and of link information for the learning object is described with reference to Fig. 3.
(Display Screen Examples of the Communication Terminal)
Fig. 3 shows display screen examples of the communication terminal 210 in the information processing system 200 according to this embodiment.
The upper part of Fig. 3 is a display screen example showing related information corresponding to a learning object. The display screen 310 in Fig. 3 contains a captured video 311 and operation buttons 312. The learning object is recognized by matching the local features generated from the video in the upper-left figure against the local feature DB 221 of the learning object recognition server 220. As a result, the display screen 320 in the upper-right figure shows a video 321 in which the learning object names and related information 322 to 325 are superimposed on the captured video. At the same time, the related information may be output as audio through the speaker 340.
The lower part of Fig. 3 is a display screen example showing link information corresponding to a learning object. The learning object is recognized by matching the local features generated from the video in the lower-left figure against the local feature DB 221 of the learning object recognition server 220. As a result, the display screen 330 in the lower-right figure shows a video 331 in which the learning object names and link information 332 to 335 are superimposed on the captured video. Although not shown, clicking the displayed link information accesses the linked related information providing server 230, and the related information retrieved from the related information DB 231 is displayed on the communication terminal 210 or output as audio from the communication terminal 210.
<<Operation Procedures of the Information Processing System>>
Operation procedures of the information processing system 200 in this embodiment are described below with reference to Figs. 4 and 5. Although Figs. 4 and 5 do not show a display example of the recognized learning object name alone, the learning object name may simply be transmitted to the communication terminal 210 after the learning object is recognized. Displaying the learning object name, the related information, and the link information together can be realized by combining Figs. 4 and 5.
(Operation Procedure for Related Information Notification)
Fig. 4 is a sequence diagram showing the operation procedure for related information notification in the information processing system 200 according to this embodiment.
First, if necessary, in step S400, an application and/or data is downloaded from the learning object recognition server 220 to the communication terminal 210. Then, in step S401, the application is started and initialized to perform the processing of this embodiment.
In step S403, the communication terminal captures video with its imaging unit. In step S405, local features are generated from the video. Subsequently, in step S407, the local features are encoded together with the feature point coordinates. In step S409, the encoded local features are transmitted from the communication terminal to the learning object recognition server 220.
In step S411, the learning object recognition server 220 recognizes the learning object in the video by referring to the local feature DB 221, which stores local features generated from images of learning objects. Then, in step S413, the related information is acquired by referring to the related information DB 222 for the recognized learning object. In step S415, the learning object name and the related information are transmitted from the learning object recognition server 220 to the communication terminal 210.
In step S417, the communication terminal 210 notifies the user of the received learning object name and related information (see the upper part of FIG. 3). It is desirable that the learning object name be displayed and the related information be displayed or output as audio.
(Operation procedure of link information notification)
FIG. 5 is a sequence diagram showing the operation procedure of link information notification in the information processing system 200 according to the present embodiment. Operation steps identical to those in FIG. 4 are given the same step numbers, and their description is omitted.
In steps S400 and S401, although the application and data may differ, downloading, activation, and initialization are performed as in FIG. 4.
Having recognized the learning object in the scene from the local features of the video received from the communication terminal 210 in step S411, the learning object recognition server 220 refers to the link information DB 223 in step S513 and acquires the link information corresponding to the recognized learning object. In step S515, the learning object name and the link information are transmitted from the learning object recognition server 220 to the communication terminal 210.
In step S517, the communication terminal 210 displays the received learning object name and link information superimposed on the video (see the lower part of FIG. 3). Then, in step S519, it waits for the user to select the link information. If the user selects a link destination, in step S521 the communication terminal accesses the linked related information providing server 230 with the learning object ID.
In step S523, the related information providing server 230 acquires the related information (including document data and audio data) from the related information DB 231 using the received learning object ID. Then, in step S525, it returns the related information to the accessing communication terminal 210.
In step S527, the communication terminal 210 that received the reply displays the related information or outputs it as audio.
<< Functional configuration of communication terminal >>
FIG. 6 is a block diagram showing the functional configuration of the communication terminal 210 according to the present embodiment.
In FIG. 6, the imaging unit 601 inputs a video as a query image. The local feature generation unit 602 generates local features from the video provided by the imaging unit 601. The local feature transmission unit 603 encodes the generated local features together with the feature point coordinates using the encoding unit 603a, and transmits them to the learning object recognition server 220 via the communication control unit 604.
The learning object recognition result receiving unit 605 receives the learning object recognition result from the learning object recognition server 220 via the communication control unit 604. The display screen generation unit 606 then generates a display screen for the received recognition result and notifies the user.
The related information receiving unit 607 receives related information via the communication control unit 604. The display screen generation unit 606 and the audio generation unit 608 then generate a display screen and audio data for the received related information and notify the user. The related information received by the related information receiving unit 607 includes related information from the learning object recognition server 220 or from the related information providing server 230.
The link information receiving unit 609 receives link information from the related information providing server 230 via the communication control unit 604. The display screen generation unit 606 then generates a display screen for the received link information and notifies the user. The link destination access unit 610 accesses the linked related information providing server 230 when the link information is clicked via an operation unit (not shown).
Instead of providing the learning object recognition result receiving unit 605, the related information receiving unit 607, and the link information receiving unit 609 separately, they may be combined into a single information receiving unit that receives the information arriving via the communication control unit 604.
<< Functional configuration of learning object recognition server >>
FIG. 7 is a block diagram showing the functional configuration of the learning object recognition server 220 according to the present embodiment.
In FIG. 7, the local feature receiving unit 702 decodes, with the decoding unit 702a, the local features received from the communication terminal 210 via the communication control unit 701. The learning object recognition unit 703 recognizes the learning object by collating the received local features against the local feature DB 221, which stores the local features corresponding to each learning object. The learning object recognition result transmission unit 704 transmits the recognition result (the learning object name) to the communication terminal 210.
The related information acquisition unit 705 refers to the related information DB 222 and acquires the related information corresponding to the recognized learning object. The related information transmission unit 706 transmits the acquired related information to the communication terminal 210. When the learning object recognition server 220 transmits related information, it is desirable to transmit the learning object recognition result and the related information as a single transmission, as in FIG. 4, because this reduces communication traffic.
The link information acquisition unit 707 refers to the link information DB 223 and acquires the link information corresponding to the recognized learning object. The link information transmission unit 708 transmits the acquired link information to the communication terminal 210. When transmitting link information, it is desirable for the learning object recognition server 220 to transmit the learning object recognition result and the link information as a single transmission, as in FIG. 5, because this reduces communication traffic.
Naturally, when the learning object recognition server 220 transmits the learning object recognition result, the related information, and the link information, it is desirable to acquire all of the information first and then transmit it as a single transmission, because this reduces communication traffic.
The related information providing server 230 may comprise any of various linkable providers, and a description of its configuration is omitted.
(Local feature DB)
FIG. 8 is a diagram showing the configuration of the local feature DB 221 according to the present embodiment. Note that the configuration is not limited to this.
The local feature DB 221 stores a first local feature 803, a second local feature 804, ..., and an m-th local feature 805 in association with a learning object ID 801 and a name 802. Each local feature is stored as a feature vector of 1- to 150-dimensional elements, hierarchized in units of 25 dimensions corresponding to the 5 × 5 sub-regions (see FIG. 11F).
Here, m is a positive integer and may differ for each learning object ID. In the present embodiment, the feature point coordinates used in the collation process are stored together with each local feature.
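As an illustration only, the association of FIG. 8 (a learning object ID and name mapped to m local features, each stored with its feature point coordinates) could be held in memory as follows; the ID, name, and values are placeholders, not contents of the actual DB:

```python
# Hypothetical in-memory shape of the local feature DB of FIG. 8:
# each learning-object ID maps to its name and m entries, where each
# entry pairs the feature point coordinates with a 150-dim vector.
local_feature_db = {
    "obj-001": {
        "name": "example object",       # placeholder name
        "features": [                   # m entries; m may differ per ID
            ((12.0, 34.0), [0] * 150),  # (feature point coords, vector)
            ((56.0, 78.0), [0] * 150),
        ],
    },
}
```

Because the vectors are hierarchized in 25-dimension units, a lookup could truncate each stored vector to the first 25, 50, ... dimensions without restructuring the DB.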
(Related information DB)
FIG. 9 is a diagram showing the configuration of the related information DB 222 according to the present embodiment. Note that the configuration is not limited to this.
The related information DB 222 stores related display data 903 and related audio data 904, which constitute the related information, in association with a learning object ID 901 and a learning object name 902. The related information DB 222 may be provided integrally with the local feature DB 221.
(Link information DB)
FIG. 10 is a diagram showing the configuration of the link information DB 223 according to the present embodiment. Note that the configuration is not limited to this.
The link information DB 223 stores link information, for example a URL (Uniform Resource Locator) 1003 and display data 1004 for the display screen, in association with a learning object ID 1001 and a learning object name 1002. The link information DB 223 may be provided integrally with the local feature DB 221 and the related information DB 222.
The related information DB 231 of the related information providing server 230 is similar to the related information DB 222 of the learning object recognition server 220, and its description is omitted to avoid duplication.
<< Local feature generation unit >>
FIG. 11A is a block diagram showing the configuration of the local feature generation unit 602 according to the present embodiment.
The local feature generation unit 602 comprises a feature point detection unit 1111, a local region acquisition unit 1112, a sub-region division unit 1113, a sub-region feature vector generation unit 1114, and a dimension selection unit 1115.
The feature point detection unit 1111 detects a large number of characteristic points (feature points) from the image data and outputs the coordinate position, scale (size), and angle of each feature point.
The local region acquisition unit 1112 acquires, from the coordinate value, scale, and angle of each detected feature point, the local region from which the feature is extracted.
The sub-region division unit 1113 divides the local region into sub-regions. For example, it can divide the local region into 16 blocks (4 × 4 blocks) or into 25 blocks (5 × 5 blocks). The number of divisions is not limited. In the present embodiment, the case where the local region is divided into 25 blocks (5 × 5 blocks) is described below as representative.
The sub-region feature vector generation unit 1114 generates a feature vector for each sub-region of the local region. As the feature vector of a sub-region, for example, a gradient direction histogram can be used.
The dimension selection unit 1115 selects (for example, thins out) the dimensions to be output as the local feature, based on the positional relationship of the sub-regions, so that the correlation between the feature vectors of nearby sub-regions is low. The dimension selection unit 1115 can not only select dimensions but also determine a selection priority; for example, it can select dimensions with priorities such that the same gradient direction is not selected in adjacent sub-regions. The dimension selection unit 1115 then outputs a feature vector composed of the selected dimensions as the local feature. The dimension selection unit 1115 can output the local feature with its dimensions rearranged according to the priority order.
<< Processing of local feature generation unit >>
FIGS. 11B to 11F are diagrams showing the processing of the local feature generation unit 602 according to the present embodiment.
First, FIG. 11B shows the series of processes of feature point detection, local region acquisition, sub-region division, and feature vector generation in the local feature generation unit 602. For this series of processes, see US Pat. No. 6,711,293 and David G. Lowe, "Distinctive image features from scale-invariant key points", International Journal of Computer Vision, 60(2), 2004, p. 91-110.
(Feature point detection unit)
The image 1121 in FIG. 11B shows the state in which feature points have been detected from an image in the video by the feature point detection unit 1111 of FIG. 11A. Hereinafter, the generation of a local feature is described using one feature point datum 1121a as a representative. The starting point of the arrow of the feature point datum 1121a indicates the coordinate position of the feature point, the length of the arrow indicates the scale (size), and the direction of the arrow indicates the angle. For the scale (size) and direction, luminance, saturation, hue, and the like can be selected according to the target video. The example of FIG. 11B uses six directions at 60-degree intervals, but the present invention is not limited to this.
(Local region acquisition unit)
The local region acquisition unit 1112 of FIG. 11A generates, for example, a Gaussian window 1122a centered on the starting point of the feature point datum 1121a, and generates a local region 1122 that substantially contains the Gaussian window 1122a. In the example of FIG. 11B, the local region acquisition unit 1112 generates a square local region 1122, but the local region may be circular or have another shape. This local region is acquired for each feature point. A circular local region has the effect of improving robustness with respect to the imaging direction.
(Sub-region division unit)
Next, the figure shows the state in which the sub-region division unit 1113 has divided the scale and angle of each pixel in the local region 1122 of the feature point datum 1121a into sub-regions 1123. FIG. 11B shows an example of division into 5 × 5 = 25 sub-regions of 4 × 4 = 16 pixels each. However, the sub-regions may also number 4 × 4 = 16, or have other shapes and numbers of divisions.
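The 5 × 5 division above can be sketched as follows. The 20 × 20-pixel local region is an assumed size, chosen so that each of the 25 blocks holds 4 × 4 = 16 pixels as in FIG. 11B; the sub-regions are returned in raster-scan order:

```python
def split_into_subregions(patch, blocks=5):
    # patch: square 2-D list of per-pixel values (e.g. gradient angles).
    # Returns blocks*blocks sub-regions in raster-scan order, each as a
    # flat list of its pixels.
    n = len(patch)
    assert n % blocks == 0, "patch side must be divisible by block count"
    size = n // blocks          # e.g. 20 // 5 = 4 -> 4x4 pixels per block
    subs = []
    for br in range(blocks):
        for bc in range(blocks):
            subs.append([patch[br * size + r][bc * size + c]
                         for r in range(size) for c in range(size)])
    return subs
```

The same helper with `blocks=4` would produce the alternative 4 × 4 = 16-block division mentioned above.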
(Sub-region feature vector generation unit)
The sub-region feature vector generation unit 1114 generates a histogram of the per-pixel gradients in a sub-region, quantized into angle units of six directions, and uses it as the feature vector 1124 of the sub-region. The directions are normalized with respect to the angle output by the feature point detection unit 1111. The sub-region feature vector generation unit 1114 then totals the quantized frequencies of the six directions for each sub-region to generate the histogram. In this case, the sub-region feature vector generation unit 1114 outputs a feature vector consisting of a histogram of 25 sub-region blocks × 6 directions = 150 dimensions generated for each feature point. The gradient direction need not be quantized into six directions; it may be quantized into any number of directions, such as 4, 8, or 10. When the gradient direction is quantized into D directions, with the pre-quantization gradient direction denoted G (0 to 2π radians), the quantized value Qq (q = 0, ..., D−1) of the gradient direction can be obtained by, for example, Expression (1) or Expression (2), although it is not limited to these.

 Qq = floor(G × D / 2π)       ... (1)
 Qq = round(G × D / 2π) mod D ... (2)

Here, floor() is a function that truncates the fractional part, round() is a function that rounds to the nearest integer, and mod is the remainder operation. When generating the gradient histogram, the sub-region feature vector generation unit 1114 may total the gradient magnitudes instead of totaling simple frequencies. When totaling the gradient histogram, the sub-region feature vector generation unit 1114 may add weight values not only to the sub-region to which a pixel belongs, but also to nearby sub-regions (such as adjacent blocks) according to the distance between sub-regions. The sub-region feature vector generation unit 1114 may also add weight values to the gradient directions before and after the quantized gradient direction. The feature vector of a sub-region is not limited to a gradient direction histogram; it may be anything that has a plurality of dimensions (elements), such as color information. In the present embodiment, a gradient direction histogram is used as the feature vector of the sub-regions.
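Expressions (1) and (2) can be written directly in code. The sketch below assumes D = 6 directions by default and G given in radians in [0, 2π):

```python
import math

def quantize_floor(G, D=6):
    # Expression (1): Qq = floor(G * D / 2*pi)
    return math.floor(G * D / (2 * math.pi))

def quantize_round(G, D=6):
    # Expression (2): Qq = round(G * D / 2*pi) mod D
    # The "mod D" wraps angles near 2*pi back to bin 0.
    return round(G * D / (2 * math.pi)) % D
```

The two expressions differ in how bin boundaries fall: (1) bins [0, 2π/D), [2π/D, 4π/D), ..., while (2) centers each bin on a multiple of 2π/D.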
(Dimension selection unit)
Next, the processing of the dimension selection unit 1115 in the local feature generation unit 602 is described with reference to FIGS. 11C to 11F.
The dimension selection unit 1115 selects (thins out) the dimensions (elements) to be output as the local feature, based on the positional relationship of the sub-regions, so that the correlation between the feature vectors of nearby sub-regions is low. More specifically, the dimension selection unit 1115 selects dimensions such that, for example, at least one gradient direction differs between adjacent sub-regions. In the present embodiment, the dimension selection unit 1115 mainly uses adjacent sub-regions as the nearby sub-regions, but the nearby sub-regions are not limited to adjacent ones; for example, sub-regions within a predetermined distance of the target sub-region may also be treated as nearby sub-regions.
FIG. 11C shows an example of selecting dimensions from the feature vector 1131 of a 150-dimensional gradient histogram generated by dividing the local region into 5 × 5 block sub-regions and quantizing the gradient directions into six directions 1131a. In the example of FIG. 11C, dimensions are selected from a feature vector of 150 dimensions (5 × 5 = 25 sub-region blocks × 6 directions).
(Dimension selection in the local region)
FIG. 11C is a diagram showing the process of selecting the number of dimensions of the feature vector in the local feature generation unit 602.
As shown in FIG. 11C, the dimension selection unit 1115 selects the feature vector 1132 of a 75-dimensional gradient histogram, half of the 150-dimensional gradient histogram feature vector 1131. In this case, the dimensions can be selected such that the same gradient direction is not selected in horizontally or vertically adjacent sub-region blocks.
In this example, with the quantized gradient direction in the gradient direction histogram denoted q (q = 0, 1, 2, 3, 4, 5), blocks in which the elements q = 0, 2, 4 are selected alternate with sub-region blocks in which the elements q = 1, 3, 5 are selected. In the example of FIG. 11C, the gradient directions selected in adjacent sub-region blocks together cover all six directions.
The dimension selection unit 1115 also selects the feature vector 1133 of a 50-dimensional gradient histogram from the 75-dimensional gradient histogram feature vector 1132. In this case, the dimensions can be selected such that only one direction is the same (the remaining direction differs) between sub-region blocks positioned diagonally at 45 degrees.
When selecting the feature vector 1134 of a 25-dimensional gradient histogram from the 50-dimensional gradient histogram feature vector 1133, the dimension selection unit 1115 can select the dimensions such that the selected gradient directions do not match between sub-region blocks positioned diagonally at 45 degrees. In the example shown in FIG. 11C, the dimension selection unit 1115 selects one gradient direction from each sub-region for dimensions 1 to 25, two gradient directions for dimensions 26 to 50, and three gradient directions for dimensions 51 to 75.
In this way, it is desirable that the gradient directions do not overlap between adjacent sub-region blocks and that all gradient directions are selected evenly. At the same time, as in the example shown in FIG. 11C, it is desirable that the dimensions be selected evenly from the whole local region. The dimension selection method shown in FIG. 11C is one example, and the selection method is not limited to it.
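As one concrete realization of the 150 → 75 selection described above, a checkerboard over the 5 × 5 grid keeps q ∈ {0, 2, 4} in even-parity blocks and q ∈ {1, 3, 5} in odd-parity blocks, so that adjacent blocks never share a selected direction. The parity rule is an assumption illustrating the alternation; FIG. 11C itself may use a different assignment. Element numbers follow the 6 × p + q scheme, with p the raster-scan block number:

```python
def select_75_dims():
    # Select 75 of 150 element numbers (6 * p + q) so that horizontally
    # and vertically adjacent blocks select disjoint direction sets.
    selected = []
    for p in range(25):                  # raster-scan block number
        row, col = divmod(p, 5)
        dirs = (0, 2, 4) if (row + col) % 2 == 0 else (1, 3, 5)
        selected.extend(6 * p + q for q in dirs)
    return selected
```

Since adjacent blocks differ in parity, their three-direction sets are disjoint and together cover all six directions, matching the property stated for the 75-dimensional selection.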
(Priority order in the local region)
FIG. 11D is a diagram showing an example of the selection order of feature vectors from the sub-regions in the local feature generation unit 602.
The dimension selection unit 1115 can not only select dimensions but also determine a selection priority so that dimensions are selected in order of their contribution to the feature of the feature point. That is, the dimension selection unit 1115 can, for example, select dimensions with priorities such that the same gradient direction is not selected in adjacent sub-region blocks. The dimension selection unit 1115 then outputs a feature vector composed of the selected dimensions as the local feature. The dimension selection unit 1115 can output the local feature with its dimensions rearranged according to the priority order.
That is, for dimensions 1 to 25, 26 to 50, and 51 to 75, the dimension selection unit 1115 may add dimensions in the order of the sub-region blocks shown, for example, in the matrix 1141 of FIG. 11D. When using the priority order shown in the matrix 1141 of FIG. 11D, the dimension selection unit 1115 can select gradient directions giving higher priority to the sub-region blocks close to the center.
The matrix 1151 of FIG. 11E shows an example of the element numbers of the 150-dimensional feature vector according to the selection order of FIG. 11D. In this example, with the 5 × 5 = 25 blocks numbered p (p = 0, 1, ..., 24) in raster scan order and the quantized gradient direction denoted q (q = 0, 1, 2, 3, 4, 5), the element number of the feature vector is 6 × p + q.
The matrix 1161 of FIG. 11F shows that the 150-dimensional order according to the selection order of FIG. 11E is hierarchized in units of 25 dimensions. That is, the matrix 1161 of FIG. 11F shows a configuration example of the local feature obtained by selecting the elements shown in FIG. 11E according to the priority order shown in the matrix 1141 of FIG. 11D. The dimension selection unit 1115 can output the dimension elements in the order shown in FIG. 11F. Specifically, when outputting a 150-dimensional local feature, the dimension selection unit 1115 can output all 150 elements in the order shown in FIG. 11F. When outputting, for example, a 25-dimensional local feature, it can output the elements 1171 of the first row shown in FIG. 11F (the 76th, 45th, 83rd, ..., 120th) in the order shown in FIG. 11F (left to right). When outputting, for example, a 50-dimensional local feature, it can output, in addition to the first row, the elements 1172 of the second row shown in FIG. 11F in the order shown in FIG. 11F (left to right).
In the example shown in FIG. 11F, the local feature has a hierarchical structure. That is, for example, in a 25-dimensional local feature and a 150-dimensional local feature, the arrangement of the elements 1171 to 1176 in the first 25 dimensions is the same. By selecting dimensions hierarchically (progressively) in this way, the dimension selection unit 1115 can extract and output a local feature of any number of dimensions, that is, of any size, according to the application, the communication capacity, the terminal specifications, and so on. Furthermore, because the dimension selection unit 1115 selects dimensions hierarchically and outputs them rearranged according to the priority order, images can be collated using local features with different numbers of dimensions. For example, when images are collated using a 75-dimensional local feature and a 50-dimensional local feature, the distance between the local features can be calculated using only the first 50 dimensions.
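This prefix property lets features of different lengths be compared by truncating both to the shared leading dimensions. A minimal sketch follows; the use of Euclidean distance is an assumption for illustration, since the distance measure is not specified here:

```python
def prefix_distance(fa, fb):
    # Compare two local features of possibly different lengths by using
    # only the shared leading dimensions. This is valid only because the
    # elements are emitted in a fixed priority order (cf. FIG. 11F), so
    # a shorter feature is a prefix of a longer one.
    n = min(len(fa), len(fb))
    return sum((a - b) ** 2 for a, b in zip(fa[:n], fb[:n])) ** 0.5
```

A terminal that sends a 50-dimensional feature can thus still be matched against a server DB holding 150-dimensional features, at some cost in discrimination.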
 なお、図11Dのマトリクス1141から図11Fに示す優先順位は一例であり、次元を選定する際の順序はこれに限られない。例えば、ブロックの順番に関しては、図11Dのマトリクス1141の例の他に、図11Dのマトリクス1142や図11Dのマトリクス1143に示すような順番でもよい。また、例えば、全てのサブ領域からまんべんなく次元が選定されるように優先順位が定められることとしてもよい。また、局所領域の中央付近が重要として、中央付近のサブ領域の選定頻度が高くなるように優先順位が定められることとしてもよい。また、次元の選定順序を示す情報は、例えば、プログラムにおいて規定されていてもよいし、プログラムの実行時に参照されるテーブル等(選定順序記憶部)に記憶されていてもよい。 Note that the priority orders shown in the matrix 1141 of FIG. 11D through FIG. 11F are merely examples, and the order in which dimensions are selected is not limited to these. For example, as to the order of blocks, the orders shown in the matrix 1142 or the matrix 1143 of FIG. 11D may be used in addition to the example of the matrix 1141 of FIG. 11D. Further, for example, the priority order may be determined so that dimensions are selected evenly from all the sub-regions. Alternatively, regarding the vicinity of the center of the local region as important, the priority order may be determined so that sub-regions near the center are selected more frequently. The information indicating the dimension selection order may, for example, be defined in a program, or may be stored in a table or the like (a selection order storage unit) referred to when the program is executed.
 また、次元選定部1115は、サブ領域ブロックを1つ飛びに選択して、次元の選定を行ってもよい。すなわち、あるサブ領域では6次元が選定され、当該サブ領域に近接する他のサブ領域では0次元が選定される。このような場合においても、近接するサブ領域間の相関が低くなるようにサブ領域ごとに次元が選定されていると言うことができる。 The dimension selection unit 1115 may also select dimensions by taking every other sub-region block. That is, six dimensions are selected in a certain sub-region, and zero dimensions are selected in the other sub-regions adjacent to that sub-region. Even in such a case, it can be said that dimensions are selected for each sub-region so that the correlation between neighboring sub-regions is low.
 また、局所領域やサブ領域の形状は、正方形に限られず、任意の形状とすることができる。例えば、局所領域取得部1112が、円状の局所領域を取得することとしてもよい。この場合、サブ領域分割部1113は、円状の局所領域を例えば複数の局所領域を有する同心円に9分割や17分割のサブ領域に分割することができる。この場合においても、次元選定部1115は、各サブ領域において、次元を選定することができる。 The shapes of the local region and the sub-regions are not limited to squares and can be arbitrary. For example, the local region acquisition unit 1112 may acquire a circular local region. In this case, the sub-region dividing unit 1113 can divide the circular local region into, for example, nine or seventeen concentric sub-regions. Even in this case, the dimension selection unit 1115 can select dimensions in each sub-region.
 以上、図11B~図11Fに示したように、本実施形態の局所特徴量生成部602によれば、局所特徴量の情報量を維持しながら生成された特徴ベクトルの次元が階層的に選定される。この処理により、認識精度を維持しながらリアルタイムでの学習対象物認識と認識結果の表示が可能となる。なお、局所特徴量生成部602の構成および処理は本例に限定されない。認識精度を維持しながらリアルタイムでの学習対象物認識と認識結果の表示が可能となる他の処理が当然に適用できる。 As described above and shown in FIGS. 11B to 11F, according to the local feature generation unit 602 of the present embodiment, the dimensions of the generated feature vector are selected hierarchically while the information content of the local feature is maintained. This processing enables real-time learning object recognition and display of the recognition result while maintaining recognition accuracy. Note that the configuration and processing of the local feature generation unit 602 are not limited to this example; other processing that enables real-time learning object recognition and display of the recognition result while maintaining recognition accuracy can naturally be applied.
 (符号化部)
 図11Gは、本実施形態に係る符号化部603aを示すブロック図である。なお、符号化部は本例に限定されず、他の符号化処理も適用可能である。
(Encoding Unit)
FIG. 11G is a block diagram showing the encoding unit 603a according to the present embodiment. Note that the encoding unit is not limited to this example, and other encoding processes can be applied.
 符号化部603aは、局所特徴量生成部602の特徴点検出部1111から特徴点の座標を入力して、座標値を走査する座標値走査部1181を有する。座標値走査部1181は、画像をある特定の走査方法にしたがって走査し、特徴点の2次元座標値(X座標値とY座標値)を1次元のインデックス値に変換する。このインデックス値は、走査に従った原点からの走査距離である。なお、走査方向については、制限はない。 The encoding unit 603a has a coordinate value scanning unit 1181 that inputs the coordinates of feature points from the feature point detection unit 1111 of the local feature quantity generation unit 602 and scans the coordinate values. The coordinate value scanning unit 1181 scans the image according to a specific scanning method, and converts the two-dimensional coordinate values (X coordinate value and Y coordinate value) of the feature points into one-dimensional index values. This index value is a scanning distance from the origin according to scanning. There is no restriction on the scanning direction.
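The conversion of a two-dimensional coordinate into a one-dimensional index value can be sketched as follows (Python; raster order is assumed here as one concrete choice — as noted, the scanning direction is not restricted):

```python
def scan_index(x, y, width):
    """Convert a 2-D feature point coordinate into a 1-D index value.

    With row-by-row (raster) scanning, the index equals the scanning
    distance from the origin; `width` is the image width in pixels."""
    return y * width + x

# Feature point at (3, 2) in a 640-pixel-wide image:
assert scan_index(0, 0, 640) == 0      # origin
assert scan_index(3, 2, 640) == 1283   # 2 * 640 + 3
```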
 また、特徴点のインデックス値をソートし、ソート後の順列の情報を出力するソート部1182を有する。ここでソート部1182は、例えば昇順にソートする。また降順にソートしてもよい。 The encoding unit 603a also has a sorting unit 1182 that sorts the index values of the feature points and outputs the information of the sorted permutation. The sorting unit 1182 sorts, for example, in ascending order; it may also sort in descending order.
 また、ソートされたインデックス値における、隣接する2つのインデックス値の差分値を算出し、差分値の系列を出力する差分算出部1183を有する。 Also, a difference calculation unit 1183 that calculates a difference value between two adjacent index values in the sorted index value and outputs a series of difference values is provided.
 そして、差分値の系列を系列順に符号化する差分符号化部1184を有する。差分値の系列の符号化は、例えば固定ビット長の符号化でもよい。固定ビット長で符号化する場合、そのビット長はあらかじめ規定されていてもよいが、これでは考えられうる差分値の最大値を表現するのに必要なビット数を要するため、符号化サイズは小さくならない。そこで、差分符号化部1184は、固定ビット長で符号化する場合、入力された差分値の系列に基づいてビット長を決定することができる。具体的には、例えば、差分符号化部1184は、入力された差分値の系列から差分値の最大値を求め、その最大値を表現するのに必要なビット数(表現ビット数)を求め、求められた表現ビット数で差分値の系列を符号化することができる。 The encoding unit 603a further has a differential encoding unit 1184 that encodes the sequence of difference values in sequence order. The sequence of difference values may be encoded with, for example, a fixed bit length. When encoding with a fixed bit length, the bit length may be specified in advance; however, this requires the number of bits needed to express the largest conceivable difference value, so the encoded size does not become small. Therefore, when encoding with a fixed bit length, the differential encoding unit 1184 can determine the bit length based on the input sequence of difference values. Specifically, for example, the differential encoding unit 1184 can obtain the maximum of the input difference values, obtain the number of bits needed to express that maximum (the representation bit count), and encode the sequence of difference values with the obtained representation bit count.
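The data-dependent choice of representation bit count can be sketched as follows (Python; the bit-string packing format is an illustrative assumption, not the embodiment's wire format):

```python
def representation_bits(diffs):
    """Bits needed to express the largest difference value in the sequence."""
    return max(max(diffs).bit_length(), 1)  # at least one bit even if all zeros

def encode_fixed_length(diffs):
    """Encode every difference value with the same, data-dependent bit length."""
    bits = representation_bits(diffs)
    return bits, ''.join(format(d, f'0{bits}b') for d in diffs)

bits, stream = encode_fixed_length([3, 1, 6, 2])
assert bits == 3                   # the maximum value 6 needs 3 bits
assert stream == '011001110010'   # 3 -> 011, 1 -> 001, 6 -> 110, 2 -> 010
```

Determining the bit length from the actual maximum, rather than from the worst conceivable difference, is what keeps the encoded size small.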
 一方、ソートされた特徴点のインデックス値と同じ順列で、対応する特徴点の局所特徴量を符号化する局所特徴量符号化部1185を有する。ソートされたインデックス値と同じ順列で符号化することで、差分符号化部1184で符号化された座標値と、それに対応する局所特徴量とを1対1で対応付けることが可能となる。局所特徴量符号化部1185は、本実施形態においては、1つの特徴点に対する150次元の局所特徴量から次元選定された局所特徴量を、例えば1次元を1バイトで符号化し、次元数のバイトで符号化することができる。 On the other hand, the encoding unit 603a has a local feature encoding unit 1185 that encodes the local features of the corresponding feature points in the same permutation as the sorted index values of the feature points. Encoding in the same permutation as the sorted index values makes it possible to associate the coordinate values encoded by the differential encoding unit 1184 with the corresponding local features on a one-to-one basis. In the present embodiment, the local feature encoding unit 1185 can encode the local feature whose dimensions were selected from the 150-dimensional local feature of one feature point using, for example, one byte per dimension, that is, as many bytes as there are dimensions.
 (学習対象物認識部)
 図11Hは、本実施形態に係る学習対象物認識部703の処理を示す図である。
(Learning object recognition unit)
FIG. 11H is a diagram illustrating processing of the learning object recognition unit 703 according to the present embodiment.
 図11Hは、図3において、表示画面310中の通信端末210が撮像した映像311から生成した局所特徴量を、あらかじめ局所特徴量DB221に格納された局所特徴量1191~1194と照合する様子を示す図である。 FIG. 11H shows how the local features generated from the video 311 captured by the communication terminal 210 on the display screen 310 in FIG. 3 are collated with the local features 1191 to 1194 stored in advance in the local feature DB 221.
 図11Hの左図の通信端末210で撮像された映像311からは、本実施形態に従い局所特徴量が生成される。そして、局所特徴量DB221に各学習対象物に対応して格納された局所特徴量1191~1194が、映像311から生成された局所特徴量中にあるか否かが照合される。 From the video 311 captured by the communication terminal 210 in the left diagram of FIG. 11H, local feature amounts are generated according to the present embodiment. Then, it is verified whether or not the local feature amounts 1191 to 1194 stored in the local feature amount DB 221 corresponding to each learning object are in the local feature amounts generated from the video 311.
 図11Hに示すように、学習対象物認識部703は、局所特徴量DB221に格納されている局所特徴量と局所特徴量が合致する各特徴点を細線のように関連付ける。なお、学習対象物認識部703は、局所特徴量の所定割合以上が一致する場合を特徴点の合致とする。そして、学習対象物認識部703は、関連付けられた特徴点の集合間の位置関係が線形関係であれば、対象の学習対象物であると認識する。このような認識を行なえば、サイズの大小や向きの違い(視点の違い)、あるいは反転などによっても認識が可能である。また、所定数以上の関連付けられた特徴点があれば認識精度が得られるので、一部が視界から隠れていても学習対象物の認識が可能である。 As shown in FIG. 11H, the learning object recognition unit 703 associates, with thin lines, each pair of feature points whose local features match the local features stored in the local feature DB 221. The learning object recognition unit 703 regards feature points as matching when a predetermined proportion or more of their local features agree. Then, if the positional relationship between the sets of associated feature points is a linear relationship, the learning object recognition unit 703 recognizes the target learning object. With this kind of recognition, recognition remains possible under differences in size, differences in orientation (differences in viewpoint), or inversion. Moreover, since sufficient recognition accuracy is obtained as long as there are at least a predetermined number of associated feature points, the learning object can be recognized even when part of it is hidden from view.
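The per-feature-point matching criterion can be sketched as follows (Python; the exact-agreement test and the 80% threshold are illustrative assumptions — the embodiment only requires that a predetermined proportion or more of the local feature agree):

```python
def feature_points_match(f1, f2, ratio=0.8):
    """Regard two feature points as matching when at least `ratio` of
    their shared dimensions agree (exact agreement is assumed here; a
    per-dimension tolerance could be used instead)."""
    k = min(len(f1), len(f2))
    agree = sum(1 for a, b in zip(f1[:k], f2[:k]) if a == b)
    return agree >= ratio * k

assert feature_points_match([1, 2, 3, 4, 5], [1, 2, 3, 4, 9])       # 4/5 agree
assert not feature_points_match([1, 2, 3, 4, 5], [9, 9, 3, 4, 5])   # 3/5 agree
```

Candidate matches found this way would then be checked for a linear positional relationship between the associated point sets, as described above.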
 図11Hにおいては、局所特徴量DB221の4つの学習対象物の局所特徴量1191~1194に合致する、景観内の異なる4つの学習対象物が局所特徴量の精度に対応する精密さを持って認識される。 In FIG. 11H, four different learning objects in the landscape that match the local features 1191 to 1194 of the four learning objects in the local feature DB 221 are recognized with a precision corresponding to the accuracy of the local features.
 《通信端末のハードウェア構成》
 図12Aは、本実施形態に係る通信端末210のハードウェア構成を示すブロック図である。
<< Hardware configuration of communication terminal >>
FIG. 12A is a block diagram illustrating a hardware configuration of the communication terminal 210 according to the present embodiment.
 図12Aで、CPU1210は演算制御用のプロセッサであり、プログラムを実行することで通信端末210の各機能構成部を実現する。ROM1220は、初期データおよびプログラムなどの固定データおよびプログラムを記憶する。また、通信制御部604は通信制御部であり、本実施形態においては、ネットワークを介して学習対象物認識サーバ220や関連情報提供サーバ230と通信する。なお、CPU1210は1つに限定されず、複数のCPUであっても、あるいは画像処理用のGPU(Graphics Processing Unit)を含んでもよい。 In FIG. 12A, the CPU 1210 is a processor for arithmetic control and implements each functional component of the communication terminal 210 by executing programs. The ROM 1220 stores fixed data and programs such as initial data and programs. The communication control unit 604 is a communication controller, and in the present embodiment communicates with the learning object recognition server 220 and the related information providing server 230 via the network. The number of CPUs 1210 is not limited to one; there may be a plurality of CPUs, and a GPU (Graphics Processing Unit) for image processing may be included.
 RAM1240は、CPU1210が一時記憶のワークエリアとして使用するランダムアクセスメモリである。RAM1240には、本実施形態の実現に必要なデータを記憶する領域が確保されている。入力映像1241は、撮像部601が撮像して入力した入力映像を示す。特徴点データ1242は、入力映像1241から検出した特徴点座標、スケール、角度を含む特徴点データを示す。局所特徴量生成テーブル1243は、局所特徴量を生成するまでのデータを保持する局所特徴量生成テーブルを示す(図12B参照)。局所特徴量1244は、局所特徴量生成テーブル1243を使って生成され、通信制御部604を介して学習対象物認識サーバ220に送る局所特徴量を示す。学習対象物認識結果1245は、通信制御部604を介して学習対象物認識サーバ220から返信された学習対象物認識結果を示す。関連情報/リンク情報1246は、学習対象物認識サーバ220から返信された関連情報やリンク情報、あるいは関連情報提供サーバ230から返信された関連情報を示す。表示画面データ1247は、ユーザに学習対象物認識結果1245や関連情報/リンク情報1246を含む情報を報知するための表示画面データを示す。なお、音声出力をする場合には、音声データが含まれてもよい。入出力データ1248は、入出力インタフェース1260を介して入出力される入出力データを示す。送受信データ1249は、通信制御部604を介して送受信される送受信データを示す。 The RAM 1240 is a random access memory that the CPU 1210 uses as a work area for temporary storage. An area for storing the data necessary for realizing the present embodiment is secured in the RAM 1240. The input video 1241 indicates the input video captured and input by the imaging unit 601. The feature point data 1242 indicates feature point data including the feature point coordinates, scale, and angle detected from the input video 1241. The local feature generation table 1243 indicates a local feature generation table that holds the data up to generation of a local feature (see FIG. 12B). The local feature 1244 indicates the local feature generated using the local feature generation table 1243 and sent to the learning object recognition server 220 via the communication control unit 604. The learning object recognition result 1245 indicates the learning object recognition result returned from the learning object recognition server 220 via the communication control unit 604. The related information/link information 1246 indicates the related information and link information returned from the learning object recognition server 220, or the related information returned from the related information providing server 230. The display screen data 1247 indicates display screen data for notifying the user of information including the learning object recognition result 1245 and the related information/link information 1246; when audio output is performed, audio data may also be included. The input/output data 1248 indicates the input/output data input and output via the input/output interface 1260. The transmission/reception data 1249 indicates the data transmitted and received via the communication control unit 604.
 ストレージ1250には、データベースや各種のパラメータ、あるいは本実施形態の実現に必要な以下のデータまたはプログラムが記憶されている。表示フォーマット1251は、学習対象物認識結果1245や関連情報/リンク情報1246を含む情報を表示するための表示フォーマットを示す。 The storage 1250 stores a database, various parameters, or the following data or programs necessary for realizing the present embodiment. A display format 1251 indicates a display format for displaying information including the learning object recognition result 1245 and related information / link information 1246.
 ストレージ1250には、以下のプログラムが格納される。通信端末制御プログラム1252は、本通信端末210の全体を制御する通信端末制御プログラムを示す。通信端末制御プログラム1252には、以下のモジュールが含まれている。局所特徴量生成モジュール1253は、通信端末制御プログラム1252において、入力映像から図11B~図11Fにしたがって局所特徴量を生成するモジュールを示す。なお、局所特徴量生成モジュール1253は、図示のモジュール群から構成されるが、ここでは詳説は省略する。符号化モジュール1254は、局所特徴量生成モジュール1253により生成された局所特徴量を送信のために符号化するモジュールを示す。情報受信報知モジュール1255は、学習対象物認識結果1245や関連情報/リンク情報1246を受信して表示または音声によりユーザに報知するためのモジュールを示す。リンク先アクセスモジュール1256は、受信して報知したリンク情報へのユーザ指示に基づいて、リンク先をアクセスするモジュールである。 The storage 1250 stores the following programs. The communication terminal control program 1252 indicates a communication terminal control program that controls the entire communication terminal 210. The communication terminal control program 1252 includes the following modules. The local feature value generation module 1253 indicates a module that generates a local feature value from the input video according to FIGS. 11B to 11F in the communication terminal control program 1252. The local feature quantity generation module 1253 is composed of the illustrated module group, but detailed description thereof is omitted here. The encoding module 1254 indicates a module that encodes the local feature generated by the local feature generating module 1253 for transmission. The information reception notification module 1255 is a module for receiving the learning object recognition result 1245 and the related information / link information 1246 and notifying the user by display or voice. The link destination access module 1256 is a module that accesses a link destination based on a user instruction to the link information received and notified.
 入出力インタフェース1260は、入出力機器との入出力データをインタフェースする。入出力インタフェース1260には、表示部1261、操作部1262であるタッチパネルやキーボード、スピーカ1263、マイク1264、撮像部601が接続される。入出力機器は上記例に限定されない。また、GPS(Global Positioning System)位置生成部1265が搭載され、GPS衛星からの信号に基づいて現在位置を取得する。 The input / output interface 1260 interfaces input / output data with input / output devices. The input / output interface 1260 is connected to a display unit 1261, a touch panel or keyboard as the operation unit 1262, a speaker 1263, a microphone 1264, and an imaging unit 601. The input / output device is not limited to the above example. In addition, a GPS (Global Positioning System) position generation unit 1265 is mounted, and acquires the current position based on a signal from a GPS satellite.
 なお、図12Aには、本実施形態に必須なデータやプログラムのみが示されており、本実施形態に関連しないデータやプログラムは図示されていない。 In FIG. 12A, only data and programs essential to the present embodiment are shown, and data and programs not related to the present embodiment are not shown.
 (局所特徴量生成テーブル)
 図12Bは、本実施形態に係る通信端末210における局所特徴量生成テーブル1243を示す図である。
(Local feature generation table)
FIG. 12B is a diagram showing a local feature generation table 1243 in the communication terminal 210 according to the present embodiment.
 局所特徴量生成テーブル1243には、入力画像ID1201に対応付けて、複数の検出された検出特徴点1202,特徴点座標1203および特徴点に対応する局所領域情報1204が記憶される。そして、各検出特徴点1202,特徴点座標1203および局所領域情報1204に対応付けて、複数のサブ領域ID1205,サブ領域情報1206,各サブ領域に対応する特徴ベクトル1207および優先順位を含む選定次元1208が記憶される。 The local feature generation table 1243 stores, in association with an input image ID 1201, a plurality of detected feature points 1202, feature point coordinates 1203, and local region information 1204 corresponding to the feature points. Then, in association with each detected feature point 1202, its feature point coordinates 1203, and its local region information 1204, a plurality of sub-region IDs 1205, sub-region information 1206, a feature vector 1207 corresponding to each sub-region, and a selected dimension 1208 including the priority order are stored.
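The row structure of this table can be sketched as follows (Python; the field names and types are illustrative only — the actual storage layout is not limited to this):

```python
from dataclasses import dataclass, field

@dataclass
class SubRegionRow:
    sub_region_id: int       # sub-region ID 1205
    sub_region_info: tuple   # sub-region information 1206
    feature_vector: list     # feature vector 1207 for this sub-region
    selected_dims: list      # selected dimension 1208, in priority order

@dataclass
class FeaturePointRow:
    detected_point: int      # detected feature point 1202
    coordinates: tuple       # feature point coordinates 1203 (x, y)
    local_region: tuple      # local region information 1204
    sub_regions: list = field(default_factory=list)  # SubRegionRow entries

# One feature point with one sub-region entry, keyed under an input image ID:
row = FeaturePointRow(1, (120, 45), (0, 0, 40, 40))
row.sub_regions.append(SubRegionRow(1, (0, 0), [0.1] * 8, [76, 45, 83]))
assert len(row.sub_regions) == 1
```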
 以上のデータから各検出特徴点1202に対して局所特徴量1209が生成される。これらを特徴点座標と組みにして集めたデータが、撮像した景観から生成した学習対象物認識サーバ220に送信される局所特徴量1244である。 From the above data, a local feature 1209 is generated for each detected feature point 1202. The data collected by pairing these with the feature point coordinates constitutes the local feature 1244, generated from the captured landscape, that is transmitted to the learning object recognition server 220.
 《通信端末の処理手順》
 図13は、本実施形態に係る通信端末210の処理手順を示すフローチャートである。このフローチャートは、図12AのCPU1210によってRAM1240を用いて実行され、図6の各機能構成部を実現する。
<< Processing procedure of communication terminal >>
FIG. 13 is a flowchart illustrating a processing procedure of the communication terminal 210 according to the present embodiment. This flowchart is executed by the CPU 1210 of FIG. 12A using the RAM 1240, and implements each functional component of FIG.
 まず、ステップS1311において、学習対象物の認識を行なうための映像入力があったか否かを判定する。また、ステップS1321においては、データ受信を判定する。また、ステップS1331においては、ユーザによるリンク先の指示かを判定する。いずれでもなければ、ステップS1341においてその他の処理を行なう。なお、通常の送信処理については説明を省略する。 First, in step S1311, it is determined whether there has been a video input for recognizing a learning object. In step S1321, data reception is determined. In step S1331, it is determined whether the user has designated a link destination. If none of these applies, other processing is performed in step S1341. Description of normal transmission processing is omitted.
 映像入力があればステップS1313に進んで、入力映像に基づいて局所特徴量生成処理を実行する(図14A参照)。次に、ステップS1315において、局所特徴量および特徴点座標を符号化する(図14Bおよび図14C参照)。ステップS1317においては、符号化されたデータを学習対象物認識サーバ220に送信する。 If there is video input, the process proceeds to step S1313, and local feature generation processing is executed based on the input video (see FIG. 14A). Next, in step S1315, local feature quantities and feature point coordinates are encoded (see FIGS. 14B and 14C). In step S1317, the encoded data is transmitted to the learning object recognition server 220.
 データ受信の場合はステップS1323に進んで、学習対象物認識サーバ220からの学習対象物認識結果や関連情報の受信か、または関連情報提供サーバ230からの関連情報の受信か否かを判定する。学習対象物認識サーバ220からの受信であればステップS1325に進んで、受信した学習対象物認識結果、関連情報、リンク情報を表示や音声出力で報知する。一方、関連情報提供サーバ230からの受信であればステップS1327に進んで、受信した関連情報を表示や音声出力で報知する。 In the case of data reception, the process proceeds to step S1323, where it is determined whether the data is a learning object recognition result and related information received from the learning object recognition server 220, or related information received from the related information providing server 230. If received from the learning object recognition server 220, the process proceeds to step S1325, and the received learning object recognition result, related information, and link information are reported by display or audio output. On the other hand, if received from the related information providing server 230, the process proceeds to step S1327, and the received related information is reported by display or audio output.
 (局所特徴量生成処理)
 図14Aは、本実施形態に係る局所特徴量生成処理S1313の処理手順を示すフローチャートである。
(Local feature generation processing)
FIG. 14A is a flowchart illustrating a processing procedure of local feature generation processing S1313 according to the present embodiment.
 まず、ステップS1411において、入力映像から特徴点の位置座標、スケール、角度を検出する。ステップS1413において、ステップS1411で検出された特徴点の1つに対して局所領域を取得する。次に、ステップS1415において、局所領域をサブ領域に分割する。ステップS1417においては、各サブ領域の特徴ベクトルを生成して局所領域の特徴ベクトルを生成する。ステップS1411からS1417の処理は図11Bに図示されている。 First, in step S1411, the position coordinates, scale, and angle of the feature points are detected from the input video. In step S1413, a local region is acquired for one of the feature points detected in step S1411. Next, in step S1415, the local area is divided into sub-areas. In step S1417, a feature vector for each sub-region is generated to generate a feature vector for the local region. The processing of steps S1411 to S1417 is illustrated in FIG. 11B.
 次に、ステップS1419において、ステップS1417において生成された局所領域の特徴ベクトルに対して次元選定を実行する。次元選定については、図11D~図11Fに図示されている。 Next, in step S1419, dimension selection is performed on the feature vector of the local region generated in step S1417. The dimension selection is illustrated in FIGS. 11D to 11F.
 ステップS1421においては、ステップS1411で検出した全特徴点について局所特徴量の生成と次元選定とが終了したかを判定する。終了していない場合はステップS1413に戻って、次の1つの特徴点について処理を繰り返す。 In step S1421, it is determined whether the generation of local features and dimension selection have been completed for all feature points detected in step S1411. If not completed, the process returns to step S1413 to repeat the process for the next one feature point.
 (符号化処理)
 図14Bは、本実施形態に係る符号化処理S1315の処理手順を示すフローチャートである。
(Encoding process)
FIG. 14B is a flowchart illustrating a processing procedure of the encoding processing S1315 according to the present embodiment.
 まず、ステップS1431において、特徴点の座標値を所望の順序で走査する。次に、ステップS1433において、走査した座標値をソートする。ステップS1435において、ソートした順に座標値の差分値を算出する。ステップS1437においては、差分値を符号化する(図14C参照)。そして、ステップS1439において、座標値のソート順に局所特徴量を符号化する。なお、差分値の符号化と局所特徴量の符号化とは並列に行なってもよい。 First, in step S1431, the coordinate values of feature points are scanned in a desired order. Next, in step S1433, the scanned coordinate values are sorted. In step S1435, a difference value of coordinate values is calculated in the sorted order. In step S1437, the difference value is encoded (see FIG. 14C). In step S1439, local feature amounts are encoded in the coordinate value sorting order. The difference value encoding and the local feature amount encoding may be performed in parallel.
 (差分値の符号化処理)
 図14Cは、本実施形態に係る差分値の符号化処理S1437の処理手順を示すフローチャートである。
(Difference Value Encoding Processing)
FIG. 14C is a flowchart illustrating a processing procedure of difference value encoding processing S1437 according to the present embodiment.
 まず、ステップS1441において、差分値が符号化可能な値域内であるか否かを判定する。符号化可能な値域内であればステップS1447に進んで、差分値を符号化する。そして、ステップS1449へ移行する。符号化可能な値域内でない場合(値域外)はステップS1443に進んで、エスケープコードを符号化する。そしてステップS1445において、ステップS1447の符号化とは異なる符号化方法で差分値を符号化する。そして、ステップS1449へ移行する。ステップS1449では、処理された差分値が差分値の系列の最後の要素であるかを判定する。最後である場合は、処理が終了する。最後でない場合は、再度ステップS1441に戻って、差分値の系列の次の差分値に対する処理が実行される。 First, in step S1441, it is determined whether the difference value is within the encodable range. If it is within the encodable range, the process proceeds to step S1447 and the difference value is encoded; the process then moves to step S1449. If it is not within the encodable range (out of range), the process proceeds to step S1443 and an escape code is encoded. Then, in step S1445, the difference value is encoded by an encoding method different from that of step S1447, and the process moves to step S1449. In step S1449, it is determined whether the processed difference value is the last element of the sequence of difference values. If it is the last, the processing ends. If it is not the last, the process returns to step S1441, and the processing is executed for the next difference value in the sequence.
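The escape-code branching of FIG. 14C can be sketched as follows (Python; the one-byte in-range limit, the 0xFF escape marker, and the 4-byte fallback are illustrative assumptions, not values fixed by the embodiment):

```python
ESCAPE = 0xFF        # assumed escape marker
MAX_IN_RANGE = 0xFE  # assumed largest value encodable by the normal method

def encode_difference_values(diffs):
    """Encode each difference value; out-of-range values are preceded by
    an escape code and then encoded with a wider, different method."""
    out = bytearray()
    for d in diffs:
        if 0 <= d <= MAX_IN_RANGE:
            out.append(d)                # step S1447: normal encoding
        else:
            out.append(ESCAPE)           # step S1443: escape code
            out += d.to_bytes(4, 'big')  # step S1445: different encoding
    return bytes(out)

encoded = encode_difference_values([5, 300, 7])
assert encoded[0] == 5
assert encoded[1] == ESCAPE and int.from_bytes(encoded[2:6], 'big') == 300
assert encoded[6] == 7
```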
 《学習対象物認識サーバのハードウェア構成》
 図15は、本実施形態に係る学習対象物認識サーバ220のハードウェア構成を示すブロック図である。
<< Hardware configuration of learning object recognition server >>
FIG. 15 is a block diagram illustrating a hardware configuration of the learning object recognition server 220 according to the present embodiment.
 図15で、CPU1510は演算制御用のプロセッサであり、プログラムを実行することで図7の学習対象物認識サーバ220の各機能構成部を実現する。ROM1520は、初期データおよびプログラムなどの固定データおよびプログラムを記憶する。また、通信制御部701は通信制御部であり、本実施形態においては、ネットワークを介して通信端末210あるいは関連情報提供サーバ230と通信する。なお、CPU1510は1つに限定されず、複数のCPUであっても、あるいは画像処理用のGPUを含んでもよい。 In FIG. 15, the CPU 1510 is a processor for arithmetic control and implements each functional component of the learning object recognition server 220 in FIG. 7 by executing programs. The ROM 1520 stores fixed data and programs such as initial data and programs. The communication control unit 701 is a communication controller, and in the present embodiment communicates with the communication terminal 210 or the related information providing server 230 via the network. The number of CPUs 1510 is not limited to one; there may be a plurality of CPUs, and a GPU for image processing may be included.
 RAM1540は、CPU1510が一時記憶のワークエリアとして使用するランダムアクセスメモリである。RAM1540には、本実施形態の実現に必要なデータを記憶する領域が確保されている。受信した局所特徴量1541は、通信端末210から受信した特徴点座標を含む局所特徴量を示す。読出した局所特徴量1542は、局所特徴量DB221から読み出した特徴点座標を含む局所特徴量を示す。学習対象物認識結果1543は、受信した局所特徴量と局所特徴量DB221に格納された局所特徴量との照合から認識された、学習対象物認識結果を示す。関連情報1544は、学習対象物認識結果1543の学習対象物に対応して関連情報DB222から検索された関連情報を示す。リンク情報1545は、学習対象物認識結果1543の学習対象物に対応してリンク情報DB223から検索されたリンク情報を示す。送受信データ1546は、通信制御部701を介して送受信される送受信データを示す。 The RAM 1540 is a random access memory that the CPU 1510 uses as a work area for temporary storage. An area for storing the data necessary for realizing the present embodiment is secured in the RAM 1540. The received local feature 1541 indicates the local feature, including the feature point coordinates, received from the communication terminal 210. The read local feature 1542 indicates the local feature, including the feature point coordinates, read from the local feature DB 221. The learning object recognition result 1543 indicates the learning object recognition result recognized by collating the received local feature with the local features stored in the local feature DB 221. The related information 1544 indicates the related information retrieved from the related information DB 222 in correspondence with the learning object of the learning object recognition result 1543. The link information 1545 indicates the link information retrieved from the link information DB 223 in correspondence with the learning object of the learning object recognition result 1543. The transmission/reception data 1546 indicates the data transmitted and received via the communication control unit 701.
 ストレージ1550には、データベースや各種のパラメータ、あるいは本実施形態の実現に必要な以下のデータまたはプログラムが記憶されている。局所特徴量DB221は、図8に示したと同様の局所特徴量DBを示す。関連情報DB222は、図9に示したと同様の関連情報DBを示す。リンク情報DB223は、図10に示したと同様のリンク情報DBを示す。 The storage 1550 stores a database, various parameters, or the following data or programs necessary for realizing the present embodiment. The local feature DB 221 is a local feature DB similar to that shown in FIG. The related information DB 222 is a related information DB similar to that shown in FIG. The link information DB 223 shows the same link information DB as shown in FIG.
 ストレージ1550には、以下のプログラムが格納される。学習対象物認識サーバ制御プログラム1551は、本学習対象物認識サーバ220の全体を制御する学習対象物認識サーバ制御プログラムを示す。局所特徴量DB作成モジュール1552は、学習対象物認識サーバ制御プログラム1551において、学習対象物の画像から局所特徴量を生成して局所特徴量DB221に格納するモジュールを示す。学習対象物認識モジュール1553は、学習対象物認識サーバ制御プログラム1551において、受信した局所特徴量と局所特徴量DB221に格納された局所特徴量とを照合して学習対象物を認識するモジュールを示す。関連情報/リンク情報取得モジュール1554は、認識した学習対象物に対応して関連情報DB222やリンク情報DB223から関連情報やリンク情報を取得するモジュールを示す。認識結果/情報送信モジュール1555は、認識した学習対象物名、取得した関連情報やリンク情報を送信するモジュールを示す。 The storage 1550 stores the following programs. The learning target object recognition server control program 1551 indicates a learning target object recognition server control program that controls the entire learning target object recognition server 220. The local feature DB creation module 1552 indicates a module that generates a local feature from a learning target image and stores it in the local feature DB 221 in the learning target recognition server control program 1551. The learning target object recognition module 1553 is a module that recognizes the learning target object in the learning target object recognition server control program 1551 by comparing the received local feature value with the local feature value stored in the local feature value DB 221. The related information / link information acquisition module 1554 indicates a module that acquires related information and link information from the related information DB 222 and the link information DB 223 corresponding to the recognized learning object. The recognition result / information transmission module 1555 indicates a module that transmits a recognized learning object name, acquired related information, and link information.
 なお、図15には、本実施形態に必須なデータやプログラムのみが示されており、本実施形態に関連しないデータやプログラムは図示されていない。 Note that FIG. 15 shows only data and programs essential to the present embodiment, and does not illustrate data and programs not related to the present embodiment.
 《学習対象物認識サーバの処理手順》
 図16は、本実施形態に係る学習対象物認識サーバ220の処理手順を示すフローチャートである。このフローチャートは、図15のCPU1510によりRAM1540を使用して実行され、図7の学習対象物認識サーバ220の各機能構成部を実現する。
<< Processing procedure of the learning object recognition server >>
FIG. 16 is a flowchart showing a processing procedure of the learning object recognition server 220 according to the present embodiment. This flowchart is executed by the CPU 1510 of FIG. 15 using the RAM 1540, and implements each functional component of the learning object recognition server 220 of FIG.
 まず、ステップS1611において、局所特徴量DBの生成か否かを判定する。また、ステップS1621において、通信端末からの局所特徴量受信かを判定する。いずれでもなければ、ステップS1641において他の処理を行なう。 First, in step S1611, it is determined whether or not a local feature DB is generated. In step S1621, it is determined whether a local feature amount is received from the communication terminal. Otherwise, other processing is performed in step S1641.
 局所特徴量DBの生成であればステップS1613に進んで、局所特徴量DB生成処理を実行する(図17参照)。また、局所特徴量の受信であればステップS1623に進んで、学習対象物認識処理を行なう(図18Aおよび図18B参照)。 If the local feature DB is generated, the process advances to step S1613 to execute a local feature DB generation process (see FIG. 17). If a local feature is received, the process proceeds to step S1623 to perform learning object recognition processing (see FIGS. 18A and 18B).
 次に、ステップS1625において、認識した学習対象物に対応する関連情報やリンク情報を取得する。そして、認識した学習対象物名、関連情報、リンク情報を通信端末210に送信する。 Next, in step S1625, related information and link information corresponding to the recognized learning object are acquired. Then, the recognized learning object name, related information, and link information are transmitted to the communication terminal 210.
 (局所特徴量DB生成処理)
 図17は、本実施形態に係る局所特徴量DB生成処理S1613の処理手順を示すフローチャートである。
(Local feature DB generation processing)
FIG. 17 is a flowchart showing a processing procedure of local feature DB generation processing S1613 according to the present embodiment.
 まず、ステップS1701において、学習対象物の画像を取得する。ステップS1703においては、特徴点の位置座標、スケール、角度を検出する。ステップS1705において、ステップS1703で検出された特徴点の1つに対して局所領域を取得する。次に、ステップS1707において、局所領域をサブ領域に分割する。ステップS1709においては、各サブ領域の特徴ベクトルを生成して局所領域の特徴ベクトルを生成する。ステップS1705からS1709の処理は図11Bに図示されている。 First, in step S1701, an image of a learning object is acquired. In step S1703, the position coordinates, scale, and angle of the feature points are detected. In step S1705, a local region is acquired for one of the feature points detected in step S1703. Next, in step S1707, the local area is divided into sub-areas. In step S1709, a feature vector for each sub-region is generated to generate a local region feature vector. The processing from step S1705 to S1709 is illustrated in FIG. 11B.
Next, in step S1711, dimension selection is performed on the feature vector of the local region generated in step S1709. Dimension selection is illustrated in FIGS. 11D to 11F. Note that, although the generation of the local feature DB 221 performs the hierarchization used in dimension selection, it is desirable to store all of the generated feature vectors.
In step S1713, it is determined whether the generation of local features and the dimension selection have been completed for all feature points detected in step S1703. If not, the process returns to step S1705 and repeats for the next feature point. When all feature points have been processed, the process advances to step S1715, and the local features and feature point coordinates are registered in the local feature DB 221 in association with the learning object.
In step S1717, it is determined whether there is an image of another learning object. If there is, the process returns to step S1701, acquires that image, and repeats the processing.
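The DB generation flow above (steps S1701 through S1717) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the detector and descriptor are trivial stand-ins (names such as `detect_feature_points` and `build_local_feature_db` are ours), whereas a real system would detect scale- and rotation-invariant feature points and derive per-sub-region gradient histogram feature vectors as in FIG. 11B.

```python
# Sketch of local feature DB generation (S1701-S1717).
# All function names and the descriptor scheme are illustrative assumptions.

def detect_feature_points(image):
    """Stand-in for S1703: return (x, y, scale, angle) tuples."""
    h, w = len(image), len(image[0])
    # Hypothetical detector: sample every 4th pixel as a feature point.
    return [(x, y, 1.0, 0.0) for y in range(0, h, 4) for x in range(0, w, 4)]

def local_descriptor(image, point, dims=150):
    """Stand-in for S1705-S1711: local region -> sub-region feature vector.
    The full dimension count is kept, since the text recommends storing
    all generated feature vectors in the DB."""
    x, y, _scale, _angle = point
    return [(image[y][x] * (i + 1)) % 256 for i in range(dims)]

def build_local_feature_db(training_images):
    """S1713-S1717: per learning object, register (descriptor, coordinates)."""
    db = {}
    for object_id, image in training_images.items():
        db[object_id] = [(local_descriptor(image, pt), (pt[0], pt[1]))
                         for pt in detect_feature_points(image)]
    return db
```

Registering one test image under an object ID then yields one (descriptor, coordinates) record per detected feature point, mirroring the per-object records of the local feature DB 221.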
(Learning object recognition processing)
FIG. 18A is a flowchart showing the procedure of the learning object recognition processing S1623 according to the present embodiment.
First, in step S1811, the local features of one learning object are acquired from the local feature DB 221. Then, in step S1813, the local features of the learning object are collated with the local features received from the communication terminal 210 (see FIG. 18B).
In step S1815, it is determined whether they match. If they match, the process advances to step S1821, and the matched learning object is stored as being present in the video captured by the communication terminal 210.
In step S1817, it is determined whether all learning objects registered in the local feature DB 221 have been collated; if any remain, the process returns to step S1811 and the collation is repeated for the next learning object. Note that, in this collation, the field may be limited in advance, either to achieve real-time processing through improved processing speed or to reduce the load on the learning object recognition server.
(Collation processing)
FIG. 18B is a flowchart showing the procedure of the collation processing S1813 according to the present embodiment.
First, in step S1831, the parameters p = 1 and q = 0 are set as initialization. Next, in step S1833, the smaller of the dimension count i of the local features in the local feature DB 221 and the dimension count j of the received local features is selected.
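Because the dimension selection of FIGS. 11D to 11F is hierarchical, the first dimensions of a longer descriptor form a valid shorter descriptor, so step S1833 reduces to truncating both descriptors to the smaller dimension count. A sketch (the function name is ours):

```python
def select_common_dimensions(stored, received):
    """Step S1833: compare descriptors at the smaller dimension count.
    Hierarchical dimension selection means truncation preserves
    comparability between the two descriptors."""
    d = min(len(stored), len(received))
    return stored[:d], received[:d]
```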
In the loop of steps S1835 to S1845, the collation of each local feature is repeated until p > m (where m is the number of feature points of the learning object). First, in step S1835, the data of the selected number of dimensions of the p-th local feature of the learning object stored in the local feature DB 221 is acquired; that is, the selected number of dimensions is taken starting from the first dimension. Next, in step S1837, the p-th local feature acquired in step S1835 is collated in turn with the local features of all feature points generated from the input video to determine whether they are similar. In step S1839, it is determined from the result of this collation whether the similarity exceeds a threshold α; if it does, then in step S1841 the local feature and the positional relationship of the matched feature points in the input video and the learning object are stored as a pair, and q, the count of matched feature points, is incremented by one. In step S1843, the process advances to the next feature point of the learning object (p ← p + 1), and if not all feature points of the learning object have been collated (p ≤ m), the process returns to step S1835 and the collation is repeated.
Note that the threshold α can be changed according to the recognition accuracy required for the learning object. For a learning object whose correlation with other learning objects is low, accurate recognition is possible even if the recognition accuracy is lowered.
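The matching loop of steps S1835 to S1845 can be sketched as below. The similarity measure (a normalized inverse L1 distance) and the function names are assumptions for illustration; the text fixes only the structure of the loop and the threshold α, not a particular measure.

```python
def similarity(a, b):
    """Descriptor similarity at the common dimension count (S1833/S1837)."""
    d = min(len(a), len(b))
    dist = sum(abs(a[i] - b[i]) for i in range(d))
    return 1.0 / (1.0 + dist / d)   # assumed measure: 1.0 means identical

def match_object(object_feats, video_feats, alpha=0.5):
    """S1835-S1845: return matched coordinate pairs; q is their count."""
    matches = []
    for desc_p, xy_p in object_feats:          # p = 1 .. m
        for desc_v, xy_v in video_feats:       # all input-video features
            if similarity(desc_p, desc_v) > alpha:
                matches.append((xy_p, xy_v))   # S1841: store pair, q += 1
                break
    return matches
```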
When the collation with all feature points of the learning object is completed, the process advances from step S1845 to S1847, and in steps S1847 to S1853 it is determined whether the learning object is present in the input video. First, in step S1847, it is determined whether the proportion of the number q of feature points that matched local features of the input video, among the number p of feature points of the learning object, exceeds a threshold β. If it does, the process advances to step S1849, and it is further determined, for this learning object candidate, whether the positional relationship between the feature points of the input video and those of the learning object is one that a linear transformation can produce. That is, it is determined whether the positional relationship between the feature points of the input video and those of the learning object, stored in step S1841 as having matching local features, is one that remains possible under changes such as rotation, inversion, or a change of viewpoint position, or one that cannot arise from such changes. Since such determination methods are geometrically well known, a detailed description is omitted.
If it is determined in step S1851 that a linear transformation is possible, the process advances to step S1853, and it is determined that the collated learning object is present in the input video. Note that the threshold β can be changed according to the recognition accuracy required for the learning object. For a learning object whose correlation with other learning objects is low, or whose features can be determined even from a part of it, accurate recognition is possible even with few matched feature points. That is, the learning object can be recognized even if part of it is hidden, as long as a characteristic part is visible.
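The decision of steps S1847 to S1853 combines the ratio test against β with the linear-transformation check. As one concrete instance of the geometrically well-known method the text alludes to, the sketch below fits a single affine map to the matched pairs by least squares and accepts when the residual is negligible; a production system might instead use RANSAC with a homography. Names and tolerances are assumptions.

```python
import numpy as np

def fits_linear_transform(pairs, tol=1e-6):
    """S1849/S1851: are the matched pairs consistent with one affine map
    video_xy ~ A @ object_xy + t (rotation, reflection, viewpoint change)?"""
    if len(pairs) < 3:                         # an affine map needs >= 3 pairs
        return False
    src = np.array([p for p, _ in pairs], dtype=float)
    dst = np.array([q for _, q in pairs], dtype=float)
    A = np.hstack([src, np.ones((len(src), 1))])   # homogeneous [x y 1] rows
    coef, *_ = np.linalg.lstsq(A, dst, rcond=None)
    residual = float(((A @ coef - dst) ** 2).sum())
    return residual < tol

def object_present(m, q, pairs, beta=0.3):
    """S1847-S1853: ratio of matched points over beta, then geometry check."""
    return (q / m) > beta and fits_linear_transform(pairs)
```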
In step S1855, it is determined whether any uncollated learning objects remain in the local feature DB 221. If one remains, the next learning object is set in step S1857, the parameters are reinitialized to p = 1 and q = 0, and the process returns to step S1835 to repeat the collation.
As is clear from this description of the collation processing, storing every possible learning object in the local feature DB 221 and collating against all of them imposes a very heavy load. Therefore, for example, the user may select a range of learning objects from a menu before recognition is performed on the input video, so that only that range is retrieved from the local feature DB 221 and collated. The load can also be reduced by storing in the local feature DB 221 only the local features of the range the user actually uses.
[Third Embodiment]
Next, an information processing system according to the third embodiment of the present invention will be described. The information processing system according to this embodiment differs from the second embodiment in that related information is automatically acquired from the link destination without the user performing a link access operation. The other configurations and operations are the same as in the second embodiment; the same reference numerals are given to the same configurations and operations, and their detailed description is omitted.
According to this embodiment, related information from the link destination can be reported in real time, in association with a learning object in an image of the video, without any user operation.
<< Operation procedure of the information processing system >>
FIG. 19 is a sequence diagram showing the operation procedure of the information processing system according to this embodiment. In FIG. 19, operations identical to those in FIG. 5 of the second embodiment are given the same step numbers, and their description is omitted.
In steps S400 and S401, although the applications and data may differ, download, activation, and initialization are performed as in FIGS. 4 and 5.
The learning object recognition server 220, having recognized a learning object in the video from the local features of the video received from the communication terminal 210 in step S411, refers to the link information DB 223 in step S513 and acquires the link information corresponding to the recognized learning object.
If a plurality of pieces of link information are acquired, a link destination is selected in step S1915. The selection may be based, for example, on an instruction from the user of the communication terminal 210 or on user recognition by the learning object recognition server 220; a detailed description is omitted here. In step S1917, the related information providing server 230 of the link destination is accessed, based on the link information, with the recognized learning object ID. In the operation procedure of FIG. 19, the ID of the communication terminal that transmitted the local features of the video is also sent with this link destination access.
The related information providing server 230 acquires from the related information DB 231 the learning-object-related information (including document data and audio data) corresponding to the learning object ID accompanying the access. Then, in step S525, the related information is returned to the communication terminal 210 that originated the access; the communication terminal ID transmitted in step S1917 is used here.
In step S527, the communication terminal 210 that has received the related information displays it or outputs it as audio.
FIG. 19 describes the case where the response to the link destination access by the learning object recognition server 220 is made directly to the communication terminal 210. However, the learning object recognition server 220 may instead receive the reply from the link destination and relay it to the communication terminal 210. Alternatively, the communication terminal 210 may, on receiving the link information, automatically access the link destination and report the reply from it.
[Fourth Embodiment]
Next, an information processing system according to the fourth embodiment of the present invention will be described. The information processing system according to this embodiment applies the second and third embodiments to learning objects that include language. The other configurations and operations are the same as in the second or third embodiment; the same reference numerals are given to the same configurations and operations, and their detailed description is omitted.
According to this embodiment, a learning object that includes language can be recognized, and its related information, in particular its reading, can be learned.
<< Operation procedure of the information processing system >>
Several examples of the operation procedure of the information processing system according to this embodiment are described below with reference to sequence diagrams. Application to learning objects that include language is not limited to these examples.
(Book recognition)
FIG. 20A is a sequence diagram showing the operation procedure for book recognition in the information processing system according to this embodiment. Operation steps identical to those in FIG. 4 of the second embodiment are given the same step numbers, and their description is omitted.
In steps S400 and S401, although the applications and data may differ, download, activation, and initialization are performed as in FIG. 4.
First, in step S2013, the communication terminal 210 captures an image of the spine of a book or of an advertisement for a book. Although the video is represented here by a spine or an advertisement image, it may instead be a slipcase, a cover, a colophon, a table of contents, or any other book-related image.
The learning object recognition server 220 receives the local features from the communication terminal 210 and, in step S2021, recognizes the book by referring to the local feature DB 221. If the response is to be a book title or the like, the recognition result is transmitted from the learning object recognition server 220 to the communication terminal 210 in step S2023. If the contents of the book are to be introduced by display or audio, content introduction data corresponding to the recognized book is acquired in step S2025 by referring to the content introduction DB 2022, and in step S2027 the recognition result and the content introduction data are transmitted from the learning object recognition server 220 to the communication terminal 210.
The communication terminal 210 receives the recognition result or the content introduction from the learning object recognition server 220 and, in step S2029, reports the recognition result or the contents to the user by display and/or audio.
When the content introduction is to be acquired from the related information providing server 230 at the link destination, the link destination corresponding to the recognized book is acquired in step S2031 by referring to the link information DB 223. Then, in step S2033, the learning object recognition server 220 accesses the link destination.
In step S2035, the related information providing server 230 at the link destination refers to the content introduction DB 2023 and acquires the content introduction data corresponding to the book. Then, in step S2037, the related information providing server 230 transmits the recognition result and the content introduction data to the communication terminal 210.
The communication terminal 210 receives the recognition result or the content introduction from the related information providing server 230 and, in step S2039, reports it to the user by display and/or audio. The address of the communication terminal 210 is assumed to be obtained from the learning object recognition server 220 at the time of the access in step S2033.
Although FIG. 20A shows the procedure in which the learning object recognition server 220 accesses the link destination automatically, the learning object recognition server 220 may instead return the link destination to the communication terminal 210 and wait for a link instruction from the communication terminal 210.
(Page recognition)
FIG. 20B is a sequence diagram showing the operation procedure for page recognition in the information processing system according to this embodiment. Operation steps identical to those in FIG. 4 of the second embodiment are given the same step numbers, and their description is omitted.
In steps S400 and S401, although the applications and data may differ, download, activation, and initialization are performed as in FIG. 4.
First, in step S2043, the communication terminal 210 captures an image of a page of an opened book. The image may be a two-page spread, a single page, part of a page, or a photograph, figure, or table within a page.
The learning object recognition server 220 receives the local features from the communication terminal 210 and, in step S2051, recognizes the page by referring to the local feature DB 221. Next, in step S2053, it refers to the page information DB 2024 and acquires page information in the form of read-aloud audio corresponding to the recognized page. Then, in step S2055, the page data of the read-aloud audio is transmitted from the learning object recognition server 220 to the communication terminal 210.
The communication terminal 210 receives the page data from the learning object recognition server 220 and, in step S2057, reports the page contents to the user by playing the read-aloud audio.
When the page information is to be acquired from the related information providing server 230 at the link destination, the link destination corresponding to the recognized page is acquired in step S2061 by referring to the link information DB 223. Then, in step S2063, the learning object recognition server 220 accesses the link destination.
In step S2065, the related information providing server 230 at the link destination refers to the page information DB 2025 and acquires the page information corresponding to the page. Then, in step S2067, the page data of the read-aloud audio is transmitted from the related information providing server 230 to the communication terminal 210.
The communication terminal 210 receives the page data from the related information providing server 230 and, in step S2069, reports the page contents to the user by playing the read-aloud audio. The address of the communication terminal 210 is assumed to be obtained from the learning object recognition server 220 at the time of the access in step S2063.
As in FIG. 20A, in FIG. 20B the learning object recognition server 220 may instead return the link destination to the communication terminal 210 and wait for a link instruction from the communication terminal 210.
(Kanji recognition)
FIG. 20C is a sequence diagram showing the operation procedure for kanji recognition in the information processing system according to this embodiment. Operation steps identical to those in FIG. 4 of the second embodiment are given the same step numbers, and their description is omitted.
In steps S400 and S401, although the applications and data may differ, download, activation, and initialization are performed as in FIG. 4.
First, in step S2073, the communication terminal 210 captures an image of a kanji character, an idiom, or a sentence in a book. The captured image may also be of a cover as in FIG. 20A or of a page as in FIG. 20B.
The learning object recognition server 220 receives the local features from the communication terminal 210 and, in step S2081, recognizes the kanji, idiom, or sentence by referring to the local feature DB 221. Next, in step S2083, it refers to the dictionary DB 2026 and acquires the reading and meaning, as display or audio data, corresponding to the recognized kanji, idiom, or sentence. Then, in step S2085, the display/audio data indicating the reading and meaning is transmitted from the learning object recognition server 220 to the communication terminal 210.
The communication terminal 210 receives the display/audio data indicating the reading and meaning from the learning object recognition server 220 and, in step S2087, reports the reading and meaning to the user by display and audio playback.
When the reading and meaning are to be acquired from the related information providing server 230 at the link destination, the link destination corresponding to the recognized kanji, idiom, or sentence is acquired in step S2091 by referring to the link information DB 223. Then, in step S2093, the learning object recognition server 220 accesses the link destination.
In step S2095, the related information providing server 230 at the link destination refers to the dictionary DB 2027 and acquires the reading and meaning corresponding to the kanji, idiom, or sentence. Then, in step S2097, the display/audio data indicating the reading and meaning is transmitted from the related information providing server 230 to the communication terminal 210.
The communication terminal 210 receives the data from the related information providing server 230 and, in step S2099, reports the reading and meaning of the kanji, idiom, or sentence to the user by display and by playing the audio data. The address of the communication terminal 210 is assumed to be obtained from the learning object recognition server 220 at the time of the access in step S2093.
Regarding the display and audio reporting in FIG. 20C, reporting by display is desirable for a video containing a plurality of kanji, idioms, or sentences, while reporting by audio is desirable for a video of a single kanji, idiom, or sentence, or a part thereof.
As in FIGS. 20A and 20B, in FIG. 20C the learning object recognition server 220 may instead return the link destination to the communication terminal 210 and wait for a link instruction from the communication terminal 210.
(Other recognition)
Although not illustrated in this embodiment, the result of translating a word, phrase, or sentence into another language can likewise be reported by display or audio.
<< Databases >>
Next, the configuration of each database used in the operations of FIGS. 20A to 20C will be described with reference to FIGS. 21A to 21C; the configuration of the database used for the translation example is shown in FIG. 21D. Although these databases are described separately from the local feature DB 221, they may instead be provided integrally, associated with the local features.
(Content introduction DB)
FIG. 21A is a diagram showing the configuration of the content introduction DB 2022 or 2023 according to this embodiment. The content introduction DBs 2022 and 2023 have basically the same configuration, but given its storage capacity, the content introduction DB 2023 can hold more detailed contents or more items than the content introduction DB 2022.
The content introduction DB 2022 or 2023 stores, in association with a book ID 2111, a work name 2112, an author 2113, a publisher 2114, a publication date 2115, and content introduction information 2116 comprising display data and audio data. All of these may alternatively be included in the content introduction information.
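One possible in-memory reading of the FIG. 21A record layout, keyed by book ID since the recognition result is a book: the field set follows the reference numerals in the text, while the concrete types and names are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class ContentIntroduction:
    book_id: str            # 2111
    work_name: str          # 2112
    author: str             # 2113
    publisher: str          # 2114
    publication_date: str   # 2115
    intro_display: str      # 2116: display data
    intro_audio: bytes      # 2116: audio data

content_db = {}  # stands in for content introduction DB 2022/2023

def register(rec):
    """Add one record, keyed by its book ID."""
    content_db[rec.book_id] = rec

def lookup(book_id):
    """Return the content introduction for a recognized book, if any."""
    return content_db.get(book_id)
```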
(Page information DB)
FIG. 21B is a diagram showing the configuration of the page information DB 2024 or 2025 according to this embodiment. The page information DBs 2024 and 2025 have basically the same configuration, but given its storage capacity, the page information DB 2025 can hold more detailed contents or more items than the page information DB 2024.
The page information DB 2024 or 2025 stores, in association with a book ID 2121, and further with a page number 2122 and chapter/part information 2123, first read-aloud data/speaker 2124 and second read-aloud data/speaker 2125.
(Dictionary DB)
FIG. 21C is a diagram showing the configuration of the dictionary DB 2026 or 2027 according to this embodiment. The dictionary DBs 2026 and 2027 have basically the same configuration, but given its storage capacity, the dictionary DB 2027 can hold more detailed contents or more items than the dictionary DB 2026.
The dictionary DB 2026 or 2027 has, for example, three parts: a kanji DB 2130, an idiom DB 2140, and a sentence DB 2150. These may all be integrated into one.
The kanji DB 2130 stores, in association with a kanji ID 2131, kun-reading data 2132 and on-reading data 2133, each comprising display and audio, and explanation data (meaning/usage) 2134.
The idiom DB 2140 stores, in association with an idiom ID 2141, reading data 2142 comprising display and audio, and explanation data (meaning/usage) 2143.
 文章用DB2150は、文章ID2151に対応付けて、表示と音声とからなる読み方データ2152と解説データ(意味/使い方)2153を記憶する。文章用DB2150は、ことわざや俳句、和歌などを含んでよい。 The text DB 2150 stores reading data 2152 and comment data (meaning / usage) 2153 composed of display and voice in association with the text ID 2151. The sentence DB 2150 may include proverbs, haiku, waka, and the like.
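The three-part layout can be sketched with an in-memory SQLite database as below; table and column names are invented for illustration, and the voice portion of each entry is omitted:

```python
import sqlite3

# Sketch of the three-part dictionary DB 2026/2027.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE kanji (              -- kanji DB 2130
    kanji_id    TEXT PRIMARY KEY, -- kanji ID 2131
    kun_reading TEXT,             -- kun-reading data 2132 (display part)
    on_reading  TEXT,             -- on-reading data 2133 (display part)
    explanation TEXT              -- explanation data (meaning/usage) 2134
);
CREATE TABLE idiom (              -- idiom DB 2140
    idiom_id    TEXT PRIMARY KEY, -- idiom ID 2141
    reading     TEXT,             -- reading data 2142 (display part)
    explanation TEXT              -- explanation data (meaning/usage) 2143
);
CREATE TABLE sentence (           -- sentence DB 2150: proverbs, haiku, waka, ...
    sentence_id TEXT PRIMARY KEY, -- sentence ID 2151
    reading     TEXT,             -- reading data 2152 (display part)
    explanation TEXT              -- explanation data (meaning/usage) 2153
);
""")
conn.execute("INSERT INTO kanji VALUES (?, ?, ?, ?)",
             ("K0001", "yama", "san", "mountain"))
row = conn.execute("SELECT kun_reading FROM kanji WHERE kanji_id = ?",
                   ("K0001",)).fetchone()
```

Keeping the three tables in one database file is also consistent with the remark that all parts may be integrated.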
(Dictionary DB for translation)
FIG. 21D is a diagram showing the configuration of the translation dictionary DB 2100 according to the present embodiment. FIG. 21D illustrates the configuration of a Japanese-to-foreign-language translation dictionary, but other translation dictionaries are configured similarly.
The translation dictionary DB 2100 includes, for example, three parts: a word DB 2160, a phrase DB 2170, and a sentence DB 2180. All of them may be integrated into one.
The word DB 2160 stores, in association with a Japanese word ID 2161, English word data 2162 consisting of notation and voice, other-language data 2163, and explanation data (meaning/usage) 2164.
The phrase DB 2170 stores, in association with a Japanese phrase ID 2171, English phrase data 2172 consisting of notation and voice, other-language phrase data 2173, and explanation data (meaning/usage) 2174.
The sentence DB 2180 stores, in association with a Japanese sentence ID 2181, English sentence data 2182 consisting of notation and voice, other-language sentence data 2183, and explanation data (meaning/usage) 2184.
The phrase DB 2170 and the sentence DB 2180 may include proverbs, haiku, waka, poetry, and the like.
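A lookup across the word, phrase, and sentence parts can be sketched as follows; the IDs, the sample entries, and the `translate` helper are hypothetical:

```python
# Each entry: (English data, other-language data, explanation data).
word_db = {      # word DB 2160, keyed by Japanese word ID 2161
    "JW001": ("mountain", {"fr": "montagne"}, "noun"),
}
phrase_db = {    # phrase DB 2170, keyed by Japanese phrase ID 2171
    "JP001": ("good morning", {"fr": "bonjour"}, "greeting"),
}
sentence_db = {  # sentence DB 2180, keyed by Japanese sentence ID 2181
    "JS001": ("Time flies.", {"fr": "Le temps passe vite."}, "proverb"),
}

def translate(entry_id: str, lang: str = "en"):
    """Resolve an ID against the word, phrase, then sentence DBs."""
    for db in (word_db, phrase_db, sentence_db):
        if entry_id in db:
            english, others, _explanation = db[entry_id]
            return english if lang == "en" else others.get(lang)
    return None
```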
[Fifth Embodiment]
Next, an information processing system according to a fifth embodiment of the present invention will be described. The information processing system according to the present embodiment applies the second and third embodiments to learning objects that include sound. The other configurations and operations are the same as those of the second or third embodiment; therefore, the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted.
According to the present embodiment, a learning object that includes sound can be recognized, and its related information, in particular a performance, can be learned.
<< Operation procedure of the information processing system >>
Hereinafter, several examples of the operation procedure of the information processing system according to the present embodiment will be described with reference to sequence diagrams. Note that application to learning objects that include sound is not limited to these examples.
(Music recognition)
FIG. 22A is a sequence diagram showing the operation procedure of music recognition in the information processing system according to the present embodiment. Operation steps similar to those of FIG. 4 of the second embodiment are given the same step numbers, and description thereof is omitted.
In steps S400 and S401, although the applications and data may differ, downloading, activation, and initialization are performed as in FIG. 4.
First, in step S2213, the communication terminal 210 captures an image of a music jacket, a CD, or a concert advertisement. Although a jacket or an advertisement image is used as a representative example here, other music-related images may be used.
The learning object recognition server 220 receives the local feature amounts from the communication terminal 210 and, in step S2221, recognizes the music with reference to the local feature DB 221. If the response is an album name, a performer, concert information, or the like, the learning object recognition server 220 transmits the recognition result to the communication terminal 210 in step S2223. When the contents of the music are to be introduced by display or voice, the content introduction DB 2222 is referred to in step S2225 to acquire content introduction data corresponding to the recognized music. Then, in step S2227, the learning object recognition server 220 transmits the recognition result and the content introduction data to the communication terminal 210.
The communication terminal 210 receives the recognition result or the content introduction from the learning object recognition server 220 and, in step S2229, notifies the user of the recognition result or the contents by display and/or voice.
When the content introduction is to be acquired from the linked related information providing server 230, the link information DB 223 is referred to in step S2231 to acquire the link destination corresponding to the recognized music. Then, in step S2233, the learning object recognition server 220 accesses the link destination.
The linked related information providing server 230 refers to the content introduction DB 2223 in step S2235 and acquires the content introduction data corresponding to the music. Then, in step S2237, the related information providing server 230 transmits the recognition result and the content introduction data to the communication terminal 210.
The communication terminal 210 receives the recognition result or the content introduction from the related information providing server 230 and, in step S2239, notifies the user of the recognition result or the contents by display and/or voice. Here, the address of the communication terminal 210 is assumed to be obtained from the learning object recognition server 220 through the access in step S2233.
Although FIG. 22A shows a procedure in which the learning object recognition server 220 automatically accesses the link destination, the learning object recognition server 220 may instead return the link destination to the communication terminal 210 and wait for a link instruction from the communication terminal 210.
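The server-side branch of FIG. 22A (recognition in S2221, bare result in S2223, or result plus content introduction in S2225 to S2227) could look roughly as follows. This is a sketch only: the local-feature matching is reduced to a placeholder set-intersection score, and all names are assumptions:

```python
def match_local_features(query_features, feature_db):
    """Return the ID whose stored feature set shares the most features
    with the query (stand-in for real local-feature matching)."""
    best_id, best_score = None, 0
    for obj_id, stored in feature_db.items():
        score = len(set(query_features) & set(stored))
        if score > best_score:
            best_id, best_score = obj_id, score
    return best_id

def handle_query(query_features, feature_db, content_intro_db, with_intro):
    obj_id = match_local_features(query_features, feature_db)   # S2221
    if obj_id is None:
        return None
    if not with_intro:
        return {"result": obj_id}                               # S2223
    intro = content_intro_db.get(obj_id)                        # S2225
    return {"result": obj_id, "introduction": intro}            # S2227
```

The linked-server branch (S2231 to S2239) would replace the `content_intro_db.get` call with a request to the related information providing server 230.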
(Song recognition)
FIG. 22B is a sequence diagram showing the operation procedure of song recognition in the information processing system according to the present embodiment. Operation steps similar to those of FIG. 4 of the second embodiment are given the same step numbers, and description thereof is omitted.
In steps S400 and S401, although the applications and data may differ, downloading, activation, and initialization are performed as in FIG. 4.
First, in step S2243, the communication terminal 210 captures an image of the cover or a page of a musical score. The captured image may be a two-page spread, a single page, or part of a page.
The learning object recognition server 220 receives the local feature amounts from the communication terminal 210 and, in step S2251, recognizes the song with reference to the local feature DB 221. Next, in step S2253, the performance information DB 2224 is referred to, and performance information, which is song performance data corresponding to the recognized song, is acquired. Then, in step S2255, the learning object recognition server 220 transmits the song audio data to the communication terminal 210.
The communication terminal 210 receives the song performance data from the learning object recognition server 220 and, in step S2257, reproduces the song to notify the user.
When the performance information is to be acquired from the linked related information providing server 230, the link information DB 223 is referred to in step S2261 to acquire the link destination corresponding to the recognized song. Then, in step S2263, the learning object recognition server 220 accesses the link destination.
The linked related information providing server 230 refers to the performance information DB 2225 in step S2265 and acquires the performance information corresponding to the song. Then, in step S2267, the related information providing server 230 transmits the song performance data to the communication terminal 210.
The communication terminal 210 receives the song performance data from the related information providing server 230 and, in step S2269, reproduces the song to notify the user. Here, the address of the communication terminal 210 is assumed to be obtained from the learning object recognition server 220 through the access in step S2263.
As in FIG. 22A, the configuration in FIG. 22B may also be such that the learning object recognition server 220 returns the link destination to the communication terminal 210 and waits for a link instruction from the communication terminal 210.
(Sound recognition)
FIG. 22C is a sequence diagram showing the operation procedure of sound recognition in the information processing system according to the present embodiment. Operation steps similar to those of FIG. 4 of the second embodiment are given the same step numbers, and description thereof is omitted.
In steps S400 and S401, although the applications and data may differ, downloading, activation, and initialization are performed as in FIG. 4.
First, in step S2273, the communication terminal 210 captures an image of notes or measures in a musical score.
The learning object recognition server 220 receives the local feature amounts from the communication terminal 210 and, in step S2281, recognizes the notes or measures with reference to the local feature DB 221. Next, in step S2283, the sound information DB 2226 is referred to, and the sound or sound sequence corresponding to the recognized notes or measures is acquired. Then, in step S2285, the learning object recognition server 220 transmits the sound data of the sound or sound sequence to the communication terminal 210.
The communication terminal 210 receives the sound data from the learning object recognition server 220 and, in step S2287, notifies the user by reproducing the sound.
When the sound information is to be acquired from the linked related information providing server 230, the link information DB 223 is referred to in step S2291 to acquire the link destination corresponding to the recognized sound or sound sequence. Then, in step S2293, the learning object recognition server 220 accesses the link destination.
In step S2295, the linked related information providing server 230 refers to the sound information DB 2227 and acquires the sound data corresponding to the sound. Then, in step S2297, the related information providing server 230 transmits the sound data to the communication terminal 210.
The communication terminal 210 receives the sound data from the related information providing server 230 and, in step S2299, notifies the user by reproducing the sound data. Here, the address of the communication terminal 210 is assumed to be obtained from the learning object recognition server 220 through the access in step S2293.
As in FIGS. 22A and 22B, the configuration in FIG. 22C may also be such that the learning object recognition server 220 returns the link destination to the communication terminal 210 and waits for a link instruction from the communication terminal 210.
<< Databases >>
Next, the configuration of each database used in the operations of FIGS. 22A to 22C will be described with reference to FIGS. 23A to 23C. Although these databases are described separately from the local feature DB 221, they may be provided integrally in association with the local feature amounts.
(Content introduction DB)
FIG. 23A is a diagram showing the configuration of the content introduction DB 2222 or 2223 according to the present embodiment. The content introduction DB 2222 and the content introduction DB 2223 have basically the same configuration; however, in view of storage capacity, the content introduction DB 2223 can hold more detailed contents or more items than the content introduction DB 2222.
The content introduction DB 2222 or 2223 stores, in association with a CD/DVD/record jacket ID 2311, a performer/singer 2312, a recording location 2313, a recording date/release date 2314, and content introduction information 2315 including display data and audio data. The CD/DVD/record jacket ID 2311 may be a concert ID. Each CD/DVD/record jacket ID 2311 is associated with a plurality of song IDs 2316 and song introductions 2317. All of these may be stored as content introduction information.
(Performance information DB)
FIG. 23B is a diagram showing the configuration of the performance information DB 2224 or 2225 according to the present embodiment. The performance information DB 2224 and the performance information DB 2225 have basically the same configuration; however, in view of storage capacity, the performance information DB 2225 can hold more detailed contents or more items than the performance information DB 2224.
The performance information DB 2224 or 2225 stores, in association with a song ID 2321, a song name 2322, first song reproduction data 2323 by a first performer, and second song reproduction data 2324 by a second performer. The performer may be replaced by a conductor or a singer.
(Sound information DB)
FIG. 23C is a diagram showing the configuration of the sound information DB 2226 or 2227 according to the present embodiment. The sound information DB 2226 and the sound information DB 2227 have basically the same configuration; however, in view of storage capacity, the sound information DB 2227 can hold more detailed contents or more items than the sound information DB 2226.
The sound information DB 2226 or 2227 includes a measure DB 2330 that stores reproduction data in units of measures and a sound DB 2340 that stores reproduction data in units of sounds. The measure DB 2330 stores, in association with a measure ID 2331, the song name (or song ID) 2332 of the song that includes the measure, and measure reproduction data 2333. The sound DB 2340 stores, in association with a sound ID 2341, a pitch name/solfège name 2342, first sound reproduction data 2343 for piano, second sound reproduction data 2344 for violin, and third sound reproduction data 2345 for flute. The types of musical instruments are not limited to this example; the voice of a singer may also be used.
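Selecting one instrument's reproduction data out of the sound DB 2340 can be sketched as below; the sound ID, byte payloads, and instrument keys are illustrative assumptions:

```python
sound_db = {
    "N0440": {                    # sound ID 2341
        "name": "A / la",         # pitch name / solfège name 2342
        "piano":  b"piano-a4",    # first sound reproduction data 2343
        "violin": b"violin-a4",   # second sound reproduction data 2344
        "flute":  b"flute-a4",    # third sound reproduction data 2345
    },
}

def reproduction_data(sound_id: str, instrument: str = "piano") -> bytes:
    """Pick the reproduction data for one instrument (or a singer's voice)."""
    return sound_db[sound_id][instrument]
```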
[Sixth Embodiment]
Next, an information processing system according to a sixth embodiment of the present invention will be described. The information processing system according to the present embodiment applies the second and third embodiments to exhibits. The other configurations and operations are the same as those of the second or third embodiment; therefore, the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted. Exhibits include materials in museums and ethnological museums, paintings and sculptures in art museums, and exhibits at expositions and exhibitions.
According to the present embodiment, an exhibit can be recognized and its related information can be learned.
<< Operation procedure of the information processing system >>
FIG. 24A is a sequence diagram showing the operation procedure of exhibit recognition in the information processing system according to the present embodiment. Operation steps similar to those of FIG. 4 of the second embodiment are given the same step numbers, and description thereof is omitted.
In steps S400 and S401, although the applications and data may differ, downloading, activation, and initialization are performed as in FIG. 4.
First, in step S2413, the communication terminal 210 captures an image of an exhibit.
The learning object recognition server 220 receives the local feature amounts from the communication terminal 210 and, in step S2421, recognizes the exhibit with reference to the local feature DB 221. If the response is the name of the exhibit or the like, the learning object recognition server 220 transmits the recognition result to the communication terminal 210 in step S2423. When the contents of the exhibit are to be introduced by display or voice, the content introduction DB 2422 is referred to in step S2425 to acquire content introduction data corresponding to the recognized exhibit. Then, in step S2227, the learning object recognition server 220 transmits the recognition result and the content introduction data to the communication terminal 210.
The communication terminal 210 receives the recognition result or the content introduction from the learning object recognition server 220 and, in step S2229, notifies the user of the recognition result or the contents by display and/or voice.
When the content introduction is to be acquired from the linked related information providing server 230, the link information DB 223 is referred to in step S2231 to acquire the link destination corresponding to the recognized exhibit. Then, in step S2233, the learning object recognition server 220 accesses the link destination.
In step S2235, the linked related information providing server 230 refers to the content introduction DB 2423 and acquires the content introduction data corresponding to the exhibit. Then, in step S2237, the related information providing server 230 transmits the recognition result and the content introduction data to the communication terminal 210.
The communication terminal 210 receives the recognition result or the content introduction from the related information providing server 230 and, in step S2239, notifies the user of the recognition result or the contents by display and/or voice. Here, the address of the communication terminal 210 is assumed to be obtained from the learning object recognition server 220 through the access in step S2233.
Although FIG. 24A shows a procedure in which the learning object recognition server 220 automatically accesses the link destination, the learning object recognition server 220 may instead return the link destination to the communication terminal 210 and wait for a link instruction from the communication terminal 210.
(Content introduction DB)
FIG. 24B is a diagram showing the configuration of the content introduction DB 2422 or 2423 according to the present embodiment. The content introduction DB 2422 and the content introduction DB 2423 have basically the same configuration; however, in view of storage capacity, the content introduction DB 2423 can hold more detailed contents or more items than the content introduction DB 2422. Although the content introduction DB 2422 or 2423 is described separately from the local feature DB 221, it may be provided integrally in association with the local feature amounts.
The content introduction DB 2422 or 2423 stores, in association with an exhibit ID 2401, a name (artist, period) 2402, related display data 2403, and related audio data 2404.
[Seventh Embodiment]
Next, an information processing system according to a seventh embodiment of the present invention will be described. The information processing system according to the present embodiment applies the second and third embodiments to mathematical formulas. The other configurations and operations are the same as those of the second or third embodiment; therefore, the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted.
According to the present embodiment, a mathematical formula can be recognized, and its calculation process and calculation result can be learned.
<< Operation procedure of the information processing system >>
FIG. 25 is a sequence diagram showing the operation procedure of formula recognition in the information processing system according to the present embodiment. Operation steps similar to those of FIG. 4 of the second embodiment are given the same step numbers, and description thereof is omitted.
In steps S400 and S401, although the applications and data may differ, downloading, activation, and initialization are performed as in FIG. 4.
First, in step S2503, the communication terminal 210 captures an image of a mathematical formula. Although a formula is used as a representative example here, an image of a graph or of straight lines/curves, for example, may also be captured.
The learning object recognition server 220 receives the local feature amounts from the communication terminal 210 and, in step S2511, recognizes the formula or the like with reference to the local feature DB 221. Next, in step S2513, the formula DB 2522 is referred to, and formula-related data, including the formula with its variables and calculation examples corresponding to the recognized formula, is acquired. Then, in step S2517, the learning object recognition server 220 transmits the formula and calculation example data to the communication terminal 210.
The communication terminal 210 receives the formula and calculation example data from the learning object recognition server 220 and, in step S2519, displays the formula and calculation example data to notify the user. In step S2519, the communication terminal 210 also determines whether the user has input values for the variables in the formula. If there is no variable input, the process ends.
On the other hand, if variable values are input, the process advances to step S2521, where the variables are substituted into the received formula and the formula is evaluated. Then, in step S2523, the calculation result is displayed. If necessary, the calculation result is transmitted to the learning object recognition server 220 in step S2525.
In step S2527, the learning object recognition server 220 can accumulate the calculation results from the communication terminal 210 and use them for information gathering and the like.
When the formula and calculation example data are to be acquired from the linked related information providing server 230, the link information DB 223 is referred to in step S2531 to acquire the link destination corresponding to the recognized formula or the like. Then, in step S2533, the learning object recognition server 220 accesses the link destination.
In step S2535, the linked related information providing server 230 refers to the formula DB 2523 and acquires the formula-related data. Then, in step S2537, the related information providing server 230 transmits the formula-related data to the communication terminal 210.
The communication terminal 210 receives the formula-related data from the related information providing server 230 and, in step S2539, displays the formula and calculation example data to notify the user. Here, the address of the communication terminal 210 is assumed to be obtained from the learning object recognition server 220 through the access in step S2533.
Although FIG. 25 shows a procedure in which the learning object recognition server 220 automatically accesses the link destination, the learning object recognition server 220 may instead return the link destination to the communication terminal 210 and wait for a link instruction from the communication terminal 210.
(Formula DB)
FIG. 26A is a diagram showing the configuration of the formula DB 2522 or 2523 according to the present embodiment. The formula DB 2522 and the formula DB 2523 have basically the same configuration; however, in view of storage capacity, the formula DB 2523 can hold more detailed contents or more items than the formula DB 2522. Although the formula DB 2522 is described separately from the local feature DB 221, it may be provided integrally in association with the local feature amounts.
The formula DB 2522 or 2523 stores, in association with a formula ID 2611, a formula name 2612, formula data 2613 representing the formula with symbols, variables 2614 used in the formula, and constants 2615 in the formula.
 (演算パラメータテーブル)
 図26Bは、本実施形態に係る演算パラメータテーブル2600の構成を示す図である。なお、演算パラメータテーブル2600は、数式に変数や定数を代入して演算を実行する場合に、通信端末やサーバのRAM内に作成されるテーブルである。
(Calculation parameter table)
FIG. 26B is a diagram showing a configuration of a calculation parameter table 2600 according to the present embodiment. Note that the calculation parameter table 2600 is a table created in the RAM of the communication terminal or the server when a calculation is executed by substituting variables or constants into mathematical expressions.
 演算パラメータテーブル2600は、数式ID2621に対応付けて、数式で使用される各々の変数値2622、数式で使用される各々の定数値2623、変数値2622および定数値2623を使用した演算結果値2624を記憶する。 The calculation parameter table 2600 stores, in association with the formula ID 2621, each variable value 2622 used in the formula, each constant value 2623 used in the formula, and the calculation result value 2624 obtained using the variable values 2622 and constant values 2623.
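As a concrete illustration, the formula DB of FIG. 26A and the calculation parameter table of FIG. 26B might be realized as follows. This is only a sketch: the identifiers (`formula_db`, `evaluate_formula`) and the sample formula are illustrative assumptions, not part of the embodiment.

```python
# Minimal sketch of the formula DB (Fig. 26A) and one row of the
# calculation parameter table (Fig. 26B). All identifiers are
# illustrative assumptions.

formula_db = {
    # formula ID 2611 -> name 2612, symbolic formula data 2613,
    # variables 2614, constants 2615
    "F001": {
        "name": "circle area",
        "expression": "pi * r ** 2",
        "variables": ["r"],
        "constants": {"pi": 3.14159},
    },
}

def evaluate_formula(formula_id, variable_values):
    """Substitute variables and constants into the stored formula and
    build one calculation-parameter-table row (created in RAM at run
    time, as described for table 2600)."""
    entry = formula_db[formula_id]
    env = dict(entry["constants"])
    env.update(variable_values)
    # evaluate the symbolic expression with builtins disabled (sketch only)
    result = eval(entry["expression"], {"__builtins__": {}}, env)
    # row: formula ID 2621, variable values 2622, constant values 2623,
    # calculation result value 2624
    return {
        "formula_id": formula_id,
        "variable_values": variable_values,
        "constant_values": entry["constants"],
        "result": result,
    }

row = evaluate_formula("F001", {"r": 2.0})
print(row["result"])  # 3.14159 * 2.0 ** 2 = 12.56636
```

The calculation could run either on the communication terminal or on the server; only the table row, not the DB, is transient.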
 [第8実施形態]
 次に、本発明の第8実施形態に係る情報処理システムについて説明する。本実施形態に係る情報処理システムは、上記第2実施形態乃至第7実施形態と比べると、ユーザが捜索したい学習対象物を登録すれば、その学習対象物を捜索してユーザに報知する点で異なる。その他の構成および動作は、第2実施形態乃至第7実施形態と同様であるため、同じ構成および動作については同じ符号を付してその詳しい説明を省略する。
[Eighth Embodiment]
Next, an information processing system according to an eighth embodiment of the present invention will be described. Compared with the second to seventh embodiments, the information processing system according to the present embodiment searches for the learning object and notifies the user if the learning object to be searched is registered. Different. Since other configurations and operations are the same as those of the second to seventh embodiments, the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted.
 本実施形態によれば、ユーザがリアルタイムに所望の学習対象物を捜索できる。 According to this embodiment, the user can search for a desired learning object in real time.
 《情報処理システムにおける通信端末の表示画面例》
 図27Aおよび図27Bは、本実施形態に係る情報処理システムにおける通信端末の表示画面例を示す図である。
<< Display screen example of communication terminal in information processing system >>
27A and 27B are diagrams illustrating examples of display screens of the communication terminal in the information processing system according to the present embodiment.
 (作品捜索)
 図27Aの上段は、子どもが作った作品を展示場所や置き場所から探す例を示した図である。
(Work search)
The upper part of FIG. 27A is a diagram showing an example of searching for a work created by a child from an exhibition place or a placement place.
 まず、左図のように、通信端末2710により、子どもの作った作品の写真などを撮影して、映像2720を取得する。この作品の映像2720に基づいて局所特徴量を生成して、通信端末2710の局所特徴量DBに登録する。 First, as shown in the left figure, the communication terminal 2710 takes a picture of a work made by a child and obtains a video 2720. A local feature amount is generated based on the video 2720 of this work, and is registered in the local feature amount DB of the communication terminal 2710.
 次に、右図のように、通信端末2710により、作品の展示場所や置き場所の映像2730を撮像する。通信端末2710は、この映像2730に基づいて局所特徴量を生成する。そして、先ほど登録した作品の局所特徴量と、作品の展示場所や置き場所の映像2730の局所特徴量とを照合して、子どもの作った作品の位置を認識する。認識した位置の作品には、捜索結果として“ここにありますよ”などのコメント2731が重畳表示される。 Next, as shown in the right figure, the communication terminal 2710 captures a video 2730 of the place where the work is exhibited or stored. The communication terminal 2710 generates local features based on this video 2730. Then, the local features of the previously registered work are collated with the local features of the video 2730 of the exhibition or storage place, and the position of the work created by the child is recognized. A comment 2731 such as "It's here" is superimposed on the work at the recognized position as the search result.
 図27Aのように、たとえ、作品が写真などと同じ向きや同じサイズでなくても、次元選定された局所特徴量によりリアルタイムに子どもの作品の位置を特定できる。 As shown in FIG. 27A, even if the work does not appear in the same orientation or at the same size as in the photograph, the position of the child's work can be located in real time using the dimension-selected local features.
 (子ども捜索)
 図27Aの下段は、学芸会や演奏会などに参加した子どもがどこにいるかを探す例を示した図である。
(Child search)
The lower part of FIG. 27A is a diagram showing an example of finding where a child who participated in a school performance or a concert is.
 まず、左図のように、通信端末2710により、子どもの写真などを撮影して、映像2740を取得する。この映像2740に基づいて局所特徴量を生成して、通信端末2710の局所特徴量DBに登録する。 First, as shown in the left figure, a picture of a child is taken by the communication terminal 2710 to obtain a video 2740. A local feature value is generated based on the video 2740 and registered in the local feature value DB of the communication terminal 2710.
 次に、右図のように、通信端末2710により、学芸会や演奏会の映像2750を撮像する。通信端末2710は、この映像2750に基づいて局所特徴量を生成する。そして、先ほど登録した子どもの写真からの局所特徴量と、学芸会や演奏会の映像2750の局所特徴量とを照合して、子どもの位置を認識する。認識した位置の子どもには、捜索結果として“ここに居ますよ”などのコメント2751が重畳表示される。 Next, as shown in the right figure, the communication terminal 2710 captures a video 2750 of the school play or concert. The communication terminal 2710 generates local features based on this video 2750. Then, the local features from the previously registered photograph of the child are collated with the local features of the video 2750 of the school play or concert, and the position of the child is recognized. A comment 2751 such as "I'm here" is superimposed on the child at the recognized position as the search result.
 図27Aのように、たとえ、子どもの写真などと同じ向きや同じサイズでなくても、あるいは服や姿勢などの違いがあっても、次元選定された局所特徴量によりリアルタイムに子どもの位置を特定できる。 As shown in FIG. 27A, even if the child does not appear in the same orientation or at the same size as in the photograph, or even if clothes or posture differ, the position of the child can be located in real time using the dimension-selected local features.
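The register-then-collate flow of FIG. 27A can be sketched as follows. The feature values, the distance threshold, and the predetermined match ratio here are illustrative assumptions; an actual implementation would use the dimension-selected local features described earlier. Comparing only the leading dimensions mirrors the selection of the smaller dimension number when the registered and query descriptors differ in size (i vs. j).

```python
import math

# Local feature DB of the communication terminal: registered object ->
# list of per-keypoint feature vectors (1st through i-th dimension).
local_feature_db = {}

def register(object_id, features):
    """Register the local features generated from the object's image."""
    local_feature_db[object_id] = features

def match(scene_features, object_id, dim=None, min_ratio=0.5, max_dist=0.25):
    """Collate scene features against a registered search object.

    Only the leading `dim` dimensions of each feature vector are
    compared, so descriptors with differing dimension counts can still
    be matched on the smaller prefix. The object is recognized when a
    predetermined ratio of its registered features find a close match.
    (Threshold values are illustrative assumptions.)
    """
    registered = local_feature_db[object_id]
    if dim is None:
        dim = min(len(registered[0]), len(scene_features[0]))
    matched = 0
    for ref in registered:
        best = min(math.dist(ref[:dim], cand[:dim]) for cand in scene_features)
        if best <= max_dist:
            matched += 1
    return matched / len(registered) >= min_ratio

# left figure: register the work (or child) photographed beforehand
register("artwork", [[0.1, 0.9, 0.3], [0.7, 0.2, 0.5]])
# right figure: collate against the video of the exhibition place
scene = [[0.11, 0.88, 0.31], [0.9, 0.9, 0.1], [0.69, 0.21, 0.52]]
print(match(scene, "artwork"))  # True: both registered features are found
```

Because matching is per-keypoint, the registered object is still found when it appears at a different position, orientation, or scale within the scene frame.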
 (対象物へのズームイン)
 図27Bは、図27Aの下段の処理をさらに改善した処理である。
(Zoom in on the object)
FIG. 27B is a process in which the process in the lower part of FIG. 27A is further improved.
 図27Bの左図および中央図は、図27Aの下段の左図および右図と同じである。図27Bにおいては、子どもの位置が判明したので、通信端末2710が子どもの位置にズームインすることにより、子どもの拡大映像を取得することができる。 The left diagram and the central diagram in FIG. 27B are the same as the left diagram and the right diagram in the lower part of FIG. 27A. In FIG. 27B, since the position of the child is found, the communication terminal 2710 zooms in on the position of the child, so that an enlarged image of the child can be acquired.
 《通信端末の機能構成》
 図28は、本実施形態に係る通信端末2710の機能構成を示すブロック図である。なお、図28において、第2実施形態の図6と同様の機能構成部には同じ参照番号を付して、説明を省略する。
<Functional configuration of communication terminal>
FIG. 28 is a block diagram showing a functional configuration of a communication terminal 2710 according to this embodiment. In FIG. 28, the same functional components as those in FIG. 6 of the second embodiment are denoted by the same reference numerals, and description thereof is omitted.
 登録/捜索判定部2801は、通信端末2710が撮像部601で撮像した映像が、局所特徴量DB2821に捜索対象物として登録する映像か、捜索物を捜索するための映像かを判定する。かかる登録/捜索判定部2801の判定は、ユーザによる操作であってもよいし、映像中の物の映像画面における面積比率などにより自動的に判定してもよい。例えば、捜索対象物の登録時には、捜索対象物を画面全体で撮像するので所定閾値以上の面積比率を登録画面とする。 The registration/search determination unit 2801 determines whether the video captured by the imaging unit 601 of the communication terminal 2710 is a video to be registered in the local feature DB 2821 as a search object or a video for searching for a registered search object. This determination by the registration/search determination unit 2801 may follow a user operation, or may be made automatically based on, for example, the area ratio of the object on the video screen. For example, since a search object is imaged so as to fill the entire screen at registration time, a screen in which the object's area ratio is at or above a predetermined threshold is determined to be a registration screen.
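A minimal sketch of the automatic determination performed by the registration/search determination unit 2801, assuming a hypothetical threshold value and function name:

```python
# Automatic registration/search determination: when the imaged object
# fills the screen beyond a predetermined area ratio, the frame is
# treated as a registration image; otherwise it is a search image.
# The threshold value is an illustrative assumption.

REGISTRATION_AREA_RATIO = 0.6  # predetermined threshold

def classify_frame(object_area, screen_area, threshold=REGISTRATION_AREA_RATIO):
    """Return 'register' when the object occupies at least the
    threshold fraction of the video screen, otherwise 'search'."""
    ratio = object_area / screen_area
    return "register" if ratio >= threshold else "search"

print(classify_frame(object_area=250_000, screen_area=640 * 480))  # register
print(classify_frame(object_area=80_000, screen_area=640 * 480))   # search
```

A user operation, when present, would simply override this automatic decision.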
 局所特徴量登録部2802は、捜索物の登録であると判断した場合に、局所特徴量生成部602が生成した局所特徴量を局所特徴量DB2821に登録する。一方、登録された捜索物の捜索と判断した場合は、捜索物認識部2803において、局所特徴量生成部602が生成した局所特徴量と、局所特徴量DB2821に登録した捜索物の局所特徴量とが照合される。 When the local feature registration unit 2802 determines that a search object is being registered, it registers the local features generated by the local feature generation unit 602 in the local feature DB 2821. On the other hand, when it is determined that a registered search object is being searched for, the search object recognition unit 2803 collates the local features generated by the local feature generation unit 602 with the local features of the search objects registered in the local feature DB 2821.
 照合すれば、捜索物DB2822を参照して、捜索物発見情報報知部2804が、捜索物に対応した捜索物発見情報を報知する。また、ズーム制御部2805は、捜索物を拡大撮像するために、捜索物の位置にズームインするように撮像部601を制御する。 If they match, the search object discovery information notification unit 2804 refers to the search object DB 2822 and notifies the user of the search object discovery information corresponding to the search object. The zoom control unit 2805 also controls the imaging unit 601 to zoom in on the position of the search object so as to capture an enlarged image of it.
 なお、局所特徴量DB2821の構成は、新たな捜索物の局所特徴量が登録されることを除いて、第2実施形態の図8に示した構成と同様であるので、説明を省略する。また、捜索物DB2822は、捜索物についてのユーザが入力した情報を蓄積しておくものであって、必須ではなく局所特徴量DB2821内に設けてもよい。 Note that the configuration of the local feature DB 2821 is the same as that shown in FIG. 8 of the second embodiment, except that local features of new search objects are registered, and thus its description is omitted. The search object DB 2822 stores information about search objects entered by the user; it is not essential and may instead be provided within the local feature DB 2821.
 《通信端末の処理手順》
 図29は、本実施形態に係る通信端末の処理手順を示すフローチャートである。このフローチャートも、図12AのCPU1210がRAM1240を使用して実行し、図28の機能構成部を実現する。なお、図29中、局所特徴量生成処理は、図14と同様であるので、同じステップ番号S1313を付して、説明は省略する。
<< Processing procedure of communication terminal >>
FIG. 29 is a flowchart showing a processing procedure of the communication terminal according to the present embodiment. This flowchart is also executed by the CPU 1210 of FIG. 12A using the RAM 1240, and implements the functional configuration unit of FIG. In FIG. 29, the local feature quantity generation processing is the same as that in FIG. 14, and therefore the same step number S1313 is assigned and description thereof is omitted.
 まず、ステップS2911において、捜索物(図27Aの作品や子ども)の登録処理か否かを判定する。また、ステップS2921において、捜索物の捜索処理か否かを判定する。いずれでもない場合は、ステップS2941において他の処理を実行する。 First, in step S2911, it is determined whether or not this is registration processing for a search object (the work or the child in FIG. 27A). Next, in step S2921, it is determined whether or not this is search processing for a search object. If it is neither, other processing is executed in step S2941.
 登録処理であればステップS2913に進んで、捜索物の画像を取得する。ステップS1313において、取得した捜索物の画像の局所特徴量を生成する。そして、ステップS2917において、生成した局所特徴量を捜索物と対応付けて、局所特徴量DB2821に登録する。同時に、捜索物DB2822に、必要な捜索物の情報を登録する。 If it is registration processing, the process advances to step S2913, and an image of the search object is acquired. In step S1313, local features of the acquired image of the search object are generated. Then, in step S2917, the generated local features are registered in the local feature DB 2821 in association with the search object. At the same time, the necessary information about the search object is registered in the search object DB 2822.
 捜索処理であればステップS2923に進んで、捜索物を捜索する領域の映像を取得する。ステップS1313において、取得した映像の局所特徴量を生成する。次に、ステップS2927において、映像の局所特徴量の少なくとも一部に捜索物の局所特徴量が合致するかを照合して、捜索物の認識を行なう。捜索物が見付からなければ、ステップS2929からステップS2923に戻って、別の映像を取得して(実際には、通信端末2710の撮像方向/領域を変える)、捜索物の捜索を繰り返す。 If it is search processing, the process advances to step S2923, and a video of the area to be searched for the search object is acquired. In step S1313, local features of the acquired video are generated. Next, in step S2927, the search object is recognized by checking whether its local features match at least part of the local features of the video. If the search object is not found, the process returns from step S2929 to step S2923 to acquire another video (in practice, by changing the imaging direction/area of the communication terminal 2710), and the search for the search object is repeated.
 捜索物があればステップS2931に進んで、ズーム処理をするか否かが判定される。かかる判定は、ユーザによる設定であってもよい。ズーム処理をする場合は、ステップS2933において、ズームインした捜索物の拡大映像を取得する。ズーム処理があってもなくても、ステップS2935において、捜索物の位置に捜索物の存在を指示するコメントを表示する(図27A参照)。 If the search object is found, the process advances to step S2931, and it is determined whether or not to perform zoom processing. This determination may follow a setting made by the user. When zoom processing is performed, an enlarged video of the zoomed-in search object is acquired in step S2933. With or without zoom processing, in step S2935, a comment indicating the presence of the search object is displayed at the position of the search object (see FIG. 27A).
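The search branch of FIG. 29 (steps S2923 through S2935) can be sketched as the following control loop. The callback names and the toy stand-ins are hypothetical, not part of the embodiment; they take the place of the imaging unit 601, the local feature generation unit 602, and the search object recognition unit 2803.

```python
# Control-flow sketch of the search procedure of Fig. 29: keep
# acquiring frames (changing the imaging direction between attempts),
# generate local features, and stop once the search object is
# recognized; optionally zoom in before reporting its position.

def search_loop(acquire_frame, generate_features, recognize, zoom=False,
                max_frames=10):
    for _ in range(max_frames):              # S2923: acquire next frame
        frame = acquire_frame()
        features = generate_features(frame)  # S1313: local features
        position = recognize(features)       # S2927: collate with DB
        if position is None:
            continue                         # S2929: not found, retry
        if zoom:                             # S2931: zoom setting
            frame = acquire_frame(zoom_to=position)  # S2933: zoom in
        return position                      # S2935: annotate position
    return None

# toy stand-ins: the object becomes recognizable in the third frame
results = iter([None, None, (120, 80)])
pos = search_loop(
    acquire_frame=lambda zoom_to=None: None,
    generate_features=lambda frame: frame,
    recognize=lambda features: next(results),
)
print(pos)  # (120, 80)
```

In the real terminal, `recognize` returning a position would trigger the superimposed "It's here" comment at that position on the display.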
 [第9実施形態]
 次に、本発明の第9実施形態に係る情報処理システムについて説明する。本実施形態に係る情報処理システムは、上記第1実施形態乃至第8実施形態と比べると、通信端末が学習対象物認識を含む全ての処理を行なう点で異なる。その他の構成および動作は、第2実施形態と同様であるため、同じ構成および動作については同じ符号を付してその詳しい説明を省略する。
[Ninth Embodiment]
Next, an information processing system according to the ninth embodiment of the present invention will be described. The information processing system according to the present embodiment is different from the first to eighth embodiments in that the communication terminal performs all processes including learning object recognition. Since other configurations and operations are the same as those of the second embodiment, the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted.
 本実施形態によれば、映像中の画像内の局所特徴量に基づいて、通信端末のみで全ての処理を行なうことができる。 According to the present embodiment, all processing can be performed only by the communication terminal based on the local feature amount in the image in the video.
 《通信端末の機能構成》
 図30は、本実施形態に係る通信端末3010の機能構成を示すブロック図である。なお、図30において、第2実施形態の図6と同様の機能構成部については同じ参照番号を付して、説明を省略する。
<Functional configuration of communication terminal>
FIG. 30 is a block diagram illustrating a functional configuration of the communication terminal 3010 according to the present embodiment. In FIG. 30, the same reference numerals are assigned to the same functional components as those in FIG.
 学習対象物認識部3003は、局所特徴量生成部602の生成した局所特徴量と、局所特徴量DB3021に格納された局所特徴量を照合して、学習対象物を認識する。そして、学習対象認識結果報知部3004から認識結果を報知する。なお、学習対象物認識部3003および局所特徴量DB3021は、いずれも、学習対象物認識サーバ220が有した機能構成部を通信端末3010に配置したものであり、その機能は同様であるので説明は省略する。また、学習対象認識結果報知部3004も、図6の表示画面生成部606と音声生成部608を含む処理を報知情報に基づいて示したものであり、その処理は同様であるので、説明は省略する。 The learning object recognition unit 3003 recognizes the learning object by collating the local features generated by the local feature generation unit 602 with the local features stored in the local feature DB 3021. The recognition result is then reported by the learning object recognition result notification unit 3004. Note that the learning object recognition unit 3003 and the local feature DB 3021 are both functional components of the learning object recognition server 220 relocated to the communication terminal 3010; since their functions are the same, their description is omitted. The learning object recognition result notification unit 3004 likewise represents, based on the notification information, the processing including the display screen generation unit 606 and the sound generation unit 608 of FIG. 6; since that processing is the same, its description is omitted.
 関連情報取得部3005は、認識した学習対象物に対応して、関連情報DB3022から関連情報を取得する。また、関連情報報知部3006は、関連情報をユーザに報知する。 リンク情報取得部3007は、認識した学習対象物に対応して、リンク情報DB3023からリンク情報を取得する。また、リンク情報報知部3008は、リンク情報をユーザに報知する。これらの機能構成部も、学習対象物認識サーバ220が有した機能構成部を通信端末3010に配置したものであり、その機能は同様であるので説明は省略する。 The related information acquisition unit 3005 acquires related information from the related information DB 3022 in accordance with the recognized learning object, and the related information notification unit 3006 notifies the user of the related information. The link information acquisition unit 3007 acquires link information from the link information DB 3023 in accordance with the recognized learning object, and the link information notification unit 3008 notifies the user of the link information. These functional components are also functional components of the learning object recognition server 220 relocated to the communication terminal 3010; since their functions are the same, their description is omitted.
 リンク先アクセス部3009は、取得したリンク情報を使用してリンク先の関連情報提供サーバ230にアクセスする。 The link destination access unit 3009 accesses the link destination related information providing server 230 using the acquired link information.
 [他の実施形態]
 以上、実施形態を参照して本発明を説明したが、本発明は上記実施形態に限定されるものではない。本発明の構成や詳細には、本発明のスコープ内で当業者が理解し得る様々な変更をすることができる。また、それぞれの実施形態に含まれる別々の特徴を如何様に組み合わせたシステムまたは装置も、本発明の範疇に含まれる。
[Other Embodiments]
Although the present invention has been described with reference to the embodiments, the present invention is not limited to the above embodiments. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention. In addition, a system or an apparatus in which different features included in each embodiment are combined in any way is also included in the scope of the present invention.
 また、本発明は、複数の機器から構成されるシステムに適用されてもよいし、単体の装置に適用されてもよい。さらに、本発明は、実施形態の機能を実現する制御プログラムが、システムあるいは装置に直接あるいは遠隔から供給される場合にも適用可能である。したがって、本発明の機能をコンピュータで実現するために、コンピュータにインストールされる制御プログラム、あるいはその制御プログラムを格納した媒体、その制御プログラムをダウンロードさせるWWW(World Wide Web)サーバも、本発明の範疇に含まれる。 The present invention may be applied to a system composed of a plurality of devices, or to a single apparatus. Furthermore, the present invention is also applicable to a case where a control program that realizes the functions of the embodiments is supplied to a system or apparatus directly or remotely. Therefore, a control program installed in a computer to realize the functions of the present invention, a medium storing that control program, and a WWW (World Wide Web) server from which that control program is downloaded are also included in the scope of the present invention.
 この出願は、2012年1月30日に出願された日本出願特願2012-017385を基礎とする優先権を主張し、その開示の全てをここに取り込む。 This application claims priority based on Japanese Patent Application No. 2012-017385 filed on January 30, 2012, the entire disclosure of which is incorporated herein.
 本実施形態の一部又は全部は、以下の付記のようにも記載されうるが、以下には限られない。 Some or all of this embodiment can be described as in the following supplementary notes, but is not limited to the following.
 (付記1)
 学習対象物と、前記学習対象物の画像のm個の特徴点のそれぞれを含むm個の局所領域のそれぞれについて生成された、それぞれ1次元からi次元までの特徴ベクトルからなるm個の第1局所特徴量とを、対応付けて記憶する第1局所特徴量記憶手段と、
 撮像手段が撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を生成する第2局所特徴量生成手段と、
 前記第1局所特徴量の特徴ベクトルの次元数iおよび前記第2局所特徴量の特徴ベクトルの次元数jのうち、より少ない次元数を選択し、選択された前記次元数までの特徴ベクトルからなる前記n個の第2局所特徴量に、選択された前記次元数までの特徴ベクトルからなる前記m個の第1局所特徴量の所定割合以上が対応すると判定した場合に、前記映像中の前記画像に前記学習対象物が存在すると認識する学習対象物認識手段と、
 を備えることを特徴とする情報処理システム。
 (付記2)
 前記第1局所特徴量記憶手段は、複数の学習対象物にそれぞれ対応付けて各学習対象物の画像から生成した前記m個の第1局所特徴量を記憶し、
 前記学習対象物認識手段は、前記撮像手段が撮像した前記画像に含まれる複数の学習対象物を認識することを特徴とする付記1に記載の情報処理システム。
 (付記3)
 前記学習対象物認識手段の認識結果を報知する報知手段をさらに備えることを特徴とする付記1または2に記載の情報処理システム。
 (付記4)
 前記報知手段は、さらに、前記認識結果に関連する情報を報知することを特徴とする付記3に記載の情報処理システム。
 (付記5)
 前記報知手段は、さらに、前記認識結果に関連する情報を取得するためのリンク情報を報知することを特徴とする付記3または4に記載の情報処理システム。
 (付記6)
 前記報知手段は、前記認識結果に関連する情報をリンク情報にしたがって取得する関連情報取得手段を有し、
 リンク情報にしたがって取得した前記関連情報を報知することを特徴とする付記3に記載の情報処理システム。
 (付記7)
 前記第1局所特徴量記憶手段に、捜索する学習対象物の局所特徴量を登録する登録手段をさらに備え、
 前記報知手段は、前記学習対象物認識手段が認識した学習対象物を捜索結果として報知することを特徴とする付記3乃至6のいずれか1つに記載の情報処理システム。
 (付記8)
 前記学習対象物は、文字を含む学習対象物であって、
 前記報知手段は、前記学習対象物の内容を報知することを特徴とする付記3乃至6のいずれか1つに記載の情報処理システム。
 (付記9)
 前記学習対象物は、音に関連する学習対象物であって、
 前記報知手段は、前記学習対象物の内容を音の演奏で報知することを特徴とする付記3乃至6のいずれか1つに記載の情報処理システム。
 (付記10)
 前記学習対象物は、展示物であって、
 前記報知手段は、前記学習対象物の説明を報知することを特徴とする付記3乃至6のいずれか1つに記載の情報処理システム。
 (付記11)
 前記学習対象物は、数式を含む学習対象物であって、
 前記報知手段は、前記学習対象物の数式を演算して演算結果を報知することを特徴とする付記3乃至6のいずれか1つに記載の情報処理システム。
 (付記12)
 前記第1局所特徴量および前記第2局所特徴量は、画像から抽出した特徴点を含む局所領域を複数のサブ領域に分割し、前記複数のサブ領域内の勾配方向のヒストグラムからなる複数の次元の特徴ベクトルを生成することにより生成されることを特徴とする付記1乃至11のいずれか1つに記載の情報処理システム。
 (付記13)
 前記第1局所特徴量および前記第2局所特徴量は、前記生成した複数の次元の特徴ベクトルから、隣接するサブ領域間の相関がより大きな次元を選定することにより生成されることを特徴とする付記12に記載の情報処理システム。
 (付記14)
 前記特徴ベクトルの複数の次元は、前記特徴点の特徴に寄与する次元から順に、かつ、前記局所特徴量に対して求められる精度の向上に応じて第1次元から順に選択できるよう、所定の次元数ごとに前記局所領域をひと回りするよう配列することを特徴とする付記12または13に記載の情報処理システム。
 (付記15)
 前記第2局所特徴量生成手段は、前記学習対象物の相関に対応して、他の学習対象物とより低い前記相関を有する学習対象物については次元数のより少ない前記第2局所特徴量を生成することを特徴とする付記14に記載の情報処理システム。
 (付記16)
 前記第1局所特徴量記憶手段は、前記学習対象物の相関に対応して、他の学習対象物とより低い前記相関を有する学習対象物については次元数のより少ない前記第1局所特徴量を記憶することを特徴とする付記14または15に記載の情報処理システム。
 (付記17)
 学習対象物と、前記学習対象物の画像のm個の特徴点のそれぞれを含むm個の局所領域のそれぞれについて生成された、それぞれ1次元からi次元までの特徴ベクトルからなるm個の第1局所特徴量とを、対応付けて記憶する第1局所特徴量記憶手段を備えた情報処理システムを利用した情報処理方法であって、
 撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を生成する第2局所特徴量生成ステップと、
 前記第1局所特徴量の特徴ベクトルの次元数iおよび前記第2局所特徴量の特徴ベクトルの次元数jのうち、より少ない次元数を選択し、選択された前記次元数までの特徴ベクトルからなる前記n個の第2局所特徴量に、選択された前記次元数までの特徴ベクトルからなる前記m個の第1局所特徴量の所定割合以上が対応すると判定した場合に、前記映像中の前記画像に前記学習対象物が存在すると認識する認識ステップと、
 を含むことを特徴とする情報処理方法。
 (付記18)
 撮像手段が撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を生成する第2局所特徴量生成手段と、
 前記n個の第2局所特徴量を、局所特徴量の照合に基づいて撮像した前記画像に含まれる学習対象物を認識する情報処理装置に送信する第1送信手段と、
 前記情報処理装置から、撮像した前記画像に含まれる学習対象物を示す情報を受信する第1受信手段と、
 を備えたことを特徴とする通信端末。
 (付記19)
 撮像手段が撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を生成する第2局所特徴量生成ステップと、
 前記n個の第2局所特徴量を、局所特徴量の照合に基づいて撮像した前記画像に含まれる学習対象物を認識する情報処理装置に送信する第1送信ステップと、
 前記情報処理装置から、撮像した前記画像に含まれる学習対象物を示す情報を受信する第1受信ステップと、
 を含むことを特徴とする通信端末の制御方法。
 (付記20)
 撮像手段が撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を生成する第2局所特徴量生成ステップと、
 前記n個の第2局所特徴量を、局所特徴量の照合に基づいて撮像した前記画像に含まれる学習対象物を認識する情報処理装置に送信する第1送信ステップと、
 前記情報処理装置から、撮像した前記画像に含まれる学習対象物を示す情報を受信する第1受信ステップと、
 をコンピュータに実行させることを特徴とする通信端末の制御プログラム。
 (付記21)
 学習対象物と、前記学習対象物の画像のm個の特徴点のそれぞれを含むm個の局所領域のそれぞれについて生成された、それぞれ1次元からi次元までの特徴ベクトルからなるm個の第1局所特徴量とを、対応付けて記憶する第1局所特徴量記憶手段と、
 通信端末が撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を、前記通信端末から受信する第2受信手段と、
 前記第1局所特徴量の特徴ベクトルの次元数iおよび前記第2局所特徴量の特徴ベクトルの次元数jのうち、より少ない次元数を選択し、選択された前記次元数までの特徴ベクトルからなる前記n個の第2局所特徴量に、選択された前記次元数までの特徴ベクトルからなる前記m個の第1局所特徴量の所定割合以上が対応すると判定した場合に、前記映像中の前記画像に前記学習対象物が存在すると認識する認識手段と、
 認識した前記学習対象物を示す情報を前記通信端末に送信する第2送信手段と、
 を備えることを特徴とする情報処理装置。
 (付記22)
 学習対象物と、前記学習対象物の画像のm個の特徴点のそれぞれを含むm個の局所領域のそれぞれについて生成された、それぞれ1次元からi次元までの特徴ベクトルからなるm個の第1局所特徴量とを、対応付けて記憶する第1局所特徴量記憶手段を備えた情報処理装置の制御方法であって、
 通信端末が撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を、前記通信端末から受信する第2受信ステップと、
 前記第1局所特徴量の特徴ベクトルの次元数iおよび前記第2局所特徴量の特徴ベクトルの次元数jのうち、より少ない次元数を選択し、選択された前記次元数までの特徴ベクトルからなる前記n個の第2局所特徴量に、選択された前記次元数までの特徴ベクトルからなる前記m個の第1局所特徴量の所定割合以上が対応すると判定した場合に、前記映像中の前記画像に前記学習対象物が存在すると認識する認識ステップと、
 認識した前記学習対象物を示す情報を前記通信端末に送信する第2送信ステップと、
 を含むことを特徴とする情報処理装置の制御方法。
 (付記23)
 学習対象物と、前記学習対象物の画像のm個の特徴点のそれぞれを含むm個の局所領域のそれぞれについて生成された、それぞれ1次元からi次元までの特徴ベクトルからなるm個の第1局所特徴量とを、対応付けて記憶する第1局所特徴量記憶手段を備えた情報処理装置の制御プログラムであって、
 通信端末が撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を、前記通信端末から受信する第2受信ステップと、
 前記第1局所特徴量の特徴ベクトルの次元数iおよび前記第2局所特徴量の特徴ベクトルの次元数jのうち、より少ない次元数を選択し、選択された前記次元数までの特徴ベクトルからなる前記n個の第2局所特徴量に、選択された前記次元数までの特徴ベクトルからなる前記m個の第1局所特徴量の所定割合以上が対応すると判定した場合に、前記映像中の前記画像に前記学習対象物が存在すると認識する認識ステップと、
 認識した前記学習対象物を示す情報を前記通信端末に送信する第2送信ステップと、
 をコンピュータに実行させることを特徴とする情報処理装置の制御プログラム。
(Appendix 1)
Each of m first vectors each consisting of a 1-dimensional to i-dimensional feature vector generated for each of the learning object and m local regions including each of the m feature points of the image of the learning object. First local feature quantity storage means for storing the local feature quantity in association with each other;
N feature points are extracted from the image captured by the imaging means, and n local feature regions each including the n feature points are each composed of feature vectors of 1 to j dimensions. Second local feature quantity generating means for generating the second local feature quantity of
A smaller dimension number is selected from among the dimension number i of the feature vector of the first local feature quantity and the dimension number j of the feature vector of the second local feature quantity, and the feature vector includes up to the selected dimension number. When it is determined that the n second local feature amounts correspond to a predetermined ratio or more of the m first local feature amounts including feature vectors up to the selected number of dimensions, the image in the video Learning object recognition means for recognizing that the learning object exists in
An information processing system comprising:
(Appendix 2)
The first local feature amount storage means stores the m first local feature amounts generated from the images of the learning objects in association with the plurality of learning objects, respectively.
The information processing system according to appendix 1, wherein the learning object recognition unit recognizes a plurality of learning objects included in the image captured by the imaging unit.
(Appendix 3)
The information processing system according to appendix 1 or 2, further comprising notification means for notifying a recognition result of the learning object recognition means.
(Appendix 4)
The information processing system according to supplementary note 3, wherein the notification unit further notifies information related to the recognition result.
(Appendix 5)
The information processing system according to appendix 3 or 4, wherein the notifying unit further notifies link information for acquiring information related to the recognition result.
(Appendix 6)
The notification means includes related information acquisition means for acquiring information related to the recognition result according to link information,
The information processing system according to appendix 3, wherein the related information acquired according to link information is notified.
(Appendix 7)
The first local feature quantity storage means further comprises a registration means for registering a local feature quantity of the learning object to be searched,
The information processing system according to any one of supplementary notes 3 to 6, wherein the notification means notifies the learning object recognized by the learning object recognition means as a search result.
(Appendix 8)
The learning object is a learning object including characters,
The information processing system according to any one of supplementary notes 3 to 6, wherein the notification means notifies the contents of the learning object.
(Appendix 9)
The learning object is a learning object related to sound,
The information processing system according to any one of appendices 3 to 6, wherein the notification unit notifies the contents of the learning object by playing a sound.
(Appendix 10)
The learning object is an exhibit,
The information processing system according to any one of supplementary notes 3 to 6, wherein the notification means notifies the description of the learning object.
(Appendix 11)
The learning object is a learning object including a mathematical formula,
The information processing system according to any one of appendices 3 to 6, wherein the notification unit calculates a mathematical expression of the learning object and notifies a calculation result.
(Appendix 12)
The first local feature amount and the second local feature amount are a plurality of dimensions formed by dividing a local region including a feature point extracted from an image into a plurality of sub-regions, and comprising histograms of gradient directions in the plurality of sub-regions. The information processing system according to any one of appendices 1 to 11, wherein the information processing system is generated by generating a feature vector of
(Appendix 13)
The first local feature quantity and the second local feature quantity are generated by selecting a dimension having a larger correlation between adjacent sub-regions from the generated feature vectors of a plurality of dimensions. The information processing system according to attachment 12.
(Appendix 14)
The plurality of dimensions of the feature vector is a predetermined dimension so that it can be selected in order from the dimension that contributes to the feature of the feature point and from the first dimension in accordance with the improvement in accuracy required for the local feature amount. 14. The information processing system according to appendix 12 or 13, wherein the local region is arranged so as to make a round for each number.
(Appendix 15)
The second local feature quantity generation means corresponds to the correlation of the learning target object, and the second local feature quantity having a smaller number of dimensions for the learning target object having a lower correlation with another learning target object. The information processing system according to attachment 14, wherein the information processing system is generated.
(Appendix 16)
The first local feature quantity storage means corresponds to the correlation of the learning object, and the first local feature quantity having a smaller number of dimensions for the learning object having a lower correlation with another learning object. 16. The information processing system according to appendix 14 or 15, wherein the information processing system is stored.
(Appendix 17)
Each of m first vectors each consisting of a 1-dimensional to i-dimensional feature vector generated for each of the learning object and m local regions including each of the m feature points of the image of the learning object. An information processing method using an information processing system including a first local feature amount storage unit that stores a local feature amount in association with each other,
N feature points are extracted from an image in the captured video, and n second regions each consisting of a feature vector from one dimension to j dimension for each of the n local regions including each of the n feature points. A second local feature generation step for generating a local feature;
A smaller dimension number is selected from among the dimension number i of the feature vector of the first local feature quantity and the dimension number j of the feature vector of the second local feature quantity, and the feature vector includes up to the selected dimension number. When it is determined that the n second local feature amounts correspond to a predetermined ratio or more of the m first local feature amounts including feature vectors up to the selected number of dimensions, the image in the video Recognizing that the learning object exists in
An information processing method comprising:
(Appendix 18)
N feature points are extracted from the image captured by the imaging means, and n local feature regions each including the n feature points are each composed of feature vectors of 1 to j dimensions. Second local feature quantity generating means for generating the second local feature quantity of
First transmitting means for transmitting the m second local feature amounts to an information processing apparatus that recognizes a learning object included in the image captured based on a comparison of local feature amounts;
First receiving means for receiving information indicating a learning object included in the captured image from the information processing apparatus;
A communication terminal comprising:
(Appendix 19)
N feature points are extracted from the image captured by the imaging means, and n local feature regions each including the n feature points are each composed of feature vectors of 1 to j dimensions. A second local feature generation step of generating the second local feature of
A first transmission step of transmitting the m second local feature quantities to an information processing apparatus that recognizes a learning object included in the image captured based on a comparison of local feature quantities;
A first receiving step of receiving information indicating a learning object included in the captured image from the information processing apparatus;
A control method for a communication terminal, comprising:
(Appendix 20)
A second local feature quantity generation step of extracting n feature points from an image in the video captured by the imaging means, and generating n second local feature quantities, each consisting of feature vectors of 1 to j dimensions, for n local regions including each of the n feature points;
A first transmission step of transmitting the n second local feature quantities to an information processing apparatus that recognizes a learning object included in the captured image based on matching of local feature quantities;
A first receiving step of receiving information indicating a learning object included in the captured image from the information processing apparatus;
A control program for a communication terminal, which causes a computer to execute the above steps.
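The terminal-side flow of Appendixes 18 to 20 amounts to: serialize the n second local feature quantities, transmit them to the information processing apparatus, and receive back information indicating the recognized learning object. A hedged sketch follows; the JSON wire format and the `send` callable are assumptions for illustration, since the claims fix neither an encoding nor a transport.

```python
import json

def serialize_features(features):
    # Pack the n second local feature quantities for transmission.
    # JSON is an assumed wire format; the claims do not specify one.
    return json.dumps({"features": features})

def deserialize_result(reply):
    # The reply carries information indicating the recognized object.
    return json.loads(reply)["object"]

def terminal_round_trip(features, send):
    # `send` stands in for the network transport to the information
    # processing apparatus (first transmission / first receiving steps).
    return deserialize_result(send(serialize_features(features)))
```

In a real terminal, `send` would be an HTTP or socket call; here any callable that accepts the payload and returns a reply string works, which keeps the sketch testable without a network.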
(Appendix 21)
First local feature quantity storage means for storing, in association with each other, a learning object and m first local feature quantities, each consisting of feature vectors of 1 to i dimensions, generated for m local regions including each of m feature points of an image of the learning object;
Second receiving means for receiving, from the communication terminal, n second local feature quantities, each consisting of feature vectors of 1 to j dimensions, generated for n local regions including each of n feature points extracted from an image in the video captured by the communication terminal;
Recognition means for selecting the smaller number of dimensions from among the number of dimensions i of the feature vectors of the first local feature quantities and the number of dimensions j of the feature vectors of the second local feature quantities, and for recognizing that the learning object exists in the image in the video when it is determined that a predetermined ratio or more of the m first local feature quantities, each consisting of feature vectors up to the selected number of dimensions, correspond to the n second local feature quantities, each consisting of feature vectors up to the selected number of dimensions;
Second transmission means for transmitting information indicating the recognized learning object to the communication terminal;
An information processing apparatus comprising:
(Appendix 22)
A control method for an information processing apparatus comprising first local feature quantity storage means for storing, in association with each other, a learning object and m first local feature quantities, each consisting of feature vectors of 1 to i dimensions, generated for m local regions including each of m feature points of an image of the learning object, the control method comprising:
A second receiving step of receiving, from the communication terminal, n second local feature quantities, each consisting of feature vectors of 1 to j dimensions, generated for n local regions including each of n feature points extracted from an image in the video captured by the communication terminal;
A recognition step of selecting the smaller number of dimensions from among the number of dimensions i of the feature vectors of the first local feature quantities and the number of dimensions j of the feature vectors of the second local feature quantities, and recognizing that the learning object exists in the image in the video when it is determined that a predetermined ratio or more of the m first local feature quantities, each consisting of feature vectors up to the selected number of dimensions, correspond to the n second local feature quantities, each consisting of feature vectors up to the selected number of dimensions;
A second transmission step of transmitting information indicating the recognized learning object to the communication terminal;
A method for controlling an information processing apparatus, comprising:
(Appendix 23)
A control program for an information processing apparatus comprising first local feature quantity storage means for storing, in association with each other, a learning object and m first local feature quantities, each consisting of feature vectors of 1 to i dimensions, generated for m local regions including each of m feature points of an image of the learning object, the control program comprising:
A second receiving step of receiving, from the communication terminal, n second local feature quantities, each consisting of feature vectors of 1 to j dimensions, generated for n local regions including each of n feature points extracted from an image in the video captured by the communication terminal;
A recognition step of selecting the smaller number of dimensions from among the number of dimensions i of the feature vectors of the first local feature quantities and the number of dimensions j of the feature vectors of the second local feature quantities, and recognizing that the learning object exists in the image in the video when it is determined that a predetermined ratio or more of the m first local feature quantities, each consisting of feature vectors up to the selected number of dimensions, correspond to the n second local feature quantities, each consisting of feature vectors up to the selected number of dimensions;
A second transmission step of transmitting information indicating the recognized learning object to the communication terminal;
A computer-readable storage medium storing a control program for an information processing apparatus, the control program causing a computer to execute the above steps.

Claims (23)

  1.  学習対象物と、前記学習対象物の画像のm個の特徴点のそれぞれを含むm個の局所領域のそれぞれについて生成された、それぞれ1次元からi次元までの特徴ベクトルからなるm個の第1局所特徴量とを、対応付けて記憶する第1局所特徴量記憶手段と、
     撮像手段が撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を生成する第2局所特徴量生成手段と、
     前記第1局所特徴量の特徴ベクトルの次元数iおよび前記第2局所特徴量の特徴ベクトルの次元数jのうち、より少ない次元数を選択し、選択された前記次元数までの特徴ベクトルからなる前記n個の第2局所特徴量に、選択された前記次元数までの特徴ベクトルからなる前記m個の第1局所特徴量の所定割合以上が対応すると判定した場合に、前記映像中の前記画像に前記学習対象物が存在すると認識する学習対象物認識手段と、
     を備えることを特徴とする情報処理システム。
    First local feature quantity storage means for storing, in association with each other, a learning object and m first local feature quantities, each consisting of feature vectors of 1 to i dimensions, generated for m local regions including each of m feature points of an image of the learning object;
    Second local feature quantity generation means for extracting n feature points from an image in the video captured by the imaging means, and for generating n second local feature quantities, each consisting of feature vectors of 1 to j dimensions, for n local regions including each of the n feature points;
    Learning object recognition means for selecting the smaller number of dimensions from among the number of dimensions i of the feature vectors of the first local feature quantities and the number of dimensions j of the feature vectors of the second local feature quantities, and for recognizing that the learning object exists in the image in the video when it is determined that a predetermined ratio or more of the m first local feature quantities, each consisting of feature vectors up to the selected number of dimensions, correspond to the n second local feature quantities, each consisting of feature vectors up to the selected number of dimensions;
    An information processing system comprising:
  2.  前記第1局所特徴量記憶手段は、複数の学習対象物にそれぞれ対応付けて各学習対象物の画像から生成した前記m個の第1局所特徴量を記憶し、
     前記学習対象物認識手段は、前記撮像手段が撮像した前記画像に含まれる複数の学習対象物を認識することを特徴とする請求項1に記載の情報処理システム。
    The first local feature quantity storage means stores the m first local feature quantities generated from an image of each of a plurality of learning objects, in association with the respective learning objects,
    The information processing system according to claim 1, wherein the learning object recognition unit recognizes a plurality of learning objects included in the image captured by the imaging unit.
  3.  前記学習対象物認識手段の認識結果を報知する報知手段をさらに備えることを特徴とする請求項1または2に記載の情報処理システム。 The information processing system according to claim 1 or 2, further comprising notification means for notifying a recognition result of the learning object recognition means.
  4.  前記報知手段は、さらに、前記認識結果に関連する情報を報知することを特徴とする請求項3に記載の情報処理システム。 4. The information processing system according to claim 3, wherein the notifying unit further notifies information related to the recognition result.
  5.  前記報知手段は、さらに、前記認識結果に関連する情報を取得するためのリンク情報を報知することを特徴とする請求項3または4に記載の情報処理システム。 The information processing system according to claim 3 or 4, wherein the notification means further notifies link information for acquiring information related to the recognition result.
  6.  前記報知手段は、前記認識結果に関連する情報をリンク情報にしたがって取得する関連情報取得手段を有し、
     リンク情報にしたがって取得した前記関連情報を報知することを特徴とする請求項3に記載の情報処理システム。
    The notification means includes related information acquisition means for acquiring information related to the recognition result according to link information,
    The information processing system according to claim 3, wherein the notification means notifies the related information acquired according to the link information.
  7.  前記第1局所特徴量記憶手段に、捜索する学習対象物の局所特徴量を登録する登録手段をさらに備え、
     前記報知手段は、前記学習対象物認識手段が認識した学習対象物を捜索結果として報知することを特徴とする請求項3乃至6のいずれか1項に記載の情報処理システム。
    The first local feature quantity storage means further comprises a registration means for registering a local feature quantity of the learning object to be searched,
    The information processing system according to any one of claims 3 to 6, wherein the notification means notifies the learning object recognized by the learning object recognition means as a search result.
  8.  前記学習対象物は、文字を含む学習対象物であって、
     前記報知手段は、前記学習対象物の内容を報知することを特徴とする請求項3乃至6のいずれか1項に記載の情報処理システム。
    The learning object is a learning object including characters,
    The information processing system according to any one of claims 3 to 6, wherein the notification unit notifies the content of the learning object.
  9.  前記学習対象物は、音に関連する学習対象物であって、
     前記報知手段は、前記学習対象物の内容を音の演奏で報知することを特徴とする請求項3乃至6のいずれか1項に記載の情報処理システム。
    The learning object is a learning object related to sound,
    The information processing system according to any one of claims 3 to 6, wherein the notification unit notifies the contents of the learning object by playing a sound.
  10.  前記学習対象物は、展示物であって、
     前記報知手段は、前記学習対象物の説明を報知することを特徴とする請求項3乃至6のいずれか1項に記載の情報処理システム。
    The learning object is an exhibit,
    The information processing system according to any one of claims 3 to 6, wherein the notification means notifies a description of the learning object.
  11.  前記学習対象物は、数式を含む学習対象物であって、
     前記報知手段は、前記学習対象物の数式を演算して演算結果を報知することを特徴とする請求項3乃至6のいずれか1項に記載の情報処理システム。
    The learning object is a learning object including a mathematical formula,
    The information processing system according to any one of claims 3 to 6, wherein the notification means calculates the mathematical formula of the learning object and notifies the calculation result.
  12.  前記第1局所特徴量および前記第2局所特徴量は、画像から抽出した特徴点を含む局所領域を複数のサブ領域に分割し、前記複数のサブ領域内の勾配方向のヒストグラムからなる複数の次元の特徴ベクトルを生成することにより生成されることを特徴とする請求項1乃至11のいずれか1項に記載の情報処理システム。 The information processing system according to any one of claims 1 to 11, wherein the first local feature quantities and the second local feature quantities are generated by dividing a local region including a feature point extracted from an image into a plurality of sub-regions and generating feature vectors of a plurality of dimensions consisting of histograms of gradient directions in the plurality of sub-regions.
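Claim 12 describes a SIFT/HOG-style descriptor: split the local region around a feature point into sub-regions and concatenate per-sub-region histograms of gradient directions. A simplified pure-Python sketch follows; the 2x2 grid, 8 bins, and unweighted magnitude accumulation are illustrative choices, not taken from the patent.

```python
import math

def patch_descriptor(patch, grid=2, bins=8):
    # patch: square 2D list of gray levels around one feature point.
    size = len(patch)
    cell = size // grid
    hist = [[0.0] * bins for _ in range(grid * grid)]
    for y in range(1, size - 1):
        for x in range(1, size - 1):
            # Central-difference gradient at (x, y).
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.atan2(gy, gx) % (2 * math.pi)
            b = min(int(ang / (2 * math.pi) * bins), bins - 1)
            # Which sub-region (cell) this pixel falls in.
            c = min(y // cell, grid - 1) * grid + min(x // cell, grid - 1)
            hist[c][b] += mag  # magnitude-weighted orientation vote
    # Concatenate: grid * grid * bins dimensions.
    return [v for h in hist for v in h]
```

For a patch with a pure horizontal ramp, every vote lands in the 0-radian bin of each cell, so only every eighth dimension of the 32-dimensional output is nonzero.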
  13.  前記第1局所特徴量および前記第2局所特徴量は、前記生成した複数の次元の特徴ベクトルから、隣接するサブ領域間の相関がより大きな次元を選定することにより生成されることを特徴とする請求項12に記載の情報処理システム。 The information processing system according to claim 12, wherein the first local feature quantities and the second local feature quantities are generated by selecting, from the generated feature vectors of a plurality of dimensions, dimensions having a larger correlation between adjacent sub-regions.
  14.  前記特徴ベクトルの複数の次元は、前記特徴点の特徴に寄与する次元から順に、かつ、前記局所特徴量に対して求められる精度の向上に応じて第1次元から順に選択できるよう、所定の次元数ごとに前記局所領域をひと回りするよう配列することを特徴とする請求項12または13に記載の情報処理システム。 The information processing system according to claim 12 or 13, wherein the plurality of dimensions of the feature vectors are arranged so as to cycle around the local region once every predetermined number of dimensions, such that dimensions can be selected in order starting from those contributing to the features of the feature point, and from the first dimension onward in accordance with the accuracy required of the local feature quantities.
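The arrangement of claim 14, where the dimensions cycle once around the local region per block, can be sketched as an interleaving of the cell-major histogram layout; truncating the reordered vector to its first k dimensions then still samples every sub-region. The layout and names here are illustrative, not the patent's exact ordering.

```python
def progressive_order(desc, n_cells, bins):
    # desc is cell-major: bins for cell 0, then bins for cell 1, ...
    # Reorder so each consecutive block of n_cells dimensions takes
    # one bin from every cell, i.e. cycles once around the region.
    return [desc[c * bins + b] for b in range(bins) for c in range(n_cells)]
```

With 2 cells of 4 bins each, the first 2 dimensions of the reordered vector already cover both cells, which is what makes prefix truncation (as in the dimension-selection clause of claim 1) meaningful.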
  15.  前記第2局所特徴量生成手段は、前記学習対象物の相関に対応して、他の学習対象物とより低い前記相関を有する学習対象物については次元数のより少ない前記第2局所特徴量を生成することを特徴とする請求項14に記載の情報処理システム。 The information processing system according to claim 14, wherein the second local feature quantity generation means generates, in accordance with correlations among the learning objects, second local feature quantities having a smaller number of dimensions for a learning object having a lower correlation with other learning objects.
  16.  前記第1局所特徴量記憶手段は、前記学習対象物の相関に対応して、他の学習対象物とより低い前記相関を有する学習対象物については次元数のより少ない前記第1局所特徴量を記憶することを特徴とする請求項14または15に記載の情報処理システム。 The information processing system according to claim 14 or 15, wherein the first local feature quantity storage means stores, in accordance with correlations among the learning objects, first local feature quantities having a smaller number of dimensions for a learning object having a lower correlation with other learning objects.
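Claims 15 and 16 let a learning object that is easy to distinguish (low correlation with every other object) get by with fewer descriptor dimensions. A toy selection rule is sketched below; the thresholds and dimension counts are invented for illustration, not values from the patent.

```python
def choose_dims(correlations_to_others, d_min=32, d_max=128):
    # An object with low worst-case similarity to every other learning
    # object is already distinctive, so fewer dimensions suffice;
    # highly similar objects keep the full descriptor.
    c = max(correlations_to_others)
    if c < 0.3:
        return d_min
    if c < 0.7:
        return (d_min + d_max) // 2
    return d_max
```

Because both generation (claim 15) and storage (claim 16) use the same rule, the per-object descriptor lengths on the terminal and in the storage means stay consistent, and the matching side simply truncates to the shorter prefix.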
  17.  学習対象物と、前記学習対象物の画像のm個の特徴点のそれぞれを含むm個の局所領域のそれぞれについて生成された、それぞれ1次元からi次元までの特徴ベクトルからなるm個の第1局所特徴量とを、対応付けて記憶する第1局所特徴量記憶手段を備えた情報処理システムを利用した情報処理方法であって、
     撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を生成する第2局所特徴量生成ステップと、
     前記第1局所特徴量の特徴ベクトルの次元数iおよび前記第2局所特徴量の特徴ベクトルの次元数jのうち、より少ない次元数を選択し、選択された前記次元数までの特徴ベクトルからなる前記n個の第2局所特徴量に、選択された前記次元数までの特徴ベクトルからなる前記m個の第1局所特徴量の所定割合以上が対応すると判定した場合に、前記映像中の前記画像に前記学習対象物が存在すると認識する認識ステップと、
     を含むことを特徴とする情報処理方法。
    An information processing method using an information processing system comprising first local feature quantity storage means for storing, in association with each other, a learning object and m first local feature quantities, each consisting of feature vectors of 1 to i dimensions, generated for m local regions including each of m feature points of an image of the learning object, the method comprising:
    A second local feature quantity generation step of extracting n feature points from an image in the captured video and generating n second local feature quantities, each consisting of feature vectors of 1 to j dimensions, for n local regions including each of the n feature points;
    A recognition step of selecting the smaller number of dimensions from among the number of dimensions i of the feature vectors of the first local feature quantities and the number of dimensions j of the feature vectors of the second local feature quantities, and recognizing that the learning object exists in the image in the video when it is determined that a predetermined ratio or more of the m first local feature quantities, each consisting of feature vectors up to the selected number of dimensions, correspond to the n second local feature quantities, each consisting of feature vectors up to the selected number of dimensions;
    An information processing method comprising:
  18.  撮像手段が撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を生成する第2局所特徴量生成手段と、
     前記m個の第2局所特徴量を、局所特徴量の照合に基づいて撮像した前記画像に含まれる学習対象物を認識する情報処理装置に送信する第1送信手段と、
     前記情報処理装置から、撮像した前記画像に含まれる学習対象物を示す情報を受信する第1受信手段と、
     を備えたことを特徴とする通信端末。
    Second local feature quantity generation means for extracting n feature points from an image in the video captured by the imaging means, and for generating n second local feature quantities, each consisting of feature vectors of 1 to j dimensions, for n local regions including each of the n feature points;
    First transmitting means for transmitting the n second local feature quantities to an information processing apparatus that recognizes a learning object included in the captured image based on matching of local feature quantities;
    First receiving means for receiving information indicating a learning object included in the captured image from the information processing apparatus;
    A communication terminal comprising:
  19.  撮像手段が撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を生成する第2局所特徴量生成ステップと、
     前記m個の第2局所特徴量を、局所特徴量の照合に基づいて撮像した前記画像に含まれる学習対象物を認識する情報処理装置に送信する第1送信ステップと、
     前記情報処理装置から、撮像した前記画像に含まれる学習対象物を示す情報を受信する第1受信ステップと、
     を含むことを特徴とする通信端末の制御方法。
    A second local feature quantity generation step of extracting n feature points from an image in the video captured by the imaging means, and generating n second local feature quantities, each consisting of feature vectors of 1 to j dimensions, for n local regions including each of the n feature points;
    A first transmission step of transmitting the n second local feature quantities to an information processing apparatus that recognizes a learning object included in the captured image based on matching of local feature quantities;
    A first receiving step of receiving information indicating a learning object included in the captured image from the information processing apparatus;
    A control method for a communication terminal, comprising:
  20.  撮像手段が撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を生成する第2局所特徴量生成ステップと、
     前記m個の第2局所特徴量を、局所特徴量の照合に基づいて撮像した前記画像に含まれる学習対象物を認識する情報処理装置に送信する第1送信ステップと、
     前記情報処理装置から、撮像した前記画像に含まれる学習対象物を示す情報を受信する第1受信ステップと、
     をコンピュータに実行させることを特徴とする通信端末の制御プログラム。
    A second local feature quantity generation step of extracting n feature points from an image in the video captured by the imaging means, and generating n second local feature quantities, each consisting of feature vectors of 1 to j dimensions, for n local regions including each of the n feature points;
    A first transmission step of transmitting the n second local feature quantities to an information processing apparatus that recognizes a learning object included in the captured image based on matching of local feature quantities;
    A first receiving step of receiving information indicating a learning object included in the captured image from the information processing apparatus;
    A control program for a communication terminal, which causes a computer to execute the above steps.
  21.  学習対象物と、前記学習対象物の画像のm個の特徴点のそれぞれを含むm個の局所領域のそれぞれについて生成された、それぞれ1次元からi次元までの特徴ベクトルからなるm個の第1局所特徴量とを、対応付けて記憶する第1局所特徴量記憶手段と、
     通信端末が撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を、前記通信端末から受信する第2受信手段と、
     前記第1局所特徴量の特徴ベクトルの次元数iおよび前記第2局所特徴量の特徴ベクトルの次元数jのうち、より少ない次元数を選択し、選択された前記次元数までの特徴ベクトルからなる前記n個の第2局所特徴量に、選択された前記次元数までの特徴ベクトルからなる前記m個の第1局所特徴量の所定割合以上が対応すると判定した場合に、前記映像中の前記画像に前記学習対象物が存在すると認識する認識手段と、
     認識した前記学習対象物を示す情報を前記通信端末に送信する第2送信手段と、
     を備えることを特徴とする情報処理装置。
    First local feature quantity storage means for storing, in association with each other, a learning object and m first local feature quantities, each consisting of feature vectors of 1 to i dimensions, generated for m local regions including each of m feature points of an image of the learning object;
    Second receiving means for receiving, from the communication terminal, n second local feature quantities, each consisting of feature vectors of 1 to j dimensions, generated for n local regions including each of n feature points extracted from an image in the video captured by the communication terminal;
    Recognition means for selecting the smaller number of dimensions from among the number of dimensions i of the feature vectors of the first local feature quantities and the number of dimensions j of the feature vectors of the second local feature quantities, and for recognizing that the learning object exists in the image in the video when it is determined that a predetermined ratio or more of the m first local feature quantities, each consisting of feature vectors up to the selected number of dimensions, correspond to the n second local feature quantities, each consisting of feature vectors up to the selected number of dimensions;
    Second transmission means for transmitting information indicating the recognized learning object to the communication terminal;
    An information processing apparatus comprising:
  22.  学習対象物と、前記学習対象物の画像のm個の特徴点のそれぞれを含むm個の局所領域のそれぞれについて生成された、それぞれ1次元からi次元までの特徴ベクトルからなるm個の第1局所特徴量とを、対応付けて記憶する第1局所特徴量記憶手段を備えた情報処理装置の制御方法であって、
     通信端末が撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を、前記通信端末から受信する第2受信ステップと、
     前記第1局所特徴量の特徴ベクトルの次元数iおよび前記第2局所特徴量の特徴ベクトルの次元数jのうち、より少ない次元数を選択し、選択された前記次元数までの特徴ベクトルからなる前記n個の第2局所特徴量に、選択された前記次元数までの特徴ベクトルからなる前記m個の第1局所特徴量の所定割合以上が対応すると判定した場合に、前記映像中の前記画像に前記学習対象物が存在すると認識する認識ステップと、
     認識した前記学習対象物を示す情報を前記通信端末に送信する第2送信ステップと、
     を含むことを特徴とする情報処理装置の制御方法。
    A control method for an information processing apparatus comprising first local feature quantity storage means for storing, in association with each other, a learning object and m first local feature quantities, each consisting of feature vectors of 1 to i dimensions, generated for m local regions including each of m feature points of an image of the learning object, the control method comprising:
    A second receiving step of receiving, from the communication terminal, n second local feature quantities, each consisting of feature vectors of 1 to j dimensions, generated for n local regions including each of n feature points extracted from an image in the video captured by the communication terminal;
    A recognition step of selecting the smaller number of dimensions from among the number of dimensions i of the feature vectors of the first local feature quantities and the number of dimensions j of the feature vectors of the second local feature quantities, and recognizing that the learning object exists in the image in the video when it is determined that a predetermined ratio or more of the m first local feature quantities, each consisting of feature vectors up to the selected number of dimensions, correspond to the n second local feature quantities, each consisting of feature vectors up to the selected number of dimensions;
    A second transmission step of transmitting information indicating the recognized learning object to the communication terminal;
    A method for controlling an information processing apparatus, comprising:
  23.  学習対象物と、前記学習対象物の画像のm個の特徴点のそれぞれを含むm個の局所領域のそれぞれについて生成された、それぞれ1次元からi次元までの特徴ベクトルからなるm個の第1局所特徴量とを、対応付けて記憶する第1局所特徴量記憶手段を備えた情報処理装置の制御プログラムであって、
     通信端末が撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を、前記通信端末から受信する第2受信ステップと、
     前記第1局所特徴量の特徴ベクトルの次元数iおよび前記第2局所特徴量の特徴ベクトルの次元数jのうち、より少ない次元数を選択し、選択された前記次元数までの特徴ベクトルからなる前記n個の第2局所特徴量に、選択された前記次元数までの特徴ベクトルからなる前記m個の第1局所特徴量の所定割合以上が対応すると判定した場合に、前記映像中の前記画像に前記学習対象物が存在すると認識する認識ステップと、
     認識した前記学習対象物を示す情報を前記通信端末に送信する第2送信ステップと、
     をコンピュータに実行させることを特徴とする情報処理装置の制御プログラム。
    A control program for an information processing apparatus comprising first local feature quantity storage means for storing, in association with each other, a learning object and m first local feature quantities, each consisting of feature vectors of 1 to i dimensions, generated for m local regions including each of m feature points of an image of the learning object, the control program comprising:
    A second receiving step of receiving, from the communication terminal, n second local feature quantities, each consisting of feature vectors of 1 to j dimensions, generated for n local regions including each of n feature points extracted from an image in the video captured by the communication terminal;
    A recognition step of selecting the smaller number of dimensions from among the number of dimensions i of the feature vectors of the first local feature quantities and the number of dimensions j of the feature vectors of the second local feature quantities, and recognizing that the learning object exists in the image in the video when it is determined that a predetermined ratio or more of the m first local feature quantities, each consisting of feature vectors up to the selected number of dimensions, correspond to the n second local feature quantities, each consisting of feature vectors up to the selected number of dimensions;
    A second transmission step of transmitting information indicating the recognized learning object to the communication terminal;
    A control program for an information processing apparatus, which causes a computer to execute the above steps.
PCT/JP2013/051954 2012-01-30 2013-01-30 Information processing system, information processing method, information processing device, and control method and control program therefor, and communication terminal, and control method and control program therefor WO2013115203A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2012-017385 2012-01-30
JP2012017385 2012-01-30

Publications (1)

Publication Number Publication Date
WO2013115203A1 true WO2013115203A1 (en) 2013-08-08

Family

ID=48905237

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2013/051954 WO2013115203A1 (en) 2012-01-30 2013-01-30 Information processing system, information processing method, information processing device, and control method and control program therefor, and communication terminal, and control method and control program therefor

Country Status (2)

Country Link
JP (1) JPWO2013115203A1 (en)
WO (1) WO2013115203A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015159775A1 (en) * 2014-04-15 2015-10-22 オリンパス株式会社 Image processing apparatus, communication system, communication method, and image-capturing device
JP2016541052A (en) * 2013-11-14 2016-12-28 シクパ ホルディング ソシエテ アノニムSicpa Holding Sa Image analysis to certify products
US11176402B2 (en) * 2017-05-17 2021-11-16 Samsung Electronics Co., Ltd Method and device for identifying object
WO2022085128A1 (en) * 2020-10-21 2022-04-28 日本電信電話株式会社 Name presentation device, name presentation method, and program

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20150021283A (en) * 2013-08-20 2015-03-02 한국전자통신연구원 System and method for learning foreign language using smart glasses
CN111951616B (en) * 2020-08-18 2022-05-06 怀化学院 Picture-aided object-identifying device for preschool education

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001101191A (en) * 1999-09-27 2001-04-13 Cadix Inc Image identifying device and database system used for image identification
JP2006053622A (en) * 2004-08-10 2006-02-23 Hitachi Omron Terminal Solutions Corp Document link information acquisition system
JP2011008507A (en) * 2009-06-25 2011-01-13 Kddi Corp Image retrieval method and system
JP2011198130A (en) * 2010-03-19 2011-10-06 Fujitsu Ltd Image processing apparatus, and image processing program

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HIRONOBU FUJIYOSHI: "Gradient-Based Feature Extraction -SIFT and HOG", IEICE TECHNICAL REPORT, vol. 107, no. 206, 27 August 2007 (2007-08-27), pages 211 - 224 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016541052A (en) * 2013-11-14 2016-12-28 シクパ ホルディング ソシエテ アノニムSicpa Holding Sa Image analysis to certify products
WO2015159775A1 (en) * 2014-04-15 2015-10-22 オリンパス株式会社 Image processing apparatus, communication system, communication method, and image-capturing device
CN106233283A (en) * 2014-04-15 2016-12-14 奥林巴斯株式会社 Image processing apparatus, communication system and communication means and camera head
US10133932B2 (en) 2014-04-15 2018-11-20 Olympus Corporation Image processing apparatus, communication system, communication method and imaging device
US11176402B2 (en) * 2017-05-17 2021-11-16 Samsung Electronics Co., Ltd Method and device for identifying object
WO2022085128A1 (en) * 2020-10-21 2022-04-28 日本電信電話株式会社 Name presentation device, name presentation method, and program

Also Published As

Publication number Publication date
JPWO2013115203A1 (en) 2015-05-11

Similar Documents

Publication Publication Date Title
US8176054B2 (en) Retrieving electronic documents by converting them to synthetic text
WO2013115203A1 (en) Information processing system, information processing method, information processing device, and control method and control program therefor, and communication terminal, and control method and control program therefor
Cliche et al. Scatteract: Automated extraction of data from scatter plots
US8086038B2 (en) Invisible junction features for patch recognition
Chiang et al. Using historical maps in scientific studies: Applications, challenges, and best practices
US20110153633A1 (en) Searching for handwritten annotations appearing a given distance from document content
TW201712600A (en) Methods and systems for detecting and recognizing text from images
CN101297318A (en) Data organization and access for mixed media document system
EP2015226A1 (en) Information retrieval using invisible junctions and geometric constraints
CN102402593A (en) Multi-modal approach to search query input
KR20130029430A (en) Character recognition device, character recognition method, character recognition system, and character recognition program
US20130057583A1 (en) Providing information services related to multimodal inputs
CN112801099B (en) Image processing method, device, terminal equipment and medium
Araújo et al. A real-world approach on the problem of chart recognition using classification, detection and perspective correction
Lund et al. How well does multiple OCR error correction generalize?
JP2011248596A (en) Searching system and searching method for picture-containing documents
CN112182275A (en) Trademark approximate retrieval system and method based on multi-dimensional feature fusion
JP5433396B2 (en) Manga image analysis device, program, search device and method for extracting text from manga image
JPWO2013088994A1 (en) Video processing system, video processing method, video processing apparatus for portable terminal or server, and control method and control program therefor
Du et al. From plane to hierarchy: Deformable transformer for remote sensing image captioning
JP5484113B2 (en) Document image related information providing apparatus and document image related information acquisition system
Zhao et al. Saliency-constrained semantic learning for airport target recognition of aerial images
CN114281919A (en) Node adding method, device, equipment and storage medium based on directory tree
JP6131859B2 (en) Information processing system, information processing method, information processing apparatus and control method and control program thereof, communication terminal and control method and control program thereof
Pillai et al. Document layout analysis using detection transformers

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13744083

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2013556425

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13744083

Country of ref document: EP

Kind code of ref document: A1