WO2013115203A1 - Information processing system, information processing method, information processing device, and control method and control program therefor, and communication terminal, and control method and control program therefor - Google Patents


Info

Publication number: WO2013115203A1
Authority: WO (WIPO (PCT))
Application number: PCT/JP2013/051954
Other languages: French (fr), Japanese (ja)
Prior art keywords: feature, local feature, local, learning object, information processing
Inventors: 野村 俊之, 山田 昭雄, 岩元 浩太, 亮太 間瀬
Original Assignee: 日本電気株式会社 (NEC Corporation)
Application filed by 日本電気株式会社
Publication of WO2013115203A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/255 Detecting or recognising potential candidate objects based on visual cues, e.g. shapes

Definitions

  • The present invention relates to a technique for identifying a learning object in a captured video using local feature amounts.
  • Patent Document 1 describes a technique for obtaining the name of an object (plant, insect, etc.) based on a video from a camera-equipped mobile phone and an inquiry mail.
  • Japanese Patent Application Laid-Open No. 2004-228561 describes a technique that improves the recognition speed by clustering feature amounts when a query image is recognized using a model dictionary generated in advance from a model image.
  • An object of the present invention is to provide a technique for solving the above-described problems.
  • According to the present invention, a system comprises: first local feature storage means for storing, in association with a learning object, m first local features, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions that include the m feature points of an image of the learning object; second local feature generation means for extracting n feature points from an image captured by an imaging means and generating, for each of n local regions including the n feature points, n second local features each consisting of a feature vector of 1 to j dimensions; and learning object recognition means for selecting the smaller of the dimension number i of the feature vectors of the first local features and the dimension number j of the feature vectors of the second local features, and for recognizing that the learning object exists in the image in the video when a prescribed proportion or more of the m first local features, limited to the selected number of dimensions, correspond to the n second local features limited to the selected number of dimensions.
  • According to the present invention, an information processing method is performed in an information processing system including first local feature storage means for storing, in association with a learning object, m first local features, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions that include the m feature points of an image of the learning object. The method comprises: a second local feature generation step of extracting n feature points from an image in a captured video and generating, for each of n local regions including the n feature points, n second local features each consisting of a feature vector of 1 to j dimensions; and a recognition step of selecting the smaller of the dimension number i of the feature vectors of the first local features and the dimension number j of the feature vectors of the second local features, and recognizing that the learning object exists in the image in the video when a prescribed proportion or more of the m first local features, limited to the selected number of dimensions, correspond to the n second local features limited to the selected number of dimensions.
  • According to the present invention, a communication terminal comprises: second local feature generation means for extracting n feature points from an image captured by an imaging means and generating, for each of n local regions including the n feature points, n second local features each consisting of a feature vector of 1 to j dimensions; first transmission means for transmitting the n second local features to an information processing apparatus that recognizes, based on a comparison of local features, a learning object contained in the captured image; and first reception means for receiving, from the information processing apparatus, information indicating the learning object contained in the captured image.
  • According to the present invention, a control method for a communication terminal comprises: a second local feature generation step of extracting n feature points from an image captured by an imaging means and generating, for each of n local regions including the n feature points, n second local features each consisting of a feature vector of 1 to j dimensions; and a first transmission step of transmitting the n second local features to an information processing apparatus that recognizes, based on a comparison of local features, a learning object contained in the captured image.
  • According to the present invention, a control program for a communication terminal causes a computer to execute: a second local feature generation step of extracting n feature points from an image captured by an imaging means and generating, for each of n local regions including the n feature points, n second local features each consisting of a feature vector of 1 to j dimensions; and a first transmission step of transmitting the n second local features to an information processing apparatus that recognizes, based on a comparison of local features, a learning object contained in the captured image.
  • According to the present invention, an information processing apparatus comprises: first local feature storage means for storing, in association with a learning object, m first local features, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions that include the m feature points of an image of the learning object; second reception means for receiving, from a communication terminal, n second local features, each consisting of a feature vector of 1 to j dimensions, generated for n local regions including n feature points extracted from an image in a video captured by the communication terminal; and learning object recognition means for selecting the smaller of the dimension number i of the feature vectors of the first local features and the dimension number j of the feature vectors of the second local features, and for recognizing that the learning object exists in the image in the video when a prescribed proportion or more of the m first local features, limited to the selected number of dimensions, correspond to the n second local features limited to the selected number of dimensions.
  • According to the present invention, a control method is provided for an information processing apparatus including first local feature storage means for storing, in association with a learning object, m first local features, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions that include the m feature points of an image of the learning object. The method comprises: a second reception step of receiving, from a communication terminal, n second local features, each consisting of a feature vector of 1 to j dimensions, generated for n local regions including n feature points extracted from an image in a video captured by the communication terminal; and a recognition step of selecting the smaller of the dimension number i of the feature vectors of the first local features and the dimension number j of the feature vectors of the second local features, and recognizing that the learning object exists in the image in the video when a prescribed proportion or more of the m first local features, limited to the selected number of dimensions, correspond to the n second local features limited to the selected number of dimensions.
  • According to the present invention, a control program is provided for an information processing apparatus including a first local feature storage unit that stores, in association with a learning object, m first local features, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions that include the m feature points of an image of the learning object. The program causes a computer to execute: a second reception step of receiving, from a communication terminal, n second local features, each consisting of a feature vector of 1 to j dimensions, generated for n local regions including n feature points extracted from an image in a video captured by the communication terminal; and a recognition step of selecting the smaller of the dimension number i of the feature vectors of the first local features and the dimension number j of the feature vectors of the second local features, and recognizing that the learning object exists in the image in the video when a prescribed proportion or more of the m first local features, limited to the selected number of dimensions, correspond to the n second local features limited to the selected number of dimensions.
  • the learning object in the image in the video can be recognized in real time.
  • the information processing system 100 is a system that recognizes a learning object in real time.
  • The information processing system 100 includes a first local feature storage unit 110, an imaging unit 120, a second local feature generation unit 130, and a learning object recognition unit 140.
  • The first local feature storage unit 110 stores, in association with the learning object 111, the m first local features 112, each consisting of a feature vector of 1 to i dimensions, generated for the m local regions that include the m feature points of the image of the learning object 111.
  • the second local feature quantity generation unit 130 extracts n feature points 131 from the image 101 in the video captured by the imaging unit 120.
  • The second local feature generation unit 130 generates, for the n local regions 132 each including one of the n feature points 131, n second local features 133 each consisting of a feature vector of 1 to j dimensions.
  • The learning object recognition unit 140 selects the smaller of the dimension number i of the feature vectors of the first local features 112 and the dimension number j of the feature vectors of the second local features 133. The learning object recognition unit 140 then recognizes that the learning object 111 exists in the image 101 when a prescribed proportion or more of the m first local features 112, limited to the selected number of dimensions, correspond to the n second local features 133 limited to the selected number of dimensions.
  • the learning target in the image in the video can be recognized in real time.
  • In this embodiment, the learning object in the video is recognized by collating the local features generated from the video captured by the communication terminal with the local features stored in the local feature DB of the learning object recognition server. The recognized learning object is then notified together with its name, related information, and/or link information.
  • the name, related information, and / or link information can be notified in association with the learning object in the image in the video in real time.
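As an editorial illustration (not part of the disclosure), the collation scheme above can be sketched in Python. Only the truncation to the smaller of the two dimension counts follows the text; the function name, distance threshold, and "prescribed proportion" value are assumptions.

```python
def recognize(first_feats, second_feats, i, j, dist_thresh=0.25, ratio_thresh=0.5):
    """Collate stored (first) and query (second) local features after
    truncating both sides to the smaller of the dimension counts i and j.
    The distance and ratio thresholds are illustrative values."""
    d = min(i, j)                              # shared leading dimensions
    stored = [v[:d] for v in first_feats]
    query = [v[:d] for v in second_feats]
    matched = 0
    for sv in stored:
        # squared Euclidean distance to the nearest query feature
        best = min(sum((a - b) ** 2 for a, b in zip(sv, qv)) for qv in query)
        if best < dist_thresh:
            matched += 1
    # the learning object is deemed present when a prescribed proportion
    # of the stored features find a corresponding feature in the image
    return matched / len(stored) >= ratio_thresh
```

Because both feature vectors are hierarchical, truncating to the shared leading dimensions lets features generated with different dimension counts be compared directly.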
  • FIG. 2 is a block diagram illustrating a configuration of the information processing system 200 according to the present embodiment.
  • The information processing system 200 in FIG. 2 includes a communication terminal 210 having an imaging function, a learning object recognition server 220 that recognizes a learning object from an image captured by the communication terminal 210, and a related information providing server 230 that provides related information to the communication terminal 210, all connected via a network 240.
  • The communication terminal 210 displays the captured video on its display unit, together with the name of each learning object recognized by the learning object recognition server 220 on the basis of the local features generated from the video.
  • The learning object recognition server 220 includes a local feature DB 221 that stores each learning object in association with its local features, a related information DB 222 that stores related information corresponding to each learning object, and a link information DB 223 that stores link information corresponding to each learning object.
  • The learning object recognition server 220 collates the local features of the video received from the communication terminal 210 with the local features in the local feature DB 221, and returns the name of the learning object recognized thereby.
  • Related information, such as an introduction corresponding to the recognized learning object, is retrieved from the related information DB 222 and returned to the communication terminal 210.
  • Link information to the related information providing server 230 corresponding to the recognized learning object is retrieved from the link information DB 223 and returned to the communication terminal 210.
  • the name of the learning object, the related information corresponding to the learning object, and the link information for the learning object may be provided separately or may be provided at the same time.
  • The related information providing server 230 has a related information DB 231 that stores related information corresponding to each learning object. It is accessed based on the link information provided for the learning object recognized by the learning object recognition server 220; the related information corresponding to the recognized learning object is then retrieved from the related information DB 231 and returned to the communication terminal 210 that transmitted the local features of the video containing the learning object. Although FIG. 2 shows a single related information providing server 230, as many such servers as there are link destinations may be connected. In that case, either the learning object recognition server 220 selects an appropriate link destination, or a plurality of link destinations are displayed on the communication terminal 210 for the user to choose from.
  • FIG. 2 shows an example in which the name is superimposed on the learning object in the captured video.
  • the display of the related information corresponding to the learning object and the link information for the learning object will be described with reference to FIG.
  • FIG. 3 is a diagram illustrating a display screen example of the communication terminal 210 in the information processing system 200 according to the present embodiment.
  • the display screen 310 in FIG. 3 includes a captured image 311 and operation buttons 312.
  • the learning target is recognized by collating the local feature generated from the video in the upper left diagram with the local feature DB 221 of the learning target recognition server 220.
  • a video 321 is displayed in which the captured video and the learning object name and related information 322 to 325 are superimposed.
  • the related information may be output by voice through the speaker 340.
  • the lower part of FIG. 3 is an example of a display screen that displays link information corresponding to the learning object.
  • the learning target is recognized by collating the local feature generated from the video in the lower left figure with the local feature DB 221 of the learning target recognition server 220.
  • a video 331 is displayed in which the captured video is superimposed with the learning object name and link information 332 to 335.
  • When link information is selected, the linked related information providing server 230 is accessed, and the related information retrieved from the related information DB 231 is displayed on the communication terminal 210 or output as audio.
  • FIG. 4 is a sequence diagram showing an operation procedure of related information notification in the information processing system 200 according to the present embodiment.
  • In step S400, an application and/or data is downloaded from the learning object recognition server 220 to the communication terminal 210.
  • In step S401, the application is activated and initialized to perform the processing of this embodiment.
  • In step S403, the communication terminal captures an image with its imaging unit and acquires a video.
  • In step S405, local features are generated from the video.
  • In step S407, the local features are encoded together with the feature point coordinates.
  • the encoded local feature amount is transmitted from the communication terminal to the learning object recognition server 220 in step S409.
  • In step S411, the learning object recognition server 220 refers to the local feature DB 221, generated and stored in advance for images of learning objects, and recognizes the learning object in the video.
  • In step S413, related information is acquired by referring to the related information DB 222 for the recognized learning object.
  • In step S415, the learning object name and related information are transmitted from the learning object recognition server 220 to the communication terminal 210.
  • In step S417, the communication terminal 210 notifies the user of the received learning object name and related information (see the upper part of FIG. 3). The learning object name is preferably displayed, while the related information may be displayed or output by voice.
  • FIG. 5 is a sequence diagram showing an operation procedure of link information notification in the information processing system 200 according to the present embodiment.
  • In FIG. 5, steps similar to those in FIG. 4 are given the same step numbers, and duplicated description is omitted.
  • In steps S400 and S401, although the application and data may differ, downloading, activation, and initialization are performed as in FIG. 4.
  • The learning object recognition server 220, having recognized the learning object in the scene from the local features of the video received from the communication terminal 210 in step S411, refers to the link information DB 223 in step S513 and acquires the link information corresponding to the recognized learning object. In step S515, the learning object name and link information are transmitted from the learning object recognition server 220 to the communication terminal 210.
  • In step S517, the communication terminal 210 displays the received learning object name and link information superimposed on the video (see the lower part of FIG. 3).
  • In step S519, the terminal waits for the user to select link information. When the user designates a link destination, in step S521 the linked related information providing server 230 is accessed with the learning object ID.
  • In step S523, the related information providing server 230 acquires related information (including document data and audio data) from the related information DB 231 using the received learning object ID.
  • In step S525, the related information is returned to the accessing communication terminal 210.
  • In step S527, the communication terminal 210 that has received the related information displays it or outputs it by voice.
  • FIG. 6 is a block diagram illustrating a functional configuration of the communication terminal 210 according to the present embodiment.
  • the imaging unit 601 inputs a video as a query image.
  • the local feature value generation unit 602 generates a local feature value from the video from the imaging unit 601.
  • the local feature amount transmission unit 603 encodes the generated local feature amount together with the feature point coordinates by the encoding unit 603a and transmits the encoded local feature amount to the learning object recognition server 220 via the communication control unit 604.
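The text does not specify a wire format for the encoded local features, so the following is only a plausible sketch of what the encoding unit 603a might produce: a feature count, then per feature the point coordinates and a byte-quantized descriptor.

```python
import struct

def encode_features(points):
    """Hypothetical wire format for (x, y, descriptor) tuples, in the spirit
    of the encoding unit 603a: a 16-bit feature count, then per feature the
    coordinates as 16-bit integers, a descriptor length byte, and each
    descriptor element quantised to one byte in [0, 1]."""
    out = bytearray(struct.pack(">H", len(points)))
    for x, y, desc in points:
        out += struct.pack(">HHB", x, y, len(desc))
        out += bytes(min(255, max(0, round(v * 255))) for v in desc)
    return bytes(out)

def decode_features(data):
    """Inverse of encode_features (as the server's decoding unit 702a might do)."""
    n = struct.unpack_from(">H", data)[0]
    off, points = 2, []
    for _ in range(n):
        x, y, d = struct.unpack_from(">HHB", data, off)
        off += 5
        desc = [b / 255 for b in data[off:off + d]]
        off += d
        points.append((x, y, desc))
    return points
```

Quantizing each element to one byte keeps the transmitted payload small, which matches the document's concern with reducing communication traffic.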
  • the learning object recognition result receiving unit 605 receives the learning object recognition result from the learning object recognition server 220 via the communication control unit 604.
  • the display screen generation unit 606 generates a display screen of the received learning object recognition result and notifies the user.
  • The related information receiving unit 607 receives related information via the communication control unit 604. The display screen generation unit 606 and the sound generation unit 608 then generate a display screen and sound data for the received related information and notify the user. The related information received by the related information receiving unit 607 may come from the learning object recognition server 220 or from the related information providing server 230.
  • The link information receiving unit 609 receives link information from the learning object recognition server 220 via the communication control unit 604. The display screen generation unit 606 then generates a display screen of the received link information and notifies the user.
  • the link destination access unit 610 accesses the link destination related information providing server 230 based on the click of link information by an operation unit (not shown).
  • Instead of providing the learning object recognition result receiving unit 605, the related information receiving unit 607, and the link information receiving unit 609 separately, a single information receiving unit that receives all information arriving via the communication control unit 604 may be provided.
  • FIG. 7 is a block diagram showing a functional configuration of the learning object recognition server 220 according to the present embodiment.
  • the local feature receiving unit 702 decodes the local feature received from the communication terminal 210 via the communication control unit 701 by the decoding unit 702a.
  • the learning object recognition unit 703 recognizes the learning object by comparing the received local feature quantity with the local feature quantity of the local feature quantity DB 221 that stores the local feature quantity corresponding to the learning object.
  • the learning object recognition result transmission unit 704 transmits the learning object recognition result (learning object name) to the communication terminal 210.
  • the related information acquisition unit 705 refers to the related information DB 222 and acquires related information corresponding to the recognized learning object.
  • The related information transmission unit 706 transmits the acquired related information to the communication terminal 210.
  • When the learning object recognition server 220 transmits related information, it is desirable, as shown in FIG. 4, to transmit the learning object recognition result and the related information as a single piece of transmission data, since this reduces communication traffic.
  • the link information acquisition unit 707 refers to the link information DB 223 and acquires link information corresponding to the recognized learning object.
  • the link information transmission unit 708 transmits the acquired link information to the communication terminal 210.
  • Similarly, when the learning object recognition server 220 transmits link information, it is desirable to transmit the learning object recognition result and the link information as a single piece of transmission data, since this reduces communication traffic.
  • When the learning object recognition server 220 transmits the recognition result, the related information, and the link information, it is likewise desirable to acquire all the information first and then transmit it as a single piece of transmission data, in order to reduce communication traffic.
  • Since the related information providing server 230 may be any of various linkable providers, a description of its configuration is omitted.
  • FIG. 8 is a diagram illustrating a configuration of the local feature DB 221 according to the present embodiment. Note that the present invention is not limited to such a configuration.
  • the local feature DB 221 stores a first local feature 803, a second local feature 804, ..., an mth local feature 805 in association with the learning object ID 801 and the name 802.
  • Each local feature entry stores a feature vector consisting of 1-dimensional to 150-dimensional elements, hierarchized in units of 25 dimensions corresponding to the 5×5 sub-regions (see FIG. 11F).
  • m is a positive integer and may be a different number corresponding to the learning object ID.
  • the feature point coordinates used for the matching process are stored together with the respective local feature amounts.
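A minimal in-memory analogue of the local feature DB 221 of FIG. 8 might look as follows. The field layout follows the description (learning object ID, name, per-feature vectors hierarchized in 25-dimension units, feature point coordinates stored alongside), while the container types and example values are assumptions.

```python
# learning-object ID -> (name, list of (feature point coords, 150-dim vector));
# a plain dict stands in for the DB table of FIG. 8.
local_feature_db = {
    "ID001": ("example object",
              [((12, 34), [0.1] * 150),
               ((56, 78), [0.2] * 150)]),
}

def features_at_level(object_id, level):
    """Return the stored vectors truncated to the first `level` blocks of
    25 dimensions (level 1 -> 25 dims, ..., level 6 -> all 150 dims),
    reflecting the 25-dimension hierarchy of the stored feature vectors."""
    name, feats = local_feature_db[object_id]
    return name, [(pt, vec[:25 * level]) for pt, vec in feats]
```

Storing the vectors hierarchically means a lower-dimensional comparison is just a prefix slice, with no recomputation.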
  • FIG. 9 is a diagram showing a configuration of the related information DB 222 according to the present embodiment. Note that the present invention is not limited to such a configuration.
  • the related information DB 222 stores related display data 903 and related audio data 904 that are related information in association with the learning object ID 901 and the learning object name 902.
  • the related information DB 222 may be provided integrally with the local feature DB 221.
  • FIG. 10 is a diagram showing a configuration of the link information DB 223 according to the present embodiment. Note that the present invention is not limited to such a configuration.
  • The link information DB 223 stores link information, for example a URL (Uniform Resource Locator) 1003 and display data 1004 for the display screen, in association with the learning object ID 1001 and the learning object name 1002.
  • the link information DB 223 may be provided integrally with the local feature amount DB 221 and the related information DB 222.
  • the related information DB 231 of the related information providing server 230 is the same as the related information DB 222 of the learning object recognition server 220, and a description thereof is omitted to avoid duplication.
  • FIG. 11A is a block diagram illustrating the configuration of the local feature generation unit 602 according to the present embodiment.
  • The local feature generation unit 602 includes a feature point detection unit 1111, a local region acquisition unit 1112, a sub-region division unit 1113, a sub-region feature vector generation unit 1114, and a dimension selection unit 1115.
  • the feature point detection unit 1111 detects a large number of characteristic points (feature points) from the image data, and outputs the coordinate position, scale (size), and angle of each feature point.
  • the local region acquisition unit 1112 acquires a local region where feature amount extraction is performed from the coordinate value, scale, and angle of each detected feature point.
  • the sub area dividing unit 1113 divides the local area into sub areas.
  • the sub-region dividing unit 1113 can divide the local region into 16 blocks (4 ⁇ 4 blocks) or divide the local region into 25 blocks (5 ⁇ 5 blocks).
  • the number of divisions is not limited. In the present embodiment, the case where the local area is divided into 25 blocks (5 ⁇ 5 blocks) will be described below as a representative.
  • the sub-region feature vector generation unit 1114 generates a feature vector for each sub-region of the local region.
  • a gradient direction histogram can be used as the feature vector of the sub-region.
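A gradient direction histogram for one sub-region, with six directions at 60-degree intervals as in FIG. 11B, can be computed roughly as follows. This is a sketch: simple frequency counting, with the central-difference gradient and skipped border pixels as implementation assumptions.

```python
import math

def gradient_histogram(gray, bins=6):
    """Gradient-direction histogram for one sub-region: `bins` directions
    (6 directions = 60-degree intervals), counting simple frequencies.
    `gray` is a 2-D list of intensities; border pixels are skipped."""
    hist = [0] * bins
    h, w = len(gray), len(gray[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = gray[y][x + 1] - gray[y][x - 1]   # horizontal gradient
            dy = gray[y + 1][x] - gray[y - 1][x]   # vertical gradient
            angle = math.atan2(dy, dx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += 1
    return hist
```

As the text notes below, gradient magnitudes may be accumulated instead of simple frequencies, and the quantization number is not limited to six.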
  • The dimension selection unit 1115 selects the dimensions to be output as the local feature (for example, by thinning out) so that the correlation between the feature vectors of adjacent sub-regions becomes low, based on the positional relationship of the sub-regions.
  • the dimension selection unit 1115 can not only select a dimension but also determine a selection priority. That is, the dimension selection unit 1115 can select dimensions with priorities so that, for example, dimensions in the same gradient direction are not selected between adjacent sub-regions. Then, the dimension selection unit 1115 outputs a feature vector composed of the selected dimensions as a local feature amount.
  • The dimension selection unit 1115 can also output the local feature with its dimensions rearranged according to the priority.
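One plausible reading of this checkerboard-style thinning (adjacent blocks keep complementary direction sets, so the same gradient direction is not selected in neighbouring sub-regions) is sketched below. The exact selection pattern of the embodiment is the one shown in FIG. 11C; this code only approximates it.

```python
def select_dimensions(feature, blocks=25, dirs=6, keep=75):
    """Thin a 150-dimensional vector (25 sub-region blocks x 6 gradient
    directions) down to `keep` dimensions so that horizontally and
    vertically adjacent blocks keep disjoint direction sets.  The
    checkerboard offset used here is one plausible pattern, not the
    exact selection of FIG. 11C."""
    grid = 5                                    # 5x5 sub-region blocks
    per_block = keep // blocks                  # e.g. 3 of the 6 directions
    selected = []
    for b in range(blocks):
        row, col = divmod(b, grid)
        offset = (row + col) % 2 * (dirs // 2)  # alternate {0,1,2} / {3,4,5}
        for k in range(per_block):
            selected.append(feature[b * dirs + (offset + k) % dirs])
    return selected
```

Because adjacent blocks have opposite checkerboard parity, their kept direction sets are disjoint, which is the low-correlation property the selection aims for.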
  • FIGS. 11B to 11F are diagrams showing the processing of the local feature generation unit 602 according to the present embodiment.
  • FIG. 11B is a diagram showing a series of processing of feature point detection / local region acquisition / sub-region division / feature vector generation in the local feature quantity generation unit 602.
  • Such a series of processes is described in U.S. Pat. No. 6,711,293 and in David G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", International Journal of Computer Vision, 60(2), 2004, pp. 91-110.
  • An image 1121 in FIG. 11B illustrates the state in which feature points have been detected from an image in the video by the feature point detection unit 1111 of FIG. 11A. The starting point of each arrow in the feature point data 1121a indicates the coordinate position of the feature point, the length of the arrow indicates the scale (size), and the direction of the arrow indicates the angle.
  • As for the scale (size) and direction, brightness, saturation, hue, or the like can be selected according to the target image.
  • In FIG. 11B, the case of six directions at intervals of 60 degrees is described, but the present invention is not limited to this.
  • the local region acquisition unit 1112 in FIG. 11A generates a Gaussian window 1122a around the starting point of the feature point data 1121a, and generates a local region 1122 that substantially includes the Gaussian window 1122a.
  • the local region acquisition unit 1112 generates a square local region 1122, but the local region may be circular or have another shape. This local region is acquired for each feature point. If the local area is circular, there is an effect that the robustness is improved with respect to the imaging direction.
  • FIG. 11B also shows the state in which the sub-region division unit 1113 has divided the scale and angle of each pixel included in the local region 1122 of the feature point data 1121a into sub-regions 1123.
  • the gradient direction is not limited to 6 directions, but may be quantized to an arbitrary quantization number such as 4 directions, 8 directions, and 10 directions.
  • the sub-region feature vector generation unit 1114 may add up the magnitudes of the gradients instead of adding up the simple frequencies.
  • When aggregating the gradient histograms, the sub-region feature vector generation unit 1114 may add weight values not only to the sub-region to which a pixel belongs but also to nearby sub-regions (such as adjacent blocks), according to the distance between the sub-regions. It may also add weight values to the gradient directions before and after the quantized gradient direction. The feature vector of a sub-region is not limited to a gradient direction histogram and may be anything that has a plurality of dimensions (elements), such as color information. In the present embodiment, a gradient direction histogram is used as the feature vector of the sub-region.
  • the dimension selection unit 1115 selects (decimates) a dimension (element) to be output as a local feature amount based on the positional relationship between the sub-regions so that the correlation between feature vectors of adjacent sub-regions becomes low. More specifically, the dimension selection unit 1115 selects dimensions such that at least one gradient direction differs between adjacent sub-regions, for example.
  • the dimension selection unit 1115 mainly uses adjacent sub-regions as nearby sub-regions. However, nearby sub-regions are not limited to adjacent ones; a sub-region within a predetermined distance may also be treated as a nearby sub-region.
  • FIG. 11C shows an example in which dimensions are selected from a feature vector 1131 of a 150-dimensional gradient histogram, generated by dividing a local region into 5 × 5 block sub-regions and quantizing gradient directions into six directions 1131a.
  • FIG. 11C is a diagram showing a state of feature vector dimension number selection processing in the local feature value generation unit 602.
  • the dimension selection unit 1115 selects a feature vector 1132 of a half 75-dimensional gradient histogram from a feature vector 1131 of a 150-dimensional gradient histogram.
  • dimensions can be selected so that dimensions in the same gradient direction are not selected in adjacent left and right and upper and lower sub-region blocks.
  • the dimension selection unit 1115 selects the feature vector 1133 of the 50-dimensional gradient histogram from the feature vector 1132 of the 75-dimensional gradient histogram.
  • the dimensions can be selected so that only one gradient direction is the same (the remaining direction is different) between sub-region blocks positioned at an angle of 45 degrees.
  • when the dimension selection unit 1115 selects the feature vector 1134 of the 25-dimensional gradient histogram from the feature vector 1133 of the 50-dimensional gradient histogram, the dimensions can be selected so that the gradient directions selected between sub-region blocks located at an angle of 45 degrees do not match.
  • the dimension selection unit 1115 selects one gradient direction from each sub-region for the 1st to 25th dimensions, two gradient directions for the 26th to 50th dimensions, and three gradient directions for the 51st to 75th dimensions.
  • it is desirable that the gradient directions do not overlap between adjacent sub-region blocks and that all gradient directions are selected uniformly.
  • it is also desirable that dimensions be selected uniformly from the entire local region. Note that the dimension selection method illustrated in FIG. 11C is an example, and selection is not limited to this method.
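One concrete way to satisfy the FIG. 11C constraint — no gradient direction shared between adjacent sub-region blocks — is a checkerboard rule over the 5 × 5 blocks. The specific even/odd rule below is an illustrative assumption; the embodiment only requires that adjacent blocks not share selected directions.

```python
def select_75_of_150(feature_vector_150):
    """From a 5x5-block, 6-direction (150-dim) feature vector, keep 75 dims
    so that left-right and up-down adjacent blocks never share a selected
    gradient direction: blocks with even (row + col) keep even direction
    indices {0, 2, 4}, odd blocks keep {1, 3, 5} -- a checkerboard."""
    selected = []
    for row in range(5):
        for col in range(5):
            parity = (row + col) % 2
            for d in range(parity, 6, 2):  # 3 of the 6 directions per block
                selected.append(feature_vector_150[(row * 5 + col) * 6 + d])
    return selected
```

Because adjacent blocks have opposite parity, the sets of kept directions are disjoint between any two horizontally or vertically adjacent blocks, matching the selection shown for the 75-dimensional feature vector 1132.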
  • FIG. 11D is a diagram illustrating an example of the selection order of feature vectors from sub-regions in the local feature value generation unit 602.
  • the dimension selection unit 1115 can determine the priority of selection so that dimensions contributing to the features of the feature points are selected in order. That is, the dimension selection unit 1115 can select dimensions with priorities so that, for example, dimensions in the same gradient direction are not selected between adjacent sub-region blocks. The dimension selection unit 1115 then outputs a feature vector composed of the selected dimensions as a local feature amount. In addition, the dimension selection unit 1115 can output the local feature amount with its dimensions rearranged based on the priority.
  • the dimension selection unit 1115 may select dimensions in the order of the sub-region blocks shown in, for example, the matrix 1141 in FIG. 11D, within each of the ranges of the 1st to 25th, the 26th to 50th, and the 51st to 75th dimensions.
  • the dimension selection unit 1115 can select the gradient direction by increasing the priority order of the sub-region blocks close to the center.
  • FIG. 11E is a diagram illustrating an example of the element numbers of the 150-dimensional feature vector in accordance with the selection order of FIG. 11D.
  • the element number of the feature vector is 6 ⁇ p + q.
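If p and q in "6 × p + q" are interpreted as the sub-region block index (0 to 24) and the quantized gradient direction (0 to 5) — an assumed reading, since the excerpt does not define them — the numbering and its inverse can be sketched as:

```python
def element_number(p, q, n_directions=6):
    """Element number of the 150-dim feature vector, assuming p is the
    sub-region block index (0..24) and q the quantized gradient
    direction (0..5), per the formula 6 * p + q."""
    assert 0 <= q < n_directions and 0 <= p
    return n_directions * p + q

def block_and_direction(element, n_directions=6):
    """Inverse mapping: recover (p, q) from an element number."""
    return divmod(element, n_directions)
```

Under this reading, element 76 — the first element in the FIG. 11F selection order — would correspond to block 12 (the center of the 5 × 5 grid) with direction 4.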
  • the matrix 1161 in FIG. 11F shows the 150 dimensions, ordered according to the selection order of FIG. 11E, hierarchized in units of 25 dimensions.
  • that is, the matrix 1161 in FIG. 11F illustrates a configuration example of local feature amounts obtained by selecting the elements shown in FIG. 11E according to the priority order shown in the matrix 1141 in FIG. 11D.
  • the dimension selection unit 1115 can output dimension elements in the order shown in FIG. 11F. Specifically, for example, when outputting a 150-dimensional local feature amount, the dimension selection unit 1115 can output all 150-dimensional elements in the order shown in FIG. 11F.
  • when the dimension selection unit 1115 outputs, for example, a 25-dimensional local feature amount, it can output the elements 1171 in the first row shown in FIG. 11F (the 76th, 45th, 83rd, ..., 120th elements) in the order shown in FIG. 11F (from left to right). When outputting, for example, a 50-dimensional local feature amount, it can additionally output the elements 1172 in the second row shown in FIG. 11F in the same order (from left to right).
  • the local feature amount thus has a hierarchically structured arrangement. That is, for example, between the 25-dimensional local feature amount and the 150-dimensional local feature amount, the arrangement of the elements 1171 of the leading 25 dimensions is the same.
  • by selecting dimensions hierarchically (progressively), the dimension selection unit 1115 can extract and output local feature amounts of a dimensionality suited to the application, the communication capacity, the terminal specifications, and the like.
  • because the dimension selection unit 1115 selects dimensions hierarchically and outputs them sorted by priority order, images can be collated using local feature amounts of different dimensionalities. For example, when images are collated using a 75-dimensional local feature amount and a 50-dimensional local feature amount, the distance between the local feature amounts can be calculated using only the first 50 dimensions.
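Thanks to this prefix property, a distance over only the shared leading dimensions suffices when the two features differ in dimensionality. A minimal sketch (Euclidean distance is an assumed measure; the embodiment does not fix one):

```python
import numpy as np

def match_distance(feat_a, feat_b):
    """Distance between two hierarchically arranged local feature amounts
    of possibly different dimensionality. Because the leading dimensions
    are shared (the 25-dim feature equals the 25-dim prefix of the
    150-dim feature), only the common prefix is compared."""
    n = min(len(feat_a), len(feat_b))
    a = np.asarray(feat_a[:n], dtype=float)
    b = np.asarray(feat_b[:n], dtype=float)
    return float(np.linalg.norm(a - b))
```

For the example in the text, a 75-dimensional and a 50-dimensional feature would be compared over their first 50 dimensions.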
  • the priorities shown in the matrix 1141 in FIG. 11D to FIG. 11F are merely examples, and the order of selecting dimensions is not limited to this.
  • the order of blocks may be the order shown in the matrix 1142 in FIG. 11D or the matrix 1143 in FIG. 11D in addition to the example of the matrix 1141 in FIG. 11D.
  • the priority order may be determined so that dimensions are selected from all the sub-regions.
  • the vicinity of the center of the local region may be important, and the priority order may be determined so that the selection frequency of the sub-region near the center is increased.
  • the information indicating the dimension selection order may be defined in the program, for example, or may be stored in a table or the like (selection order storage unit) referred to when the program is executed.
  • the dimension selection unit 1115 may select a dimension by selecting one sub-region block. That is, 6 dimensions are selected in a certain sub-region, and 0 dimensions are selected in other sub-regions close to the sub-region. Even in such a case, it can be said that the dimension is selected for each sub-region so that the correlation between adjacent sub-regions becomes low.
  • the shape of the local region and sub-region is not limited to a square, and can be any shape.
  • the local region acquisition unit 1112 may acquire a circular local region.
  • the sub-region dividing unit 1113 can divide a circular local region into, for example, nine or seventeen concentric sub-regions.
  • the dimension selection unit 1115 can select a dimension in each sub-region.
  • in this way, the dimensions of the generated feature vector are selected hierarchically while the information content of the local feature amount is maintained.
  • this processing enables real-time learning object recognition and recognition result display while maintaining recognition accuracy.
  • the configuration and processing of the local feature value generation unit 602 are not limited to this example. Naturally, other processes that enable real-time object recognition and recognition result display while maintaining recognition accuracy can be applied.
  • FIG. 11G is a block diagram showing the encoding unit 603a according to the present embodiment. Note that the encoding unit is not limited to this example, and other encoding processes can be applied.
  • the encoding unit 603a has a coordinate value scanning unit 1181 that inputs the coordinates of feature points from the feature point detection unit 1111 of the local feature quantity generation unit 602 and scans the coordinate values.
  • the coordinate value scanning unit 1181 scans the image according to a specific scanning method, and converts the two-dimensional coordinate values (X coordinate value and Y coordinate value) of the feature points into one-dimensional index values.
  • This index value is a scanning distance from the origin according to scanning. There is no restriction on the scanning direction.
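A raster scan is one possible instance of such a scanning method (the embodiment places no restriction on the scanning direction). The sketch below assumes a known image width; under a raster scan, the 1-D index equals the scanning distance from the origin.

```python
def raster_index(x, y, width):
    """Convert a 2-D feature point coordinate to a 1-D index by raster
    scanning: left to right, then top to bottom, from the origin."""
    return y * width + x

def from_raster_index(idx, width):
    """Inverse scan: recover the 2-D coordinate from the 1-D index."""
    y, x = divmod(idx, width)
    return x, y
```

Any other fixed scanning order (e.g. column-major or zigzag) would serve equally, as long as the decoder uses the same one.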
  • the encoding unit 603a also has a sorting unit 1182 that sorts the index values of the feature points and outputs permutation information after sorting.
  • the sorting unit 1182 sorts, for example, in ascending order; it may also sort in descending order.
  • a difference calculation unit 1183 that calculates a difference value between two adjacent index values in the sorted index value and outputs a series of difference values is provided.
  • the encoding unit 603a further has a differential encoding unit 1184 that encodes the series of difference values in series order.
  • the series of difference values may be encoded with a fixed bit length, for example.
  • if the bit length is specified in advance, it must be the number of bits necessary to express the largest possible difference value, so the encoding size does not become small. Therefore, when encoding with a fixed bit length, the differential encoding unit 1184 can determine the bit length based on the input series of difference values.
  • for example, the differential encoding unit 1184 can obtain the maximum value of the input series of difference values, obtain the number of bits necessary to express that maximum value (the number of expression bits), and encode the series of difference values with the obtained number of expression bits.
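The determination of the number of expression bits from the maximum difference value can be sketched as follows; returning the encoded series as a bit string is an illustrative simplification of the fixed-bit-length encoding.

```python
def representation_bits(diffs):
    """Number of bits needed to express the largest value in a series of
    non-negative difference values (at least 1 bit)."""
    return max(max(diffs), 1).bit_length()

def encode_fixed(diffs):
    """Encode each difference value with that fixed bit length.

    Returns the chosen bit length and one concatenated bit string,
    so the decoder can split the string into equal-width fields."""
    n = representation_bits(diffs)
    return n, "".join(format(d, "0{}b".format(n)) for d in diffs)
```

Because the bit length is derived from the actual data, a series of small differences is encoded compactly instead of with a worst-case width fixed in advance.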
  • the encoding unit 603a also has a local feature amount encoding unit 1185 that encodes the local feature amounts of the corresponding feature points in the same permutation as the sorted index values of the feature points.
  • the local feature amount encoding unit 1185 can encode a local feature amount whose dimensions were selected from the 150-dimensional local feature amount of one feature point using, for example, one byte per dimension, that is, with as many bytes as the number of dimensions.
  • FIG. 11H is a diagram illustrating processing of the learning object recognition unit 703 according to the present embodiment.
  • FIG. 11H shows how the local feature amounts generated from the video 311, captured by the communication terminal 210 and shown on the display screen 310, are collated with the local feature amount DB 221.
  • from the video 311, local feature amounts are generated according to the present embodiment. It is then verified whether the local feature amounts 1191 to 1194 stored in the local feature amount DB 221 for each learning object are contained in the local feature amounts generated from the video 311.
  • the learning object recognition unit 703 associates, as indicated by the thin lines, each feature point whose local feature amount stored in the local feature amount DB 221 matches a local feature amount generated from the video.
  • the learning object recognition unit 703 determines that feature points match when a predetermined ratio or more of their local feature amounts match. The learning object recognition unit 703 then recognizes the target learning object if the positional relationship between the sets of associated feature points is a linear relationship. With such recognition, a learning object can be recognized despite differences in size, differences in orientation (differences in viewpoint), or inversion. In addition, since sufficient recognition accuracy is obtained when there are a predetermined number or more of associated feature points, the learning object can be recognized even if part of it is hidden from view.
  • in FIG. 11H, the four different learning objects in the landscape that match the local feature amounts 1191 to 1194 of the four learning objects in the local feature amount DB 221 are recognized with a precision corresponding to the accuracy of the local feature amounts.
  • FIG. 12A is a block diagram illustrating a hardware configuration of the communication terminal 210 according to the present embodiment.
  • a CPU 1210 is a processor for arithmetic control, and implements each functional component of the communication terminal 210 by executing a program.
  • the ROM 1220 stores fixed data such as initial data, and programs.
  • the communication control unit 604 communicates, in the present embodiment, with the learning object recognition server 220 and the related information providing server 230 via a network. Note that the number of CPUs 1210 is not limited to one; there may be a plurality of CPUs, and a GPU (Graphics Processing Unit) for image processing may be included.
  • the RAM 1240 is a random access memory that the CPU 1210 uses as a work area for temporary storage.
  • the RAM 1240 has an area for storing data necessary for realizing the present embodiment.
  • An input video 1241 indicates an input video imaged and input by the imaging unit 601.
  • the feature point data 1242 indicates feature point data including the feature point coordinates, scale, and angle detected from the input video 1241.
  • the local feature value generation table 1243 is a local feature value generation table that holds data until a local feature value is generated (see FIG. 12B).
  • the local feature value 1244 is generated using the local feature value generation table 1243 and indicates a local feature value to be sent to the learning object recognition server 220 via the communication control unit 604.
  • a learning object recognition result 1245 indicates a learning object recognition result returned from the learning object recognition server 220 via the communication control unit 604.
  • the related information / link information 1246 indicates related information and link information returned from the learning object recognition server 220 or related information returned from the related information providing server 230.
  • the display screen data 1247 indicates display screen data for notifying the user of information including the learning object recognition result 1245 and related information / link information 1246. In the case of outputting audio, audio data may be included.
  • Input / output data 1248 indicates input / output data input / output via the input / output interface 1260.
  • Transmission / reception data 1249 indicates transmission / reception data transmitted / received via the communication control unit 604.
  • the storage 1250 stores databases, various parameters, and the following data and programs necessary for realizing the present embodiment.
  • a display format 1251 indicates a display format for displaying information including the learning object recognition result 1245 and related information / link information 1246.
  • the storage 1250 stores the following programs.
  • the communication terminal control program 1252 indicates a communication terminal control program that controls the entire communication terminal 210.
  • the communication terminal control program 1252 includes the following modules.
  • the local feature value generation module 1253 indicates a module that generates a local feature value from the input video according to FIGS. 11B to 11F in the communication terminal control program 1252.
  • the local feature quantity generation module 1253 is composed of the illustrated module group, but detailed description thereof is omitted here.
  • the encoding module 1254 indicates a module that encodes the local feature generated by the local feature generating module 1253 for transmission.
  • the information reception notification module 1255 is a module for receiving the learning object recognition result 1245 and the related information / link information 1246 and notifying the user by display or voice.
  • the link destination access module 1256 is a module that accesses a link destination based on a user instruction to the link information received and notified.
  • the input / output interface 1260 interfaces input / output data with input / output devices.
  • the input / output interface 1260 is connected to a display unit 1261, a touch panel or keyboard as the operation unit 1262, a speaker 1263, a microphone 1264, and an imaging unit 601.
  • the input / output device is not limited to the above example.
  • a GPS (Global Positioning System) position generation unit 1265 is mounted, and acquires the current position based on a signal from a GPS satellite.
  • in FIG. 12A, only data and programs essential to the present embodiment are shown; data and programs not related to the present embodiment are omitted.
  • FIG. 12B is a diagram showing a local feature generation table 1243 in the communication terminal 210 according to the present embodiment.
  • a plurality of detected feature points 1202, feature point coordinates 1203, and local region information 1204 corresponding to the feature points are stored in association with the input image ID 1201.
  • a local feature quantity 1209 is generated for each detected feature point 1202 from the above data.
  • the data obtained by combining these with the feature point coordinates constitutes the local feature amount 1244, generated from the captured landscape, that is transmitted to the learning object recognition server 220.
  • FIG. 13 is a flowchart illustrating a processing procedure of the communication terminal 210 according to the present embodiment. This flowchart is executed by the CPU 1210 of FIG. 12A using the RAM 1240, and implements each functional component of FIG.
  • in step S1311, it is determined whether or not there is a video input for recognizing a learning object.
  • in step S1321, data reception is determined.
  • in step S1331, it is determined whether there is a link destination instruction by the user. Otherwise, other processing is performed in step S1341. Note that description of normal transmission processing is omitted.
  • if there is a video input, the process proceeds to step S1313, and local feature generation processing is executed based on the input video (see FIG. 14A).
  • in step S1315, the local feature amounts and feature point coordinates are encoded (see FIGS. 14B and 14C).
  • in step S1317, the encoded data is transmitted to the learning object recognition server 220.
  • in step S1323, it is determined whether the learning object recognition result and related information have been received from the learning object recognition server 220, or related information has been received from the related information providing server 230. If it is reception from the learning object recognition server 220, the process proceeds to step S1325 to notify the user of the received learning object recognition result and related information.
  • FIG. 14A is a flowchart illustrating a processing procedure of local feature generation processing S1313 according to the present embodiment.
  • in step S1411, the position coordinates, scale, and angle of the feature points are detected from the input video.
  • in step S1413, a local region is acquired for one of the feature points detected in step S1411.
  • in step S1415, the local region is divided into sub-regions.
  • in step S1417, a feature vector for each sub-region is generated, producing the feature vector of the local region. The processing of steps S1411 to S1417 is illustrated in FIG. 11B.
  • in step S1419, dimension selection is performed on the feature vector of the local region generated in step S1417.
  • the dimension selection is illustrated in FIGS. 11D to 11F.
  • in step S1421, it is determined whether the generation of local feature amounts and dimension selection have been completed for all feature points detected in step S1411. If not, the process returns to step S1413 to repeat the processing for the next feature point.
  • FIG. 14B is a flowchart illustrating a processing procedure of the encoding processing S1315 according to the present embodiment.
  • in step S1431, the coordinate values of the feature points are scanned in a specific order.
  • in step S1433, the scanned coordinate values are sorted.
  • in step S1435, difference values of the coordinate values are calculated in the sorted order.
  • in step S1437, the difference values are encoded (see FIG. 14C).
  • in step S1439, the local feature amounts are encoded in the coordinate value sorting order. The difference value encoding and the local feature amount encoding may be performed in parallel.
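Steps S1431 to S1435 can be sketched as follows, assuming a raster scan and assuming the first sorted index is stored as a difference from zero (the embodiment does not specify how the first value is handled).

```python
def encode_feature_points(points, width):
    """Sketch of steps S1431-S1435: scan 2-D feature point coordinates
    into 1-D raster indices, sort them in ascending order, and take the
    differences between adjacent index values. The first entry is the
    first index itself, i.e. its difference from zero (an assumption)."""
    indices = sorted(y * width + x for x, y in points)
    diffs = [indices[0]] + [b - a for a, b in zip(indices, indices[1:])]
    return indices, diffs
```

Because the indices are sorted, all differences are non-negative and typically small, which is what makes the fixed-bit-length and escape-code encodings of step S1437 compact.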
  • FIG. 14C is a flowchart illustrating a processing procedure of difference value encoding processing S1437 according to the present embodiment.
  • in step S1441, it is determined whether or not the difference value is within the range that can be encoded. If it is, the process proceeds to step S1447 to encode the difference value, and then to step S1449. If it is not within the encodable range, the process proceeds to step S1443 to encode an escape code.
  • in step S1445, the difference value is encoded by an encoding method different from that of step S1447. The process then proceeds to step S1449.
  • in step S1449, it is determined whether the processed difference value is the last element in the series of difference values. If it is, the process ends. If not, the process returns to step S1441 to process the next difference value in the series.
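The escape-code branch of steps S1441 to S1447 can be sketched as follows. The one-byte in-range encoding and the 4-byte big-endian fallback are illustrative assumptions, since the embodiment leaves the alternative encoding method open.

```python
ESCAPE = 255  # assumed escape code: the top of the one-byte range

def encode_difference(diff):
    """Sketch of steps S1441-S1447: values below the escape code are
    encoded directly in one byte; out-of-range values are encoded as the
    escape code followed by a wider (4-byte big-endian) representation."""
    if diff < ESCAPE:                        # within the encodable range
        return bytes([diff])
    return bytes([ESCAPE]) + diff.to_bytes(4, "big")

def decode_differences(data):
    """Inverse of encode_difference over a concatenated byte string."""
    out, i = [], 0
    while i < len(data):
        if data[i] != ESCAPE:
            out.append(data[i]); i += 1
        else:
            out.append(int.from_bytes(data[i + 1:i + 5], "big")); i += 5
    return out
```

Most differences between sorted indices are small, so the common case costs one byte and only rare large gaps pay the escape-code overhead.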
  • FIG. 15 is a block diagram illustrating a hardware configuration of the learning object recognition server 220 according to the present embodiment.
  • a CPU 1510 is a processor for arithmetic control, and implements each functional component of the learning object recognition server 220 in FIG. 7 by executing a program.
  • the ROM 1520 stores fixed data such as initial data, and programs.
  • the communication control unit 701 communicates, in this embodiment, with the communication terminal 210 and the related information providing server 230 via a network. Note that the number of CPUs 1510 is not limited to one; there may be a plurality of CPUs, and a GPU for image processing may be included.
  • the RAM 1540 is a random access memory that the CPU 1510 uses as a work area for temporary storage.
  • the RAM 1540 has an area for storing data necessary for realizing the present embodiment.
  • the received local feature value 1541 indicates a local feature value including the feature point coordinates received from the communication terminal 210.
  • the read local feature value 1542 indicates a local feature value, including the feature point coordinates, read from the local feature value DB 221.
  • the learning target object recognition result 1543 indicates a learning target object recognition result recognized from collation between the received local feature value and the local feature value stored in the local feature value DB 221.
  • the related information 1544 indicates related information retrieved from the related information DB 222 in correspondence with the learning object of the learning object recognition result 1543.
  • the link information 1545 indicates link information searched from the link information DB 223 corresponding to the learning object of the learning object recognition result 1543.
  • Transmission / reception data 1546 indicates transmission / reception data transmitted / received via the communication control unit 701.
  • the storage 1550 stores databases, various parameters, and the following data and programs necessary for realizing the present embodiment.
  • the local feature DB 221 is a local feature DB similar to that shown in FIG.
  • the related information DB 222 is a related information DB similar to that shown in FIG.
  • the link information DB 223 is a link information DB similar to that shown in FIG.
  • the storage 1550 stores the following programs.
  • the learning target object recognition server control program 1551 indicates a learning target object recognition server control program that controls the entire learning target object recognition server 220.
  • the local feature DB creation module 1552 indicates a module that generates a local feature from a learning target image and stores it in the local feature DB 221 in the learning target recognition server control program 1551.
  • the learning target object recognition module 1553 is a module that recognizes the learning target object in the learning target object recognition server control program 1551 by comparing the received local feature value with the local feature value stored in the local feature value DB 221.
  • the related information / link information acquisition module 1554 indicates a module that acquires related information and link information from the related information DB 222 and the link information DB 223 corresponding to the recognized learning object.
  • the recognition result / information transmission module 1555 indicates a module that transmits a recognized learning object name, acquired related information, and link information.
  • FIG. 15 shows only data and programs essential to the present embodiment, and does not illustrate data and programs not related to the present embodiment.
  • FIG. 16 is a flowchart showing a processing procedure of the learning object recognition server 220 according to the present embodiment. This flowchart is executed by the CPU 1510 of FIG. 15 using the RAM 1540, and implements each functional component of the learning object recognition server 220 of FIG.
  • in step S1611, it is determined whether or not to generate the local feature DB.
  • in step S1621, it is determined whether a local feature amount has been received from the communication terminal. Otherwise, other processing is performed in step S1641.
  • if the local feature DB is to be generated, the process advances to step S1613 to execute local feature DB generation processing (see FIG. 17). If a local feature amount has been received, the process proceeds to step S1623 to perform learning object recognition processing (see FIGS. 18A and 18B).
  • in step S1625, related information and link information corresponding to the recognized learning object are acquired. The recognized learning object name, related information, and link information are then transmitted to the communication terminal 210.
  • FIG. 17 is a flowchart showing a processing procedure of local feature DB generation processing S1613 according to the present embodiment.
  • in step S1701, an image of a learning object is acquired.
  • in step S1703, the position coordinates, scale, and angle of the feature points are detected.
  • in step S1705, a local region is acquired for one of the feature points detected in step S1703.
  • in step S1707, the local region is divided into sub-regions.
  • in step S1709, a feature vector for each sub-region is generated, producing the feature vector of the local region. The processing of steps S1705 to S1709 is illustrated in FIG. 11B.
  • in step S1711, dimension selection is performed on the feature vector of the local region generated in step S1709.
  • the dimension selection is illustrated in FIGS. 11D to 11F.
  • hierarchization is performed in the dimension selection, but it is desirable to store all generated feature vectors.
  • in step S1713, it is determined whether the generation of local feature amounts and dimension selection have been completed for all feature points detected in step S1703. If not, the process returns to step S1705 to repeat the processing for the next feature point. When all feature points have been processed, the process advances to step S1715 to register the local feature amounts and feature point coordinates in the local feature amount DB 221 in association with the learning object.
  • in step S1717, it is determined whether there is an image of another learning object. If so, the process returns to step S1701 to acquire that image and repeat the processing.
  • FIG. 18A is a flowchart showing a processing procedure of learning object recognition processing S1623 according to the present embodiment.
  • in step S1811, the local feature amounts of one learning object are acquired from the local feature amount DB 221.
  • in step S1813, the local feature amounts of that learning object are collated with the local feature amounts received from the communication terminal 210 (see FIG. 18B).
  • in step S1815, it is determined whether or not they match. If they match, the process proceeds to step S1821, and the matched learning object is stored as being present in the video captured by the communication terminal 210.
  • in step S1817, it is determined whether all learning objects registered in the local feature amount DB 221 have been collated; if any remain, the process returns to step S1811 to collate the next learning object.
  • the field may be limited in advance in order to increase the processing speed and reduce the load of the learning-time or drill-time processing on the learning object recognition server.
  • FIG. 18B is a flowchart showing a processing procedure of collation processing S1813 according to the present embodiment.
  • in step S1833, the smaller number of dimensions is selected between the number of dimensions i of the local feature amounts in the local feature amount DB 221 and the number of dimensions j of the received local feature amounts.
  • in step S1835, data of the selected number of dimensions is acquired for the p-th local feature amount of the learning object stored in the local feature amount DB 221. That is, the selected number of dimensions is acquired starting from the first dimension.
  • in step S1837, the p-th local feature amount acquired in step S1835 is sequentially collated with the local feature amounts of all feature points generated from the input video to determine whether or not they are similar.
  • in step S1839, it is determined from the result of collation between the local feature amounts whether or not the similarity exceeds the threshold value α.
  • if the similarity exceeds the threshold value α, the combination of the positional relationships of the feature points whose local feature amounts match between the input video and the learning object is stored in step S1841, and q, a parameter counting the number of matched feature points, is incremented by one. The feature point of the learning object is then advanced to the next feature point (p ← p + 1), and if all feature points of the learning object have not yet been collated (p < m), the process returns to step S1835.
  • note that the threshold value α can be changed according to the recognition accuracy required for the learning object.
  • if the learning object has a low correlation with other learning objects, accurate recognition is possible even if the recognition accuracy is lowered.
  • in step S1845, it is determined whether the ratio of the number q of feature points whose local feature amounts match feature points of the input video to the number p of feature points of the learning object exceeds the threshold value β. If it does, the process proceeds to step S1849, where it is further determined, for the learning object candidate, whether the positional relationship between the feature points of the input video and the feature points of the learning object is a relationship that allows a linear transformation.
  • that is, it is determined whether the positional relationship between the feature points of the input video and the feature points of the learning object stored in step S1841 as having matching local feature amounts is a positional relationship obtainable by a change such as rotation, inversion, or a change of viewpoint position, or a positional relationship that cannot be so obtained. Since such determination methods are geometrically known, detailed description is omitted. If it is determined in step S1851 that a linear transformation is possible, the process proceeds to step S1853, and it is determined that the collated learning object exists in the input video. Note that the threshold value β can be changed according to the recognition accuracy required for the learning object.
  • the learning object has a low correlation with other learning objects or a feature can be determined even from a part, accurate recognition is possible even if there are few matching feature points. That is, even if a part is hidden and cannot be seen, or if a characteristic part is visible, the learning object can be recognized.
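As one non-limiting illustration, the collation procedure of steps S1835 through S1853 (per-feature-point comparison, the match-ratio test, and the linear-transformation check) may be sketched as follows. The descriptor distance metric, all threshold values, and the use of a least-squares affine fit for the geometric check are assumptions for this sketch, not values or methods prescribed by the embodiment:

```python
import numpy as np

def match_learning_object(input_feats, obj_feats, ratio_threshold=0.5,
                          dist_threshold=0.3, residual_tol=5.0):
    """Return True if the learning object is judged present in the input video.

    input_feats / obj_feats: lists of (xy_position, descriptor) pairs.
    All thresholds are illustrative assumptions only.
    """
    matched = []          # (input xy, object xy) pairs whose descriptors match
    q = 0                 # number of matched feature points (parameter q)
    m = len(obj_feats)    # total feature points of the learning object
    for p in range(m):    # p advances over the learning object's feature points
        obj_xy, obj_desc = obj_feats[p]
        # nearest input-video feature point in descriptor space
        dists = [np.linalg.norm(desc - obj_desc) for _, desc in input_feats]
        best = int(np.argmin(dists))
        if dists[best] < dist_threshold:          # local feature amounts match
            matched.append((input_feats[best][0], obj_xy))
            q += 1                                 # q <- q + 1
    if q / m <= ratio_threshold:                   # cf. step S1845: ratio test
        return False
    # cf. steps S1849/S1851: do the matched positions admit a linear map?
    src = np.array([xy for xy, _ in matched], dtype=float)
    dst = np.array([xy for _, xy in matched], dtype=float)
    A = np.hstack([src, np.ones((len(src), 1))])   # solve dst ~= A @ M
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)
    residual = np.linalg.norm(A @ M - dst, axis=1).max()
    return bool(residual < residual_tol)
```

A plain least-squares affine fit tolerates rotation, inversion, and first-order viewpoint change; a robust estimator such as RANSAC would be preferable when some matches are outliers.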
  • Storing all learning objects in the local feature amount DB 221 and collating against all of them is a very heavy process. Therefore, for example, before recognizing the learning object from the input video, the user may select a range of learning objects from a menu, and only that range is searched in and collated against the local feature amount DB 221. The load can also be reduced by storing in the local feature amount DB 221 only the local feature amounts of the range used by the user.
  • the information processing system according to the present embodiment is different from the second embodiment in that related information is automatically accessed from a link destination even if the user does not perform a link destination access operation. Since other configurations and operations are the same as those of the second embodiment, the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted.
  • FIG. 19 is a sequence diagram showing an operation procedure of the information processing system according to the present embodiment.
  • operations similar to those in FIG. 5 of the second embodiment are denoted by the same step numbers, and description thereof is omitted.
  • In steps S400 and S401, although the applications and data may differ, download, activation, and initialization are performed in the same manner as in FIGS.
  • The learning object recognition server 220, having recognized the learning object in the video from the local feature amounts received from the communication terminal 210 in step S411, refers to the link information DB 223 in step S513 and acquires the link information corresponding to the recognized learning object.
  • a link destination is selected in step S1915.
  • the selection of the link destination may be performed based on, for example, an instruction from a user using the communication terminal 210 or user recognition by the learning object recognition server 220, but detailed description thereof is omitted here.
  • In step S1917, the related information providing server 230 of the link destination is accessed, based on the link information, with the ID of the recognized learning object.
  • At this time, the ID of the communication terminal that transmitted the local feature amounts of the video is also transmitted with the link destination access.
  • the related information providing server 230 acquires learning object related information (including document data and voice data) corresponding to the learning object ID accompanying the access from the related information DB 231.
  • the related information is returned to the access source communication terminal 210.
  • For this reply, the transmitted communication terminal ID is used.
  • the communication terminal 210 that has received the reply of the related information displays or outputs the received related information in step S527.
  • the learning object recognition server 220 may be configured to receive a reply from the link destination and relay it to the communication terminal 210.
  • the communication terminal 210 may be configured such that when link information is received, automatic access to the link destination is performed and a reply from the link destination is notified.
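The automatic link-access flow described above (recognize the object, look up its link information, access the related information providing server with the learning object ID and the requesting terminal ID, and reply directly to the terminal) may be sketched as below. All class and member names are hypothetical; the embodiment does not prescribe a particular implementation:

```python
from dataclasses import dataclass

@dataclass
class LinkInfo:
    object_id: str
    link_destination: str   # address of a related information providing server

class RelatedInfoServer:
    """Stand-in for the related information providing server 230."""
    def __init__(self, related_info_db):
        self.related_info_db = related_info_db   # object_id -> related info
    def handle_access(self, object_id, terminal_id, reply):
        # acquire related information (document/voice data) for the object ID
        info = self.related_info_db.get(object_id, "no related information")
        reply(terminal_id, info)                 # reply to the access source

class RecognitionServer:
    """Stand-in for the learning object recognition server 220."""
    def __init__(self, link_info_db, servers):
        self.link_info_db = link_info_db         # object_id -> LinkInfo
        self.servers = servers                   # address -> RelatedInfoServer
    def on_recognized(self, object_id, terminal_id, reply):
        link = self.link_info_db[object_id]      # cf. step S513: link lookup
        server = self.servers[link.link_destination]
        # cf. step S1917: access the destination with object and terminal IDs
        server.handle_access(object_id, terminal_id, reply)
```

Because the terminal ID travels with the access, the related information providing server can address its reply to the terminal without relaying through the recognition server, matching the direct-reply variant described above.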
  • The information processing system according to the present embodiment applies the second and third embodiments to learning objects involving language. Since the other configurations and operations are the same as those in the second or third embodiment, the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted.
  • FIG. 20A is a sequence diagram illustrating an operation procedure of book recognition in the information processing system according to the present embodiment.
  • The same step numbers are attached to similar operations, and description thereof is omitted.
  • In steps S400 and S401, although the applications and data may differ, download, activation, and initialization are performed as in FIG.
  • In step S2013, the back cover of a book or an advertisement image of the book is photographed by the communication terminal 210.
  • Here the video is represented by a back cover or a promotional image, but it may also be a book case, a cover, an imprint, a table of contents, or another book-related image.
  • The learning object recognition server 220 receives the local feature amounts from the communication terminal 210 and recognizes the book with reference to the local feature amount DB 221 in step S2021. If the response is simply the book name or the like, the recognition result is transmitted from the learning object recognition server 220 to the communication terminal 210 in step S2023. When the contents of the book are introduced by display or voice, the content introduction DB 2022 is referred to in step S2025, and content introduction data corresponding to the recognized book is acquired. In step S2027, the learning object recognition server 220 transmits the recognition result and the content introduction data to the communication terminal 210.
  • the communication terminal 210 receives the recognition result or the content introduction from the learning object recognition server 220, and notifies the user of the recognition result or the content by display and / or voice in step S2029.
  • When the content introduction is acquired from the related information providing server 230 of the link destination, the link information DB 223 is referred to in step S2031, and the link destination corresponding to the recognized book is acquired.
  • step S2033 the learning object recognition server 220 accesses the link destination.
  • step S2035 the linked related information providing server 230 refers to the content introduction DB 2023 and acquires content introduction data corresponding to the book.
  • step S2037 the related information providing server 230 transmits the recognition result and the content introduction data to the communication terminal 210.
  • the communication terminal 210 receives the recognition result or the content introduction from the related information providing server 230, and notifies the user of the recognition result or the content by display and / or voice in step S2039.
  • the address of the communication terminal 210 is obtained from the learning object recognition server 220 by the access in step S2033.
  • Here, the learning object recognition server 220 automatically accesses the link destination; however, it may instead be configured to return the link destination to the communication terminal 210 and wait for a link instruction from the communication terminal 210.
  • FIG. 20B is a sequence diagram illustrating an operation procedure of page recognition in the information processing system according to the present embodiment.
  • The same step numbers are attached to similar operations, and description thereof is omitted.
  • In steps S400 and S401, although the applications and data may differ, download, activation, and initialization are performed as in FIG.
  • The book is opened and a page is photographed by the communication terminal 210.
  • The page may be a two-page spread, a single page, a part of a page, or a photograph, diagram, table, or the like within the page.
  • the learning target object recognition server 220 receives the local feature value from the communication terminal 210, and recognizes the page with reference to the local feature value DB 221 in step S2051.
  • Next, the page information DB 2024 is referred to, and page information including reading-aloud voice corresponding to the recognized page is acquired.
  • page data of page reading voice is transmitted from the learning object recognition server 220 to the communication terminal 210.
  • the communication terminal 210 receives the page data from the learning object recognition server 220, and notifies the user of the page content by reproducing the page reading voice in step S2057.
  • When page information is acquired from the related information providing server 230 of the link destination, the link information corresponding to the recognized page is acquired with reference to the link information DB 223 in step S2061. In step S2063, the link destination is accessed from the learning object recognition server 220.
  • step S2065 the link related information providing server 230 refers to the page information DB 2025 and acquires page information corresponding to the page.
  • step S2067 the page information of the page reading voice is transmitted from the related information providing server 230 to the communication terminal 210.
  • the communication terminal 210 receives the page data from the related information providing server 230, and notifies the user of the page content by reproducing the page reading voice in step S2069.
  • the address of the communication terminal 210 is obtained from the learning object recognition server 220 by the access in step S2063.
  • Also in this case, as in FIG. 20A, the learning object recognition server 220 may return the link destination to the communication terminal 210 and wait for a link instruction from the communication terminal 210.
  • FIG. 20C is a sequence diagram illustrating an operation procedure for kanji recognition in the information processing system according to the present embodiment.
  • The same step numbers are attached to similar operations, and description thereof is omitted.
  • In steps S400 and S401, although the applications and data may differ, download, activation, and initialization are performed as in FIG.
  • In step S2073, the communication terminal 210 photographs a kanji, an idiom, or a sentence in the book.
  • the cover may be taken as shown in FIG. 20A or the page may be taken as shown in FIG. 20B.
  • the learning object recognition server 220 receives the local feature amount from the communication terminal 210, and recognizes kanji, idioms, and sentences with reference to the local feature amount DB 221 in step S2081.
  • Next, the dictionary DB 2026 is referred to, and the reading and meaning, in display or voice form, corresponding to the recognized kanji, idiom, or sentence are acquired.
  • the learning object recognition server 220 transmits display / audio data indicating the reading and meaning to the communication terminal 210.
  • the communication terminal 210 receives display / audio data indicating how to read and the meaning from the learning object recognition server 220, and notifies the user of the reading and the meaning by displaying and reproducing the sound in step S2087.
  • When the reading and meaning are acquired from the related information providing server 230 of the link destination, the link information corresponding to the recognized kanji, idiom, or sentence is acquired with reference to the link information DB 223 in step S2091. In step S2093, the learning object recognition server 220 accesses the link destination.
  • step S2095 the linked related information providing server 230 refers to the dictionary DB 2027, and acquires readings and meanings corresponding to kanji, idioms, and sentences.
  • step S2097 the related information providing server 230 transmits display / audio data indicating the reading and meaning to the communication terminal 210.
  • The communication terminal 210 receives the data from the related information providing server 230, and notifies the user in step S2099 by displaying the reading and meaning of the kanji, idiom, or sentence and reproducing the voice data.
  • the address of the communication terminal 210 is obtained from the learning object recognition server 220 by the access in step S2093.
  • As for the display and voice notification in FIG. 20C, display is preferable for an image containing a plurality of kanji, idioms, or sentences, while voice notification is desirable for a video of a single kanji, idiom, or sentence, or a part thereof.
  • the learning target object recognition server 220 may return the link destination to the communication terminal 210 and wait for a link instruction in the communication terminal 210.
  • FIG. 21A is a diagram showing a configuration of the content introduction DB 2022 or 2023 according to the present embodiment.
  • The content introduction DB 2022 and the content introduction DB 2023 have basically the same configuration; however, considering storage capacity, the content introduction DB 2023 can provide more detailed content or more items than the content introduction DB 2022.
  • the content introduction DB 2022 or 2023 stores a work name 2112, a writer 2113, a publisher 2114, an issue date 2115, and content introduction information 2116 including display data and audio data in association with the book ID 2111. All may be included in the content introduction information.
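The record layout of FIG. 21A may, for instance, be modeled as a small relational table whose columns mirror the numbered fields (book ID 2111 through content introduction information 2116). This is only an illustrative schema with invented example values, not a required implementation:

```python
import sqlite3

# In-memory stand-in for content introduction DB 2022 or 2023.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE content_introduction (
        book_id       TEXT PRIMARY KEY,  -- book ID 2111
        work_name     TEXT,              -- work name 2112
        writer        TEXT,              -- writer 2113
        publisher     TEXT,              -- publisher 2114
        issue_date    TEXT,              -- issue date 2115
        intro_display BLOB,              -- content introduction 2116 (display)
        intro_audio   BLOB               -- content introduction 2116 (audio)
    )
""")
conn.execute(
    "INSERT INTO content_introduction VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("B001", "Example Work", "Example Writer", "Example Press",
     "2012-01-30", b"<display data>", b"<audio data>"),
)
# Retrieval keyed by the recognized book ID, as in step S2025.
row = conn.execute(
    "SELECT work_name, writer FROM content_introduction WHERE book_id = ?",
    ("B001",),
).fetchone()
```

Folding all fields into a single content-introduction blob, as the text permits, would simply collapse the last five columns into one.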
  • FIG. 21B is a diagram showing a configuration of the page information DB 2024 or 2025 according to the present embodiment. Note that the page information DB 2024 and the page information DB 2025 have basically the same configuration; however, considering storage capacity, the page information DB 2025 can provide more detailed contents or more items than the page information DB 2024.
  • The page information DB 2024 or 2025 stores, in association with the book ID 2121, the page number 2122, the chapter/part information 2123, first reading data/speaker 2124, and second reading data/speaker 2125.
  • FIG. 21C is a diagram showing a configuration of the dictionary DB 2026 or 2027 according to the present embodiment.
  • The dictionary DB 2026 and the dictionary DB 2027 have basically the same configuration; however, considering storage capacity, the dictionary DB 2027 can provide more detailed contents or more items than the dictionary DB 2026.
  • the dictionary DB 2026 or 2027 has, for example, three parts, a kanji DB 2130, an idiom DB 2140, and a sentence DB 2150. All may be integrated.
  • The kanji DB 2130 stores, in association with the kanji ID 2131, kanji data 2132 composed of display and voice, reading data 2133, and explanation data (meaning/usage) 2134.
  • the idiom DB 2140 stores reading data 2142 and comment data (meaning / usage) 2143 composed of display and voice in association with the idiom ID 2141.
  • the text DB 2150 stores reading data 2152 and comment data (meaning / usage) 2153 composed of display and voice in association with the text ID 2151.
  • the sentence DB 2150 may include proverbs, haiku, waka, and the like.
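The three-part dictionary DB of FIG. 21C (kanji DB 2130, idiom DB 2140, sentence DB 2150) amounts to a keyed lookup from a recognized entry ID to reading and explanation data. A minimal in-memory sketch follows; the entry contents and ID formats are invented examples:

```python
# Illustrative in-memory stand-in for dictionary DB 2026/2027.
dictionary_db = {
    "kanji": {   # kanji DB 2130: kanji ID -> display data, reading, explanation
        "K0001": {"kanji": "山", "reading": "yama", "meaning": "mountain"},
    },
    "idiom": {   # idiom DB 2140: idiom ID -> reading, explanation
        "I0001": {"reading": "isseki nichou",
                  "meaning": "two results from one action"},
    },
    "sentence": {  # sentence DB 2150 (may also hold proverbs, haiku, waka)
        "S0001": {"reading": "example reading", "meaning": "example meaning"},
    },
}

def look_up(kind, entry_id):
    """Return reading/meaning data for a recognized kanji, idiom, or sentence,
    or None if the entry is not registered."""
    return dictionary_db[kind].get(entry_id)
```

Integrating the three parts, as the text allows, would mean merging the three inner tables into one keyed by a common entry ID.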
  • FIG. 21D is a diagram showing a configuration of the translation dictionary DB 2100 according to the present embodiment.
  • FIG. 21D illustrates the configuration of a translation dictionary from Japanese to a foreign language, but the same applies to other translation dictionaries.
  • the translation dictionary DB 2100 includes, for example, three parts, a word DB 2160, a phrase DB 2170, and a sentence DB 2180. All may be integrated.
  • the word DB 2160 stores, in association with the Japanese ID 2161, English word data 2162 composed of notation and voice, other language data 2163, and explanation data (meaning / usage) 2164.
  • the phrase DB 2170 stores English phrase data 2172 composed of notation and speech, other language phrase data 2173, and explanation data (meaning / usage) 2174 in association with the Japanese phrase ID 2171.
  • the sentence DB 2180 stores, in association with the Japanese sentence ID 2181, English sentence data 2182 composed of notation and speech, other language sentence data 2183, and explanation data (meaning / usage) 2184.
  • the phrase DB 2170 and the sentence DB 2180 may include proverbs, haiku, waka, poetry, and the like.
  • The information processing system according to the present embodiment applies the second and third embodiments to learning objects involving sound. Since the other configurations and operations are the same as those in the second or third embodiment, the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted.
  • FIG. 22A is a sequence diagram showing an operation procedure of music recognition in the information processing system according to the present embodiment.
  • The same step numbers are attached to similar operations, and description thereof is omitted.
  • In steps S400 and S401, although the applications and data may differ, download, activation, and initialization are performed as in FIG.
  • In step S2213, the communication terminal 210 photographs a music jacket, a CD, or a concert promotional image.
  • the image is represented by a jacket or a promotional image, but other music-related images may be used.
  • the learning object recognition server 220 receives the local feature amount from the communication terminal 210, and performs music recognition with reference to the local feature amount DB 221 in step S2221. If the response is an album name, a performer, concert information, etc., the recognition result is transmitted from the learning object recognition server 220 to the communication terminal 210 in step S2223.
  • the contents introduction DB 2222 is referred to, and contents introduction data corresponding to the recognized music is acquired.
  • the learning object recognition server 220 transmits the recognition result and the content introduction data to the communication terminal 210.
  • the communication terminal 210 receives the recognition result or the content introduction from the learning object recognition server 220, and notifies the user of the recognition result or the content by display and / or voice in step S2229.
  • When the content introduction is acquired from the related information providing server 230 of the link destination, the link information DB 223 is referred to in step S2231, and the link destination corresponding to the recognized music is acquired. In step S2233, the learning object recognition server 220 accesses the link destination.
  • In step S2235, the related information providing server 230 of the link destination refers to the content introduction DB 2223 and acquires content introduction data corresponding to the music.
  • step S2237 the related information providing server 230 transmits the recognition result and the content introduction data to the communication terminal 210.
  • the communication terminal 210 receives the recognition result or the content introduction from the related information providing server 230, and notifies the user of the recognition result or the content with display and / or voice in step S2239.
  • the address of the communication terminal 210 is obtained from the learning object recognition server 220 by the access in step S2233.
  • Here, the learning object recognition server 220 automatically accesses the link destination; however, it may instead be configured to return the link destination to the communication terminal 210 and wait for a link instruction from the communication terminal 210.
  • FIG. 22B is a sequence diagram showing an operation procedure of music recognition in the information processing system according to the present embodiment.
  • The same step numbers are attached to similar operations, and description thereof is omitted.
  • In steps S400 and S401, although the applications and data may differ, download, activation, and initialization are performed as in FIG.
  • In step S2243, the cover or a page of the score is photographed by the communication terminal 210.
  • The page may be a two-page spread, a single page, or a part of a page.
  • the learning object recognition server 220 receives the local feature amount from the communication terminal 210, and in step S2251, refers to the local feature amount DB 221 to perform song recognition.
  • step S2253 the performance information which is music performance data corresponding to the recognized music is acquired with reference to the performance information DB 2224.
  • In step S2255, the song audio data is transmitted from the learning object recognition server 220 to the communication terminal 210.
  • the communication terminal 210 receives the music performance data from the learning object recognition server 220, reproduces the music, and notifies the user in step S2257.
  • When the performance information is acquired from the related information providing server 230 of the link destination, the link destination corresponding to the recognized song is acquired with reference to the link information DB 223 in step S2261. In step S2263, the link destination is accessed from the learning object recognition server 220.
  • the linked related information providing server 230 refers to the performance information DB 2225 in step S2265 and acquires performance information corresponding to the song.
  • the music performance data is transmitted from the related information providing server 230 to the communication terminal 210.
  • the communication terminal 210 receives the music performance data from the related information providing server 230, and reproduces the music and notifies the user in step S2269.
  • the address of the communication terminal 210 is obtained from the learning object recognition server 220 by the access in step S2263.
  • Also in this case, as in FIG. 22A, the learning object recognition server 220 may return the link destination to the communication terminal 210 and wait for a link instruction from the communication terminal 210.
  • FIG. 22C is a sequence diagram showing an operation procedure of sound recognition in the information processing system according to the present embodiment.
  • The same step numbers are attached to similar operations, and description thereof is omitted.
  • In steps S400 and S401, although the applications and data may differ, download, activation, and initialization are performed as in FIG.
  • In step S2273, the communication terminal 210 photographs notes or measures in the score.
  • the learning object recognition server 220 receives the local feature amount from the communication terminal 210, and recognizes a note or a measure with reference to the local feature amount DB 221 in step S2281. Next, in step S2283, the sound information DB 2226 is referred to, and a sound or a sound string corresponding to the recognized note or measure is acquired. In step S2285, the learning object recognition server 220 transmits sound and sound string data to the communication terminal 210.
  • the communication terminal 210 receives the sound data from the learning object recognition server 220 and notifies the user by reproducing the sound in step S2287.
  • When the sound information is acquired from the related information providing server 230 of the link destination, the link information corresponding to the recognized sound or sound string is acquired with reference to the link information DB 223 in step S2291. In step S2293, the learning object recognition server 220 accesses the link destination.
  • step S2295 the linked related information providing server 230 refers to the sound information DB 2227 and acquires sound data corresponding to the sound.
  • step S2297 the related information providing server 230 transmits sound data to the communication terminal 210.
  • the communication terminal 210 receives the sound data from the related information providing server 230, and notifies the user by reproducing the sound data in step S2299.
  • the address of the communication terminal 210 is obtained from the learning object recognition server 220 by the access in step S2293.
  • Also in this case, as in FIG. 22B, the learning object recognition server 220 may return the link destination to the communication terminal 210 and wait for a link instruction from the communication terminal 210.
  • FIG. 23A is a diagram showing a configuration of the content introduction DB 2222 or 2223 according to the present embodiment.
  • The content introduction DB 2222 and the content introduction DB 2223 have basically the same configuration; however, considering storage capacity, the content introduction DB 2223 can provide more detailed content or more items than the content introduction DB 2222.
  • The content introduction DB 2222 or 2223 stores, in association with the CD/DVD/record jacket ID 2311, a performer/singer 2312, a recording location 2313, a recording date/release date 2314, and content introduction information 2315 including display data and audio data.
  • the CD / DVD / record jacket ID 2311 may be a concert ID.
  • Each CD/DVD/record jacket ID 2311 is associated with a plurality of song IDs 2316 and song introductions 2317; all of these may be stored as content introduction information.
  • FIG. 23B is a diagram showing a configuration of the performance information DB 2224 or 2225 according to the present embodiment.
  • The performance information DB 2224 and the performance information DB 2225 have basically the same configuration; however, considering storage capacity, the performance information DB 2225 can provide more detailed contents or more items than the performance information DB 2224.
  • the performance information DB 2224 or 2225 stores a song name 2322, a first song reproduction data 2323 by the first player, and second song reproduction data 2324 by the second player in association with the song ID 2321.
  • the performer can be replaced by a conductor or singer.
  • FIG. 23C is a diagram showing a configuration of the sound information DB 2226 or 2227 according to the present embodiment.
  • The sound information DB 2226 and the sound information DB 2227 have basically the same configuration; however, considering storage capacity, the sound information DB 2227 can provide more detailed contents or more items than the sound information DB 2226.
  • the sound information DB 2226 or 2227 includes a measure DB 2330 that stores reproduction data in units of measures and a sound DB 2340 that stores reproduction data in units of sounds.
  • the measure DB 2330 stores a song name (or song ID) 2332 including the measure and measure reproduction data 2333 in association with the measure ID 2331.
  • The sound DB 2340 stores, in association with the sound ID 2341, a sound name/scale name 2342, first sound reproduction data 2343 by piano, second sound reproduction data 2344 by violin, and third sound reproduction data 2345 by flute.
  • the type of musical instrument is not limited to this example. It may be the voice of a singer.
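The per-instrument layout of the sound DB 2340 suggests a lookup keyed by sound ID and instrument name. A minimal sketch follows; the IDs, instrument keys, and data payloads are invented examples:

```python
# Illustrative stand-in for sound DB 2340: sound ID -> sound/scale name plus
# one reproduction-data entry per instrument (piano, violin, flute, ...).
sound_db = {
    "N0001": {
        "name": "C4 / do",
        "piano": b"piano-c4.pcm",
        "violin": b"violin-c4.pcm",
        "flute": b"flute-c4.pcm",
    },
}

def reproduction_data(sound_id, instrument="piano"):
    """Pick the reproduction data for one recognized sound on one instrument,
    falling back to the piano data when the instrument is not registered."""
    entry = sound_db[sound_id]
    return entry.get(instrument, entry["piano"])
```

Adding further instruments, or a singer's voice as the text allows, only requires adding further keys per sound ID.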
  • The information processing system according to the present embodiment applies the second and third embodiments to exhibits. Since the other configurations and operations are the same as those in the second or third embodiment, the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted. Exhibits include materials in museums and folk museums, paintings and sculptures in art museums, and items shown at exhibitions and trade fairs.
  • According to the present embodiment, an exhibit can be recognized and its related information can be learned.
  • FIG. 24A is a sequence diagram illustrating an operation procedure for exhibit recognition in the information processing system according to the present embodiment.
  • The same step numbers are attached to similar operations, and description thereof is omitted.
  • In steps S400 and S401, although the applications and data may differ, download, activation, and initialization are performed as in FIG.
  • In step S2413, the exhibit is photographed by the communication terminal 210.
  • The learning object recognition server 220 receives the local feature amounts from the communication terminal 210, and recognizes the exhibit with reference to the local feature amount DB 221 in step S2421. If the response is simply the exhibit name or the like, the recognition result is transmitted from the learning object recognition server 220 to the communication terminal 210 in step S2423.
  • When the contents of the exhibit are introduced by display or voice, the content introduction DB 2422 is referred to and content introduction data corresponding to the recognized exhibit is acquired.
  • the learning object recognition server 220 transmits the recognition result and the content introduction data to the communication terminal 210.
  • the communication terminal 210 receives the recognition result or the content introduction from the learning object recognition server 220, and notifies the user of the recognition result or the content by display and / or voice in step S2229.
  • When the content introduction is acquired from the related information providing server 230 of the link destination, the link information DB 223 is referred to in step S2231, and the link destination corresponding to the recognized exhibit is acquired. In step S2233, the learning object recognition server 220 accesses the link destination.
  • In step S2235, the related information providing server 230 of the link destination refers to the content introduction DB 2423 and acquires content introduction data corresponding to the exhibit.
  • step S2237 the related information providing server 230 transmits the recognition result and the content introduction data to the communication terminal 210.
  • the communication terminal 210 receives the recognition result or the content introduction from the related information providing server 230, and notifies the user of the recognition result or the content with display and / or voice in step S2239.
  • the address of the communication terminal 210 is obtained from the learning object recognition server 220 by the access in step S2233.
  • Here, the learning object recognition server 220 automatically accesses the link destination; however, it may instead be configured to return the link destination to the communication terminal 210 and wait for a link instruction from the communication terminal 210.
  • FIG. 24B is a diagram showing a configuration of the content introduction DB 2422 or 2423 according to the present embodiment.
  • The content introduction DB 2422 and the content introduction DB 2423 have basically the same configuration; however, considering storage capacity, the content introduction DB 2423 can provide more detailed contents or more items than the content introduction DB 2422.
  • The content introduction DB 2422 or 2423 is described here separately from the local feature amount DB 221, but it may be provided integrally with the local feature amount DB 221.
  • The content introduction DB 2422 or 2423 stores, in association with the exhibit ID 2401, a name (author, age) 2402, related display data 2403, and related audio data 2404.
  • FIG. 25 is a sequence diagram showing an operation procedure of mathematical expression recognition in the information processing system according to the present embodiment.
  • The same step numbers are attached to similar operations, and description thereof is omitted.
  • In steps S400 and S401, although the applications and data may differ, download, activation, and initialization are performed as in FIG.
  • In step S2503, the communication terminal 210 captures a mathematical expression.
  • Instead of a mathematical expression, for example, a graph image or a straight-line/curve image may be photographed.
  • The learning object recognition server 220 receives the local feature amounts from the communication terminal 210, and recognizes the mathematical expression or the like with reference to the local feature amount DB 221 in step S2511. Next, in step S2513, the formula DB 2522 is referred to, and formula-related data, including the formula with its variables and calculation examples, corresponding to the recognized formula is acquired. In step S2517, the learning object recognition server 220 transmits the formula and calculation example data to the communication terminal 210.
  • The communication terminal 210 receives the formula and calculation example data from the learning object recognition server 220 and, in step S2519, notifies the user of the formula and calculation example data.
  • In step S2519, the communication terminal 210 determines whether the user has input a value for a variable in the mathematical expression. If no variable is input, the process ends.
  • If a variable has been input, the process proceeds to step S2521, where the variable is substituted into the received mathematical expression and the expression is calculated.
  • In step S2523, the calculation result is displayed. If necessary, the calculation result is transmitted to the learning object recognition server 220 in step S2525.
  • In the learning object recognition server 220, the calculation results from the communication terminal 210 can be accumulated in step S2527 and used for information collection and the like.
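The variable-substitution steps above (receive formula data, accept user-entered variable values, compute the result locally) can be sketched as follows. The expression string format, the use of `eval`, and the sample formula are assumptions for illustration, not part of the disclosure.

```python
# Minimal sketch of steps S2519-S2523: the terminal receives formula data,
# the user enters variable values, and the result is computed locally.
import math

FORMULA = {
    "formula_id": "F001",                       # hypothetical formula ID
    "expression": "4 / 3 * math.pi * r ** 3",   # example: volume of a sphere
    "variables": ["r"],
}

def calculate(formula, variable_values):
    """Substitute user-entered variable values and evaluate the expression.

    Returns None when a required variable is missing (the "no input" branch,
    where processing simply ends).
    """
    missing = [v for v in formula["variables"] if v not in variable_values]
    if missing:
        return None
    env = {"math": math, **variable_values}
    # Restrict builtins so only the provided names are visible to the expression.
    return eval(formula["expression"], {"__builtins__": {}}, env)
```

The restricted-builtins `eval` is a simplification; a production implementation would use a proper expression parser.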
  • In step S2531, the link destination corresponding to the recognized mathematical formula or the like is acquired with reference to the link information DB 223.
  • The link destination is then accessed from the learning object recognition server 220 (step S2533).
  • In step S2535, the linked related information providing server 230 refers to the formula DB 2523 and acquires formula-related data.
  • In step S2537, the formula-related data is transmitted from the related information providing server 230 to the communication terminal 210.
  • The communication terminal 210 receives the formula-related data from the related information providing server 230 and, in step S2539, notifies the user of the formula and calculation example data.
  • Note that the related information providing server 230 obtains the address of the communication terminal 210 from the learning object recognition server 220 through the access in step S2533.
  • In the above description, the learning object recognition server 220 automatically accesses the link destination. However, the learning object recognition server 220 may instead be configured to return the link destination to the communication terminal 210 and wait for a link instruction from the communication terminal 210.
  • FIG. 26A is a diagram showing a configuration of the formula DB 2522 or 2523 according to the present embodiment. Note that the formula DB 2522 and the formula DB 2523 have basically the same configuration, but considering the storage capacity, the formula DB 2523 can provide more detailed contents or more items than the formula DB 2522.
  • The formula DB 2522 will be described separately from the local feature amount DB 221, but may be provided integrally with the local feature amount DB 221.
  • the formula DB 2522 or 2523 stores a formula name 2612, formula data 2613 that represents a formula with a symbol, a variable 2614 used in the formula, and a constant 2615 in the formula in association with the formula ID 2611.
  • FIG. 26B is a diagram showing a configuration of a calculation parameter table 2600 according to the present embodiment.
  • the calculation parameter table 2600 is a table created in the RAM of the communication terminal or the server when a calculation is executed by substituting variables or constants into mathematical expressions.
  • In the calculation parameter table 2600, each variable value 2622 used in the formula, each constant value 2623 used in the formula, and the calculation result value 2624 obtained using those variable and constant values are stored in association with the formula ID 2621.
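An illustrative sketch of one row of the in-RAM calculation parameter table 2600 follows. The row-building function, the example formula (distance fallen under gravity), and all names are hypothetical; only the associated items (formula ID 2621, variable values 2622, constant values 2623, calculation result 2624) come from the disclosure.

```python
# One row of the calculation parameter table 2600, created in RAM when a
# calculation is executed by substituting variables and constants into a formula.

def make_calc_row(formula_id, variable_values, constant_values, calc):
    """Build one table row: inputs plus the computed result."""
    result = calc(variable_values, constant_values)
    return {
        "formula_id": formula_id,      # formula ID 2621
        "variables": variable_values,  # variable values 2622
        "constants": constant_values,  # constant values 2623
        "result": result,              # calculation result value 2624
    }

# Example (assumed formula): distance fallen under gravity, s = 1/2 * g * t^2.
row = make_calc_row(
    "F002",
    {"t": 3.0},
    {"g": 9.8},
    lambda v, c: 0.5 * c["g"] * v["t"] ** 2,
)
```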
  • The information processing system according to the present embodiment searches for a learning object and notifies the user, provided that the learning object to be searched for has been registered.
  • the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted.
  • the user can search for a desired learning object in real time.
  • FIGS. 27A and 27B are diagrams illustrating examples of display screens of the communication terminal in the information processing system according to the present embodiment.
  • The upper part of FIG. 27A is a diagram showing an example of searching for a work created by a child at its exhibition or placement location.
  • the communication terminal 2710 takes a picture of a work made by a child and obtains a video 2720.
  • a local feature amount is generated based on the video 2720 of this work, and is registered in the local feature amount DB of the communication terminal 2710.
  • the communication terminal 2710 captures an image 2730 of the exhibition place and place of the work.
  • the communication terminal 2710 generates a local feature amount based on the video 2730.
  • The position of the work created by the child is recognized by comparing the local feature amount of the previously registered work with the local feature amount of the image 2730 of the exhibition and placement location of the work.
  • a comment 2731 such as “I am here” is superimposed on the work at the recognized position as a search result.
  • In this way, the position of the child's work can be located in real time using the dimension-selected local feature amounts.
  • The lower part of FIG. 27A is a diagram showing an example of finding where a child participating in a school play or concert is.
  • a picture of a child is taken by the communication terminal 2710 to obtain a video 2740.
  • a local feature value is generated based on the video 2740 and registered in the local feature value DB of the communication terminal 2710.
  • Next, the communication terminal 2710 captures an image 2750 of the school play or concert.
  • the communication terminal 2710 generates a local feature amount based on the video 2750.
  • the position of the child is recognized by comparing the local feature amount from the previously registered child's photograph with the local feature amount of the video 2750 of the school performance or performance.
  • a comment 2751 such as “I am here” is superimposed and displayed as a search result for the child at the recognized position.
  • In this way, the position of the child can be located in real time using the dimension-selected local feature amounts.
  • FIG. 27B shows a process in which the process in the lower part of FIG. 27A is further improved.
  • the left diagram and the central diagram in FIG. 27B are the same as the left diagram and the right diagram in the lower part of FIG. 27A.
  • the communication terminal 2710 zooms in on the position of the child, so that an enlarged image of the child can be acquired.
  • FIG. 28 is a block diagram showing a functional configuration of a communication terminal 2710 according to this embodiment.
  • the same functional components as those in FIG. 6 of the second embodiment are denoted by the same reference numerals, and description thereof is omitted.
  • The registration/search determination unit 2801 determines whether the video captured by the imaging unit 601 of the communication terminal 2710 is a video for registering a search object in the local feature amount DB 2821 or a video for searching for a registered search object.
  • The determination by the registration/search determination unit 2801 may be made by a user operation, or may be made automatically based on the area ratio that an object occupies on the image screen. For example, when a search object is registered, it is imaged so as to fill the entire screen, so a screen in which the area ratio is equal to or greater than a predetermined threshold is treated as a registration screen.
  • If it is determined that the video is for registering a search object, the local feature amount registration unit 2802 registers the local feature amount generated by the local feature amount generation unit 602 in the local feature amount DB 2821. On the other hand, if it is determined that the video is for searching for a registered search object, the search object recognition unit 2803 collates the local feature amount generated by the local feature amount generation unit 602 with the local feature amounts of the search objects registered in the local feature amount DB 2821.
  • The notification unit 2804 notifies the user of the search result, for example by superimposing a comment at the position of the recognized search object.
  • the zoom control unit 2805 controls the imaging unit 601 to zoom in on the position of the search object in order to enlarge the image of the search object.
  • The configuration of the local feature amount DB 2821 is the same as the configuration shown in FIG. 8 of the second embodiment, except that new local feature amounts of search objects are registered, and thus description thereof is omitted.
  • The search object DB 2822 stores information input by the user regarding the search object; it is not essential, and may instead be provided within the local feature amount DB 2821.
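The automatic registration/search determination described above (a frame in which the object occupies at least a predetermined area ratio of the screen is treated as a registration shot) can be sketched as follows. The concrete threshold value and the function names are assumptions for illustration; the disclosure specifies only the threshold-on-area-ratio idea.

```python
# Hedged sketch of the heuristic attributed to the registration/search
# determination unit 2801: an object filling the screen implies registration.

REGISTRATION_AREA_RATIO_THRESHOLD = 0.6  # assumed value of the "predetermined threshold"

def classify_frame(object_area, screen_area,
                   threshold=REGISTRATION_AREA_RATIO_THRESHOLD):
    """Return 'register' when the object dominates the screen, else 'search'."""
    ratio = object_area / screen_area
    return "register" if ratio >= threshold else "search"
```

In the described system a user operation may override this automatic determination.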
  • FIG. 29 is a flowchart showing a processing procedure of the communication terminal according to the present embodiment. This flowchart is also executed by the CPU 1210 of FIG. 12A using the RAM 1240, and implements the functional configuration units of FIG. 28. In FIG. 29, the local feature amount generation processing is the same as that in FIG. 14, and is therefore given the same step number S1313; its description is omitted.
  • In step S2911, it is determined whether the search object (the work or the child in FIG. 27A) is to be registered. Next, in step S2921, it is determined whether search processing for the search object is to be performed. If neither, other processing is executed in step S2941.
  • If it is a registration process, the procedure proceeds to step S2913, where an image of the search object is acquired.
  • In step S1313, a local feature amount of the acquired search object image is generated.
  • In step S2917, the generated local feature amount is associated with the search object and registered in the local feature amount DB 2821. At the same time, necessary search object information is registered in the search object DB 2822.
  • If it is a search process, the procedure proceeds to step S2923, where a video is acquired.
  • In step S1313, a local feature amount of the acquired video is generated.
  • In step S2927, whether the local feature amount of the search object matches at least a part of the local feature amount of the video is collated, and the search object is thereby recognized. If the search object is not found, the process returns from step S2929 to step S2923 to acquire another video (in practice, by changing the imaging direction or area of the communication terminal 2710), and the search for the search object is repeated.
  • If the search object is found, it is determined in step S2931 whether to perform zoom processing. This determination may be set by the user.
  • In step S2933, an enlarged image obtained by zooming in on the search object is acquired.
  • In step S2935, a comment indicating the presence of the search object is displayed at the position of the search object (see FIG. 27A).
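The register-then-scan flow of FIG. 29 can be sketched compactly as follows. Here `extract_features` is only a stand-in for the local feature amount generation of step S1313, and set-overlap matching is a deliberate simplification of the local feature collation; all names and the match-ratio value are assumptions.

```python
# Compact sketch of the FIG. 29 flow: register a search object's features
# (steps S2913-S2917), then scan successive frames until they match
# (steps S2923-S2929).

def extract_features(image):
    """Stand-in for step S1313: here, just the set of characters in a string."""
    return set(image)

def register(db, name, image):
    """Register the search object's features under its name."""
    db[name] = extract_features(image)

def search(db, name, frames, min_match_ratio=0.8):
    """Return the index of the first frame matching the registered object,
    or None when no frame matches (in practice the terminal would keep
    scanning with a changed imaging direction)."""
    target = db[name]
    for i, frame in enumerate(frames):
        matched = len(target & extract_features(frame)) / len(target)
        if matched >= min_match_ratio:
            return i  # found: a comment would be displayed here (step S2935)
    return None

db = {}
register(db, "work", "abcde")
found = search(db, "work", ["xyz", "qabcdew"])
```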
  • the information processing system according to the present embodiment is different from the first to eighth embodiments in that the communication terminal performs all processes including learning object recognition. Since other configurations and operations are the same as those of the second embodiment, the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted.
  • According to the present embodiment, all processing can be performed by the communication terminal alone, based on the local feature amounts of the image in the video.
  • FIG. 30 is a block diagram illustrating a functional configuration of the communication terminal 3010 according to the present embodiment.
  • The same reference numerals are assigned to the same functional components as those in FIG. 6, and description thereof is omitted.
  • The learning object recognition unit 3003 recognizes the learning object by collating the local feature amount generated by the local feature amount generation unit 602 with the local feature amounts stored in the local feature amount DB 3021, and the recognition result is notified by the learning object recognition result notification unit 3004. Note that the learning object recognition unit 3003 and the local feature amount DB 3021 are obtained by arranging, in the communication terminal 3010, the functional components included in the learning object recognition server 220 described above; since their functions are the same, description thereof is omitted.
  • The learning object recognition result notification unit 3004 includes processing by the display screen generation unit 606 and the voice generation unit 608 of FIG. 6 based on the notification information; since the processing is the same, description thereof is omitted.
  • the related information acquisition unit 3005 acquires related information from the related information DB 3022 corresponding to the recognized learning object. Also, the related information notification unit 3006 notifies the user of related information.
  • The link information acquisition unit 3007 acquires, from the link information DB 3023, link information corresponding to the recognized learning object, and the acquired link information is notified to the user.
  • These functional components are also configured by arranging, in the communication terminal 3010, the functional components included in the learning object recognition server 220; since their functions are the same, description thereof is omitted.
  • the link destination access unit 3009 accesses the link destination related information providing server 230 using the acquired link information.
  • Note that the present invention may be applied to a system composed of a plurality of devices, or to a single device. Furthermore, the present invention is also applicable to a case where a control program that realizes the functions of the embodiments is supplied to a system or apparatus directly or remotely. Therefore, a control program installed in a computer to realize the functions of the present invention with the computer, a medium storing the control program, and a WWW (World Wide Web) server from which the control program is downloaded are also included in the scope of the present invention.
  • (Appendix 1) An information processing system comprising: first local feature quantity storage means for storing, in association with a learning object, m first local feature quantities, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions including each of m feature points of an image of the learning object; second local feature quantity generation means for extracting n feature points from an image captured by imaging means and generating n second local feature quantities, each consisting of a feature vector of 1 to j dimensions, for the n local regions including each of the n feature points; and learning object recognition means for selecting the smaller number of dimensions between the dimension number i of the feature vectors of the first local feature quantities and the dimension number j of the feature vectors of the second local feature quantities, and recognizing that the learning object exists in the image in the video when it is determined that a predetermined proportion or more of the m first local feature quantities, consisting of feature vectors up to the selected number of dimensions, correspond to the n second local feature quantities consisting of feature vectors up to the selected number of dimensions.
  • (Appendix 2) The information processing system according to appendix 1, wherein the first local feature quantity storage means stores the m first local feature quantities generated from images of a plurality of learning objects in association with each of the plurality of learning objects, and the learning object recognition means recognizes the plurality of learning objects included in the image captured by the imaging means.
  • (Appendix 3) The information processing system according to appendix 1 or 2, further comprising notification means for notifying a recognition result of the learning object recognition means.
  • (Appendix 4) The information processing system according to appendix 3, wherein the notification means further notifies information related to the recognition result.
  • (Appendix 5) The information processing system according to appendix 3 or 4, wherein the notification means further notifies link information for acquiring information related to the recognition result.
  • (Appendix 6) The information processing system according to appendix 3, wherein the notification means includes related information acquisition means for acquiring information related to the recognition result according to link information, and notifies the related information acquired according to the link information.
  • (Appendix 7) The information processing system according to any one of appendices 3 to 6, further comprising registration means for registering, in the first local feature quantity storage means, a local feature quantity of a learning object to be searched for, wherein the notification means notifies the learning object recognized by the learning object recognition means as a search result.
  • (Appendix 8) The information processing system according to any one of appendices 3 to 6, wherein the learning object is a learning object including characters, and the notification means notifies the contents of the learning object.
  • (Appendix 9) The information processing system according to any one of appendices 3 to 6, wherein the learning object is a learning object related to sound, and the notification means notifies the contents of the learning object by playing a sound.
  • (Appendix 10) The information processing system according to any one of appendices 3 to 6, wherein the learning object is an exhibit, and the notification means notifies a description of the learning object.
  • (Appendix 11) The information processing system according to any one of appendices 3 to 6, wherein the learning object is a learning object including a mathematical formula, and the notification means calculates the mathematical formula of the learning object and notifies a calculation result.
  • (Appendix 12) The information processing system according to any one of appendices 1 to 11, wherein the first local feature quantities and the second local feature quantities are generated by dividing a local region including a feature point extracted from an image into a plurality of sub-regions and generating a feature vector of a plurality of dimensions consisting of histograms of gradient directions in the plurality of sub-regions.
  • (Appendix 13) The information processing system according to appendix 12, wherein the first local feature quantities and the second local feature quantities are generated by selecting, from the generated feature vector of a plurality of dimensions, dimensions having a larger correlation between adjacent sub-regions.
  • (Appendix 14) The information processing system according to appendix 12 or 13, wherein the plurality of dimensions of the feature vector are arranged in a predetermined order so that dimensions can be selected, starting from the first dimension and in order from the dimension that contributes more to the feature of the feature point, in accordance with the accuracy required for the local feature quantity.
  • (Appendix 15) The second local feature quantity generation means generates, in accordance with the correlation between learning objects, second local feature quantities having a smaller number of dimensions for a learning object having lower correlation with other learning objects.
  • (Appendix 16) The first local feature quantity storage means stores, in accordance with the correlation between learning objects, first local feature quantities having a smaller number of dimensions for a learning object having lower correlation with other learning objects.
  • (Appendix 17) An information processing method in an information processing system including first local feature quantity storage means for storing, in association with a learning object, m first local feature quantities, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions including each of m feature points of an image of the learning object, the method comprising: a second local feature quantity generation step of extracting n feature points from an image in a captured video and generating n second local feature quantities, each consisting of a feature vector of 1 to j dimensions, for the n local regions including each of the n feature points; and a recognition step of selecting the smaller number of dimensions between the dimension number i of the feature vectors of the first local feature quantities and the dimension number j of the feature vectors of the second local feature quantities, and recognizing that the learning object exists in the image in the video when it is determined that a predetermined proportion or more of the m first local feature quantities, consisting of feature vectors up to the selected number of dimensions, correspond to the n second local feature quantities consisting of feature vectors up to the selected number of dimensions.
  • (Appendix 18) A communication terminal comprising: second local feature quantity generation means for extracting n feature points from an image captured by imaging means and generating n second local feature quantities, each consisting of a feature vector of 1 to j dimensions, for the n local regions including each of the n feature points; first transmission means for transmitting the m second local feature quantities to an information processing apparatus that recognizes, based on collation of local feature quantities, a learning object included in the captured image; and first reception means for receiving, from the information processing apparatus, information indicating the learning object included in the captured image.
  • (Appendix 19) A control method for a communication terminal, comprising: a second local feature quantity generation step of extracting n feature points from an image captured by imaging means and generating n second local feature quantities, each consisting of a feature vector of 1 to j dimensions, for the n local regions including each of the n feature points; a first transmission step of transmitting the m second local feature quantities to an information processing apparatus that recognizes, based on collation of local feature quantities, a learning object included in the captured image; and a first reception step of receiving, from the information processing apparatus, information indicating the learning object included in the captured image.
  • (Appendix 20) A control program for a communication terminal, causing a computer to execute: a second local feature quantity generation step of extracting n feature points from an image captured by imaging means and generating n second local feature quantities, each consisting of a feature vector of 1 to j dimensions, for the n local regions including each of the n feature points; a first transmission step of transmitting the m second local feature quantities to an information processing apparatus that recognizes, based on collation of local feature quantities, a learning object included in the captured image; and a first reception step of receiving, from the information processing apparatus, information indicating the learning object included in the captured image.
  • (Appendix 21) An information processing apparatus comprising: first local feature quantity storage means for storing, in association with a learning object, m first local feature quantities, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions including each of m feature points of an image of the learning object; second reception means for receiving, from a communication terminal, n second local feature quantities, each consisting of a feature vector of 1 to j dimensions, generated for n local regions including each of n feature points extracted from an image in a video captured by the communication terminal; and learning object recognition means for selecting the smaller number of dimensions between the dimension number i of the feature vectors of the first local feature quantities and the dimension number j of the feature vectors of the second local feature quantities, and recognizing that the learning object exists in the image in the video when it is determined that a predetermined proportion or more of the m first local feature quantities, consisting of feature vectors up to the selected number of dimensions, correspond to the n second local feature quantities consisting of feature vectors up to the selected number of dimensions.
  • (Appendix 22) A control method for an information processing apparatus including first local feature quantity storage means for storing, in association with a learning object, m first local feature quantities, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions including each of m feature points of an image of the learning object, the method comprising: a second reception step of receiving, from a communication terminal, n second local feature quantities, each consisting of a feature vector of 1 to j dimensions, generated for n local regions including each of n feature points extracted from an image in a video captured by the communication terminal; and a recognition step of selecting the smaller number of dimensions between the dimension number i of the feature vectors of the first local feature quantities and the dimension number j of the feature vectors of the second local feature quantities, and recognizing that the learning object exists in the image in the video when it is determined that a predetermined proportion or more of the m first local feature quantities, consisting of feature vectors up to the selected number of dimensions, correspond to the n second local feature quantities consisting of feature vectors up to the selected number of dimensions.
  • (Appendix 23) A computer-readable storage medium storing a control program for an information processing apparatus including first local feature quantity storage means for storing, in association with a learning object, m first local feature quantities, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions including each of m feature points of an image of the learning object, the control program causing a computer to execute: a second reception step of receiving, from a communication terminal, n second local feature quantities, each consisting of a feature vector of 1 to j dimensions, generated for n local regions including each of n feature points extracted from an image in a video captured by the communication terminal; and a recognition step of selecting the smaller number of dimensions between the dimension number i of the feature vectors of the first local feature quantities and the dimension number j of the feature vectors of the second local feature quantities, and recognizing that the learning object exists in the image in the video when it is determined that a predetermined proportion or more of the m first local feature quantities, consisting of feature vectors up to the selected number of dimensions, correspond to the n second local feature quantities consisting of feature vectors up to the selected number of dimensions.
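Appendices 12 to 14 above describe building the feature vector from gradient-direction histograms over sub-regions of a local region, with the dimensions arranged so that a prefix of the vector can be selected when fewer dimensions suffice. A simplified, non-limiting sketch follows; the 2 sub-regions, 4 orientation bins, and all function names are assumptions (a real descriptor of this family would use more sub-regions and bins).

```python
# Hedged sketch of the descriptor idea: per-sub-region gradient-direction
# histograms concatenated into one vector, usable at a reduced dimension count
# by taking a prefix.
import math

def gradient_histogram(angles, bins=4):
    """Histogram of gradient directions (in radians) over `bins` orientation bins."""
    hist = [0] * bins
    for a in angles:
        hist[int((a % (2 * math.pi)) / (2 * math.pi) * bins) % bins] += 1
    return hist

def descriptor(subregion_angles, bins=4):
    """Concatenate per-sub-region histograms into one multi-dimensional vector."""
    vec = []
    for angles in subregion_angles:
        vec.extend(gradient_histogram(angles, bins))
    return vec

def select_dimensions(vec, k):
    """Use only the first k dimensions, as in the prefix-selectable ordering."""
    return vec[:k]
```

The prefix selection stands in for the described ordering in which dimensions contributing more to the feature come first, so accuracy can be traded for size by truncation.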

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

Provided is a technology for recognizing an object for study in an image in a video in real time. An object for study, and m number of first local features, which comprise feature vectors having from 1 to i dimensions for each of m number of local regions including m number of feature points in an image of the object for study, are associated and stored. Next, n number of feature points are extracted from an image in a captured video, and n number of second local features, which comprise feature vectors having from 1 to j dimensions for each of n number of local regions including n number of feature points, are generated. The number of dimensions (i) of the first local features or the number of dimensions (j) of the second local features, whichever is the smaller number of dimensions, is selected. The object for study is recognized to be present in the image from the video when a prescribed proportion or more of the m number of first local features up to the selected number of dimensions is determined to correspond to the n number of second local features up to the selected number of dimensions.
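The recognition rule summarized above (select the smaller of the dimension counts i and j, truncate both sides' feature vectors to that count, and report the object present when a prescribed proportion of the m stored first local features correspond to the n second local features) can be sketched as follows. The distance measure, the match threshold, and the proportion value are illustrative assumptions only.

```python
# Minimal sketch of the abstract's matching rule. Features are plain tuples;
# "correspondence" is approximated by squared Euclidean distance under a threshold.

def _dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def recognize(first_feats, second_feats, match_dist=0.1, min_ratio=0.5):
    """Return True when at least min_ratio of first_feats find a close match
    among second_feats, after truncating both to the smaller dimension count."""
    if not first_feats:
        return False
    i = len(first_feats[0])                       # dimension count of stored features
    j = len(second_feats[0]) if second_feats else i  # dimension count of frame features
    d = min(i, j)                                 # select the smaller number of dimensions
    firsts = [f[:d] for f in first_feats]
    seconds = [s[:d] for s in second_feats]
    matched = sum(
        1 for f in firsts if any(_dist2(f, s) <= match_dist for s in seconds)
    )
    return matched / len(firsts) >= min_ratio
```

Real implementations of this family of techniques use nearest-neighbor search and geometric verification rather than this brute-force comparison; the sketch shows only the dimension-selection and proportion test.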

Description

Information processing system, information processing method, information processing apparatus and control method and control program thereof, and communication terminal and control method and control program thereof
 The present invention relates to a technique for identifying a learning object in a captured video using local feature amounts.
 In the above technical field, Patent Document 1 describes a technique for obtaining the name of an object (plant, insect, etc.) based on a video from a camera-equipped mobile phone and an inquiry mail. Patent Document 2 describes a technique that improves recognition speed by clustering feature amounts when a query image is recognized using a model dictionary generated in advance from model images.
JP 2003-132062 A (Patent Document 1); JP 2011-221688 A (Patent Document 2)
 However, with the techniques described in the above documents, a learning object in an image in a video could not be recognized in real time.
 An object of the present invention is to provide a technique for solving the above-described problem.
 In order to achieve the above object, a system according to the present invention comprises:
 first local feature quantity storage means for storing, in association with a learning object, m first local feature quantities, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions including each of m feature points of an image of the learning object;
 second local feature quantity generation means for extracting n feature points from an image in a video captured by imaging means and generating n second local feature quantities, each consisting of a feature vector of 1 to j dimensions, for the n local regions including each of the n feature points; and
 learning object recognition means for selecting the smaller number of dimensions between the dimension number i of the feature vectors of the first local feature quantities and the dimension number j of the feature vectors of the second local feature quantities, and recognizing that the learning object exists in the image in the video when it is determined that a predetermined proportion or more of the m first local feature quantities, consisting of feature vectors up to the selected number of dimensions, correspond to the n second local feature quantities consisting of feature vectors up to the selected number of dimensions.
 In order to achieve the above object, a method according to the present invention is an information processing method in an information processing system including first local feature quantity storage means for storing, in association with a learning object, m first local feature quantities, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions including each of m feature points of an image of the learning object, the method comprising:
 a second local feature quantity generation step of extracting n feature points from an image in a captured video and generating n second local feature quantities, each consisting of a feature vector of 1 to j dimensions, for the n local regions including each of the n feature points; and
 a recognition step of selecting the smaller number of dimensions between the dimension number i of the feature vectors of the first local feature quantities and the dimension number j of the feature vectors of the second local feature quantities, and recognizing that the learning object exists in the image in the video when it is determined that a predetermined proportion or more of the m first local feature quantities, consisting of feature vectors up to the selected number of dimensions, correspond to the n second local feature quantities consisting of feature vectors up to the selected number of dimensions.
To achieve the above object, an apparatus according to the present invention comprises:
second local feature generation means for extracting n feature points from an image in a video captured by imaging means, and generating, for n local regions containing each of the n feature points, n second local features each consisting of feature vectors from one dimension to j dimensions;
first transmission means for transmitting the n second local features to an information processing apparatus that recognizes, based on matching of local features, a learning object contained in the captured image; and
first reception means for receiving, from the information processing apparatus, information indicating the learning object contained in the captured image.
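The three means of the terminal-side apparatus above can be sketched as follows; this is a minimal illustration under assumptions, where `extract_points` and `describe_region` are hypothetical stand-ins for the feature point detector and local-region descriptor, and the JSON payload is an assumed serialization (the embodiments encode local features together with feature point coordinates, but do not fix this wire format).

```python
import json

def generate_second_local_features(image, extract_points, describe_region, j_dims):
    """Second local feature generation means: extract n feature points and
    build, for the local region around each, a feature vector truncated to
    at most j_dims dimensions. extract_points and describe_region are
    injected hypothetical stand-ins for the detector/descriptor."""
    points = extract_points(image)                         # n feature points
    feats = [describe_region(image, p)[:j_dims] for p in points]
    return points, feats

def encode_for_transmission(points, feats):
    """First transmission means would send this payload to the information
    processing apparatus; JSON is an assumed format, chosen only so the
    sketch is concrete."""
    return json.dumps({"points": points, "features": feats})
```

The first reception means would then simply decode the apparatus's reply (e.g. the recognized learning object name) from the same transport.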
To achieve the above object, a method according to the present invention comprises:
a second local feature generation step of extracting n feature points from an image in a video captured by imaging means, and generating, for n local regions containing each of the n feature points, n second local features each consisting of feature vectors from one dimension to j dimensions;
a first transmission step of transmitting the n second local features to an information processing apparatus that recognizes, based on matching of local features, a learning object contained in the captured image; and
a first reception step of receiving, from the information processing apparatus, information indicating the learning object contained in the captured image.
To achieve the above object, a program according to the present invention causes a computer to execute:
a second local feature generation step of extracting n feature points from an image in a video captured by imaging means, and generating, for n local regions containing each of the n feature points, n second local features each consisting of feature vectors from one dimension to j dimensions;
a first transmission step of transmitting the n second local features to an information processing apparatus that recognizes, based on matching of local features, a learning object contained in the captured image; and
a first reception step of receiving, from the information processing apparatus, information indicating the learning object contained in the captured image.
To achieve the above object, an apparatus according to the present invention comprises:
first local feature storage means for storing, in association with each other, a learning object and m first local features, each consisting of feature vectors from one dimension to i dimensions, generated for each of m local regions containing each of m feature points of an image of the learning object;
second reception means for receiving, from a communication terminal, n second local features, each consisting of feature vectors from one dimension to j dimensions, generated for n local regions containing each of n feature points extracted from an image in a video captured by the communication terminal;
recognition means for selecting the smaller of the dimension number i of the feature vectors of the first local features and the dimension number j of the feature vectors of the second local features, and recognizing that the learning object is present in the image in the video when it is determined that a predetermined proportion or more of the m first local features, each consisting of feature vectors up to the selected dimension number, correspond to the n second local features, each consisting of feature vectors up to the selected dimension number; and
second transmission means for transmitting information indicating the recognized learning object to the communication terminal.
To achieve the above object, a method according to the present invention is a control method for an information processing apparatus comprising first local feature storage means for storing, in association with each other, a learning object and m first local features, each consisting of feature vectors from one dimension to i dimensions, generated for each of m local regions containing each of m feature points of an image of the learning object, the method comprising:
a second reception step of receiving, from a communication terminal, n second local features, each consisting of feature vectors from one dimension to j dimensions, generated for n local regions containing each of n feature points extracted from an image in a video captured by the communication terminal;
a recognition step of selecting the smaller of the dimension number i of the feature vectors of the first local features and the dimension number j of the feature vectors of the second local features, and recognizing that the learning object is present in the image in the video when it is determined that a predetermined proportion or more of the m first local features, each consisting of feature vectors up to the selected dimension number, correspond to the n second local features, each consisting of feature vectors up to the selected dimension number; and
a second transmission step of transmitting information indicating the recognized learning object to the communication terminal.
To achieve the above object, a program according to the present invention is a control program for an information processing apparatus comprising first local feature storage means for storing, in association with each other, a learning object and m first local features, each consisting of feature vectors from one dimension to i dimensions, generated for each of m local regions containing each of m feature points of an image of the learning object, the program causing a computer to execute:
a second reception step of receiving, from a communication terminal, n second local features, each consisting of feature vectors from one dimension to j dimensions, generated for n local regions containing each of n feature points extracted from an image in a video captured by the communication terminal;
a recognition step of selecting the smaller of the dimension number i of the feature vectors of the first local features and the dimension number j of the feature vectors of the second local features, and recognizing that the learning object is present in the image in the video when it is determined that a predetermined proportion or more of the m first local features, each consisting of feature vectors up to the selected dimension number, correspond to the n second local features, each consisting of feature vectors up to the selected dimension number; and
a second transmission step of transmitting information indicating the recognized learning object to the communication terminal.
According to the present invention, a learning object in an image in a video can be recognized in real time.
A block diagram showing the configuration of the information processing system according to the first embodiment of the present invention.
A block diagram showing the configuration of the information processing system according to the second embodiment of the present invention.
A diagram showing a display screen example of the communication terminal in the information processing system according to the second embodiment of the present invention.
A sequence diagram showing the operation procedure for related information notification in the information processing system according to the second embodiment of the present invention.
A sequence diagram showing the operation procedure for link information notification in the information processing system according to the second embodiment of the present invention.
A block diagram showing the functional configuration of the communication terminal according to the second embodiment of the present invention.
A block diagram showing the functional configuration of the learning object recognition server according to the second embodiment of the present invention.
A diagram showing the configuration of the local feature DB according to the second embodiment of the present invention.
A diagram showing the configuration of the related information DB according to the second embodiment of the present invention.
A diagram showing the configuration of the link information DB according to the second embodiment of the present invention.
A block diagram showing the functional configuration of the local feature generation unit according to the second embodiment of the present invention.
A diagram explaining the procedure of local feature generation according to the second embodiment of the present invention.
A diagram explaining the procedure of local feature generation according to the second embodiment of the present invention.
A diagram showing the selection order of sub-regions in the local feature generation unit according to the second embodiment of the present invention.
A diagram showing the selection order of feature vectors in the local feature generation unit according to the second embodiment of the present invention.
A diagram showing the hierarchization of feature vectors in the local feature generation unit according to the second embodiment of the present invention.
A diagram showing the configuration of the encoding unit according to the second embodiment of the present invention.
A diagram showing the processing of the learning object recognition unit according to the second embodiment of the present invention.
A block diagram showing the hardware configuration of the communication terminal according to the second embodiment of the present invention.
A diagram showing the local feature generation table in the communication terminal according to the second embodiment of the present invention.
A flowchart showing the processing procedure of the communication terminal according to the second embodiment of the present invention.
A flowchart showing the local feature generation processing according to the second embodiment of the present invention.
A flowchart showing the encoding processing according to the second embodiment of the present invention.
A flowchart showing the difference value encoding processing according to the second embodiment of the present invention.
A block diagram showing the hardware configuration of the learning object recognition server according to the second embodiment of the present invention.
A flowchart showing the processing procedure of the learning object recognition server according to the second embodiment of the present invention.
A flowchart showing the local feature DB generation processing according to the second embodiment of the present invention.
A flowchart showing the learning object recognition processing according to the second embodiment of the present invention.
A flowchart showing the matching processing according to the second embodiment of the present invention.
A sequence diagram showing the operation procedure of the information processing system according to the third embodiment of the present invention.
A sequence diagram showing the operation procedure for book recognition in the information processing system according to the fourth embodiment of the present invention.
A sequence diagram showing the operation procedure for page recognition in the information processing system according to the fourth embodiment of the present invention.
A sequence diagram showing the operation procedure for kanji recognition in the information processing system according to the fourth embodiment of the present invention.
A diagram showing the configuration of the content introduction DB according to the fourth embodiment of the present invention.
A diagram showing the configuration of the page information DB according to the fourth embodiment of the present invention.
A diagram showing the configuration of the dictionary DB according to the fourth embodiment of the present invention.
A diagram showing the configuration of the dictionary DB for translation according to the fourth embodiment of the present invention.
A sequence diagram showing the operation procedure for music recognition in the information processing system according to the fifth embodiment of the present invention.
A sequence diagram showing the operation procedure for song recognition in the information processing system according to the fifth embodiment of the present invention.
A sequence diagram showing the operation procedure for sound recognition in the information processing system according to the fifth embodiment of the present invention.
A diagram showing the configuration of the content introduction DB according to the fifth embodiment of the present invention.
A diagram showing the configuration of the performance information DB according to the fifth embodiment of the present invention.
A diagram showing the configuration of the sound information DB according to the fifth embodiment of the present invention.
A sequence diagram showing the operation procedure for exhibit recognition in the information processing system according to the sixth embodiment of the present invention.
A diagram showing the configuration of the content introduction DB according to the sixth embodiment of the present invention.
A sequence diagram showing the operation procedure for mathematical formula recognition in the information processing system according to the seventh embodiment of the present invention.
A diagram showing the configuration of the formula DB according to the seventh embodiment of the present invention.
A diagram showing the configuration of the calculation parameter table according to the seventh embodiment of the present invention.
A diagram showing a display screen example of the communication terminal in the information processing system according to the eighth embodiment of the present invention.
A diagram showing a display screen example of the communication terminal in the information processing system according to the eighth embodiment of the present invention.
A block diagram showing the functional configuration of the communication terminal according to the eighth embodiment of the present invention.
A flowchart showing the processing procedure of the communication terminal according to the eighth embodiment of the present invention.
A block diagram showing the functional configuration of the communication terminal according to the ninth embodiment of the present invention.
Hereinafter, embodiments of the present invention will be described in detail by way of example with reference to the drawings. However, the constituent elements described in the following embodiments are merely examples, and are not intended to limit the technical scope of the present invention to them alone. As used in this specification, the term "learning object" broadly encompasses any teaching material or work used in the field of education.
[First Embodiment]
An information processing system 100 as a first embodiment of the present invention will be described with reference to Fig. 1. The information processing system 100 recognizes a learning object in real time.
As shown in Fig. 1, the information processing system 100 includes a first local feature storage unit 110, an imaging unit 120, a second local feature generation unit 130, and a learning object recognition unit 140. The first local feature storage unit 110 stores, in association with each other, a learning object 111 and m first local features 112, each consisting of feature vectors from one dimension to i dimensions, generated for each of m local regions containing each of m feature points of an image of the learning object 111. The second local feature generation unit 130 extracts n feature points 131 from an image 101 in the video captured by the imaging unit 120, and generates, for n local regions 132 containing each of the n feature points 131, n second local features 133 each consisting of feature vectors from one dimension to j dimensions. The learning object recognition unit 140 selects the smaller of the dimension number i of the feature vectors of the first local features 112 and the dimension number j of the feature vectors of the second local features 133. When it determines that a predetermined proportion or more of the m first local features 112, each consisting of feature vectors up to the selected dimension number, correspond to the n second local features 133, each consisting of feature vectors up to the selected dimension number, it recognizes that the learning object 111 is present in the image 101 in the video.
According to this embodiment, a learning object in an image in a video can be recognized in real time.
[Second Embodiment]
Next, an information processing system according to a second embodiment of the present invention will be described. In this embodiment, a learning object in a video is recognized by matching the local features generated from the video captured by a communication terminal against the local features stored in the local feature DB of a learning object recognition server. The recognized learning object is then annotated with its name, related information, and/or link information.
According to this embodiment, the name, related information, and/or link information can be presented in real time in association with a learning object in an image in the video.
<<Configuration of the Information Processing System>>
Fig. 2 is a block diagram showing the configuration of the information processing system 200 according to this embodiment.
The information processing system 200 in Fig. 2 includes, connected via a network 240, a communication terminal 210 having an imaging function, a learning object recognition server 220 that recognizes a learning object from video captured by the communication terminal 210, and a related information providing server 230 that provides related information to the communication terminal 210.
The communication terminal 210 displays the captured video on its display unit. As on the display screen 211 in Fig. 2, the names of the learning objects recognized by the learning object recognition server 220, based on the local features generated by the local feature generation unit from the captured video, are superimposed on the display. As illustrated, the communication terminal 210 is representative of a plurality of communication terminals, including mobile phones with imaging functions and other communication terminals.
The learning object recognition server 220 has a local feature DB 221 that stores learning objects in association with their local features, a related information DB 222 that stores related information for each learning object, and a link information DB 223 that stores link information for each learning object. From the local features of the video received from the communication terminal 210, the learning object recognition server 220 returns the name of the learning object recognized by matching against the local features in the local feature DB 221. It also retrieves related information, such as an introduction, corresponding to the recognized learning object from the related information DB 222 and returns it to the communication terminal 210. It further retrieves, from the link information DB 223, link information to the related information providing server 230 corresponding to the recognized learning object and returns it to the communication terminal 210. The name of the learning object, the related information corresponding to it, and the link information for it may each be provided separately, or several of them may be provided at the same time.
The related information providing server 230 has a related information DB 231 that stores related information corresponding to learning objects. It is accessed based on the link information provided for the learning object recognized by the learning object recognition server 220. It then retrieves the related information corresponding to the recognized learning object from the related information DB 231 and returns it to the communication terminal 210 that transmitted the local features of the video containing the learning object. Accordingly, although Fig. 2 shows a single related information providing server 230, as many related information providing servers 230 as there are link destinations are connected. In that case, either the learning object recognition server 220 selects an appropriate link destination, or a plurality of link destinations are displayed on the communication terminal 210 for selection by the user.
Fig. 2 illustrates an example in which the name is superimposed on the learning object in the captured video. The display of related information corresponding to the learning object and of link information for the learning object is described with reference to Fig. 3.
(Display Screen Examples of the Communication Terminal)
Fig. 3 shows display screen examples of the communication terminal 210 in the information processing system 200 according to this embodiment.
The upper part of Fig. 3 is a display screen example showing related information corresponding to a learning object. The display screen 310 in Fig. 3 contains a captured video 311 and operation buttons 312. The learning object is recognized by matching the local features generated from the video in the upper-left figure against the local feature DB 221 of the learning object recognition server 220. As a result, the display screen 320 in the upper-right figure shows a video 321 in which the learning object names and related information 322 to 325 are superimposed on the captured video. At the same time, the related information may be output as audio through the speaker 340.
The lower part of Fig. 3 is a display screen example showing link information corresponding to a learning object. The learning object is recognized by matching the local features generated from the video in the lower-left figure against the local feature DB 221 of the learning object recognition server 220. As a result, the display screen 330 in the lower-right figure shows a video 331 in which the learning object names and link information 332 to 335 are superimposed on the captured video. Although not shown, clicking the displayed link information accesses the linked related information providing server 230, and the related information retrieved from the related information DB 231 is displayed on the communication terminal 210 or output as audio from the communication terminal 210.
<<Operation Procedures of the Information Processing System>>
Operation procedures of the information processing system 200 in this embodiment are described below with reference to Figs. 4 and 5. Although Figs. 4 and 5 do not show a display example of the recognized learning object name alone, the learning object name may simply be transmitted to the communication terminal 210 after the learning object is recognized. Displaying the learning object name, the related information, and the link information together can be realized by combining Figs. 4 and 5.
(Operation Procedure for Related Information Notification)
Fig. 4 is a sequence diagram showing the operation procedure for related information notification in the information processing system 200 according to this embodiment.
First, if necessary, in step S400, an application and/or data is downloaded from the learning object recognition server 220 to the communication terminal 210. Then, in step S401, the application is started and initialized to perform the processing of this embodiment.
In step S403, the communication terminal captures video with its imaging unit. In step S405, local features are generated from the video. Subsequently, in step S407, the local features are encoded together with the feature point coordinates. In step S409, the encoded local features are transmitted from the communication terminal to the learning object recognition server 220.
In step S411, the learning object recognition server 220 recognizes the learning object in the video by referring to the local feature DB 221, which stores local features generated from images of learning objects. Then, in step S413, the related information is acquired by referring to the related information DB 222 for the recognized learning object. In step S415, the learning object name and the related information are transmitted from the learning object recognition server 220 to the communication terminal 210.
In step S417, the communication terminal 210 notifies the user of the received learning object name and related information (see the upper part of FIG. 3). It is desirable that the learning object name be displayed and the related information be displayed or output as audio.
(Operation procedure of link information notification)
FIG. 5 is a sequence diagram showing the operation procedure of link information notification in the information processing system 200 according to the present embodiment. Operation steps identical to those in FIG. 4 are given the same step numbers, and their description is omitted.
In steps S400 and S401, although the application and data may differ, downloading, activation, and initialization are performed as in FIG. 4.
Having recognized the learning object in the scene from the local features of the video received from the communication terminal 210 in step S411, the learning object recognition server 220 refers to the link information DB 223 in step S513 and acquires the link information corresponding to the recognized learning object. In step S515, the learning object name and the link information are transmitted from the learning object recognition server 220 to the communication terminal 210.
In step S517, the communication terminal 210 displays the received learning object name and link information superimposed on the video (see the lower part of FIG. 3). Then, in step S519, it waits for the user to select the link information. If the user selects a link destination, in step S521 the communication terminal accesses the linked related information providing server 230 with the learning object ID.
In step S523, the related information providing server 230 acquires the related information (including document data and audio data) from the related information DB 231 using the received learning object ID. Then, in step S525, it returns the related information to the accessing communication terminal 210.
In step S527, the communication terminal 210 that received the reply displays the related information or outputs it as audio.
<< Functional configuration of communication terminal >>
FIG. 6 is a block diagram showing the functional configuration of the communication terminal 210 according to the present embodiment.
In FIG. 6, the imaging unit 601 inputs a video as a query image. The local feature generation unit 602 generates local features from the video provided by the imaging unit 601. The local feature transmission unit 603 encodes the generated local features together with the feature point coordinates using the encoding unit 603a, and transmits them to the learning object recognition server 220 via the communication control unit 604.
The learning object recognition result receiving unit 605 receives the learning object recognition result from the learning object recognition server 220 via the communication control unit 604. The display screen generation unit 606 then generates a display screen for the received recognition result and notifies the user.
The related information receiving unit 607 receives related information via the communication control unit 604. The display screen generation unit 606 and the audio generation unit 608 then generate a display screen and audio data for the received related information and notify the user. The related information received by the related information receiving unit 607 includes related information from the learning object recognition server 220 or from the related information providing server 230.
The link information receiving unit 609 receives link information from the related information providing server 230 via the communication control unit 604. The display screen generation unit 606 then generates a display screen for the received link information and notifies the user. The link destination access unit 610 accesses the linked related information providing server 230 when the link information is clicked via an operation unit (not shown).
Instead of providing the learning object recognition result receiving unit 605, the related information receiving unit 607, and the link information receiving unit 609 separately, they may be combined into a single information receiving unit that receives the information arriving via the communication control unit 604.
<< Functional configuration of learning object recognition server >>
FIG. 7 is a block diagram showing the functional configuration of the learning object recognition server 220 according to the present embodiment.
In FIG. 7, the local feature receiving unit 702 decodes, with the decoding unit 702a, the local features received from the communication terminal 210 via the communication control unit 701. The learning object recognition unit 703 recognizes the learning object by collating the received local features against the local feature DB 221, which stores the local features corresponding to each learning object. The learning object recognition result transmission unit 704 transmits the recognition result (the learning object name) to the communication terminal 210.
The related information acquisition unit 705 refers to the related information DB 222 and acquires the related information corresponding to the recognized learning object. The related information transmission unit 706 transmits the acquired related information to the communication terminal 210. When the learning object recognition server 220 transmits related information, it is desirable to transmit the learning object recognition result and the related information as a single transmission, as in FIG. 4, because this reduces communication traffic.
The link information acquisition unit 707 refers to the link information DB 223 and acquires the link information corresponding to the recognized learning object. The link information transmission unit 708 transmits the acquired link information to the communication terminal 210. When transmitting link information, it is desirable for the learning object recognition server 220 to transmit the learning object recognition result and the link information as a single transmission, as in FIG. 5, because this reduces communication traffic.
Naturally, when the learning object recognition server 220 transmits the learning object recognition result, the related information, and the link information, it is desirable to acquire all of the information first and then transmit it as a single transmission, because this reduces communication traffic.
The related information providing server 230 may comprise any of various linkable providers, and a description of its configuration is omitted.
(Local feature DB)
FIG. 8 is a diagram showing the configuration of the local feature DB 221 according to the present embodiment. Note that the configuration is not limited to this.
The local feature DB 221 stores a first local feature 803, a second local feature 804, ..., and an m-th local feature 805 in association with a learning object ID 801 and a name 802. Each local feature is stored as a feature vector of 1- to 150-dimensional elements, hierarchized in units of 25 dimensions corresponding to the 5 × 5 sub-regions (see FIG. 11F).
Here, m is a positive integer and may differ for each learning object ID. In the present embodiment, the feature point coordinates used in the collation process are stored together with each local feature.
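As an illustration only, the association of FIG. 8 (a learning object ID and name mapped to m local features, each stored with its feature point coordinates) could be held in memory as follows; the ID, name, and values are placeholders, not contents of the actual DB:

```python
# Hypothetical in-memory shape of the local feature DB of FIG. 8:
# each learning-object ID maps to its name and m entries, where each
# entry pairs the feature point coordinates with a 150-dim vector.
local_feature_db = {
    "obj-001": {
        "name": "example object",       # placeholder name
        "features": [                   # m entries; m may differ per ID
            ((12.0, 34.0), [0] * 150),  # (feature point coords, vector)
            ((56.0, 78.0), [0] * 150),
        ],
    },
}
```

Because the vectors are hierarchized in 25-dimension units, a lookup could truncate each stored vector to the first 25, 50, ... dimensions without restructuring the DB.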
(Related information DB)
FIG. 9 is a diagram showing the configuration of the related information DB 222 according to the present embodiment. Note that the configuration is not limited to this.
The related information DB 222 stores related display data 903 and related audio data 904, which constitute the related information, in association with a learning object ID 901 and a learning object name 902. The related information DB 222 may be provided integrally with the local feature DB 221.
(Link information DB)
FIG. 10 is a diagram showing the configuration of the link information DB 223 according to the present embodiment. Note that the configuration is not limited to this.
The link information DB 223 stores link information, for example a URL (Uniform Resource Locator) 1003 and display data 1004 for the display screen, in association with a learning object ID 1001 and a learning object name 1002. The link information DB 223 may be provided integrally with the local feature DB 221 and the related information DB 222.
The related information DB 231 of the related information providing server 230 is similar to the related information DB 222 of the learning object recognition server 220, and its description is omitted to avoid duplication.
<< Local feature generation unit >>
FIG. 11A is a block diagram showing the configuration of the local feature generation unit 602 according to the present embodiment.
The local feature generation unit 602 comprises a feature point detection unit 1111, a local region acquisition unit 1112, a sub-region division unit 1113, a sub-region feature vector generation unit 1114, and a dimension selection unit 1115.
The feature point detection unit 1111 detects a large number of characteristic points (feature points) from the image data and outputs the coordinate position, scale (size), and angle of each feature point.
The local region acquisition unit 1112 acquires, from the coordinate value, scale, and angle of each detected feature point, the local region from which the feature is extracted.
The sub-region division unit 1113 divides the local region into sub-regions. For example, it can divide the local region into 16 blocks (4 × 4 blocks) or into 25 blocks (5 × 5 blocks). The number of divisions is not limited. In the present embodiment, the case where the local region is divided into 25 blocks (5 × 5 blocks) is described below as representative.
The sub-region feature vector generation unit 1114 generates a feature vector for each sub-region of the local region. As the feature vector of a sub-region, for example, a gradient direction histogram can be used.
The dimension selection unit 1115 selects (for example, thins out) the dimensions to be output as the local feature, based on the positional relationship of the sub-regions, so that the correlation between the feature vectors of nearby sub-regions is low. The dimension selection unit 1115 can not only select dimensions but also determine a selection priority; for example, it can select dimensions with priorities such that the same gradient direction is not selected in adjacent sub-regions. The dimension selection unit 1115 then outputs a feature vector composed of the selected dimensions as the local feature. The dimension selection unit 1115 can output the local feature with its dimensions rearranged according to the priority order.
<< Processing of local feature generation unit >>
FIGS. 11B to 11F are diagrams showing the processing of the local feature generation unit 602 according to the present embodiment.
First, FIG. 11B shows the series of processes of feature point detection, local region acquisition, sub-region division, and feature vector generation in the local feature generation unit 602. For this series of processes, see US Pat. No. 6,711,293 and David G. Lowe, "Distinctive image features from scale-invariant key points", International Journal of Computer Vision, 60(2), 2004, p. 91-110.
(Feature point detection unit)
The image 1121 in FIG. 11B shows the state in which feature points have been detected from an image in the video by the feature point detection unit 1111 of FIG. 11A. Hereinafter, the generation of a local feature is described using one feature point datum 1121a as a representative. The starting point of the arrow of the feature point datum 1121a indicates the coordinate position of the feature point, the length of the arrow indicates the scale (size), and the direction of the arrow indicates the angle. For the scale (size) and direction, luminance, saturation, hue, and the like can be selected according to the target video. The example of FIG. 11B uses six directions at 60-degree intervals, but the present invention is not limited to this.
(Local region acquisition unit)
The local region acquisition unit 1112 of FIG. 11A generates, for example, a Gaussian window 1122a centered on the starting point of the feature point datum 1121a, and generates a local region 1122 that substantially contains the Gaussian window 1122a. In the example of FIG. 11B, the local region acquisition unit 1112 generates a square local region 1122, but the local region may be circular or have another shape. This local region is acquired for each feature point. A circular local region has the effect of improving robustness with respect to the imaging direction.
(Sub-region division unit)
Next, the figure shows the state in which the sub-region division unit 1113 has divided the scale and angle of each pixel in the local region 1122 of the feature point datum 1121a into sub-regions 1123. FIG. 11B shows an example of division into 5 × 5 = 25 sub-regions of 4 × 4 = 16 pixels each. However, the sub-regions may also number 4 × 4 = 16, or have other shapes and numbers of divisions.
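The 5 × 5 division above can be sketched as follows. The 20 × 20-pixel local region is an assumed size, chosen so that each of the 25 blocks holds 4 × 4 = 16 pixels as in FIG. 11B; the sub-regions are returned in raster-scan order:

```python
def split_into_subregions(patch, blocks=5):
    # patch: square 2-D list of per-pixel values (e.g. gradient angles).
    # Returns blocks*blocks sub-regions in raster-scan order, each as a
    # flat list of its pixels.
    n = len(patch)
    assert n % blocks == 0, "patch side must be divisible by block count"
    size = n // blocks          # e.g. 20 // 5 = 4 -> 4x4 pixels per block
    subs = []
    for br in range(blocks):
        for bc in range(blocks):
            subs.append([patch[br * size + r][bc * size + c]
                         for r in range(size) for c in range(size)])
    return subs
```

The same helper with `blocks=4` would produce the alternative 4 × 4 = 16-block division mentioned above.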
(Sub-region feature vector generation unit)
The sub-region feature vector generation unit 1114 generates a histogram of the per-pixel gradients in a sub-region, quantized into angle units of six directions, and uses it as the feature vector 1124 of the sub-region. The directions are normalized with respect to the angle output by the feature point detection unit 1111. The sub-region feature vector generation unit 1114 then totals the quantized frequencies of the six directions for each sub-region to generate the histogram. In this case, the sub-region feature vector generation unit 1114 outputs a feature vector consisting of a histogram of 25 sub-region blocks × 6 directions = 150 dimensions generated for each feature point. The gradient direction need not be quantized into six directions; it may be quantized into any number of directions, such as 4, 8, or 10. When the gradient direction is quantized into D directions, with the pre-quantization gradient direction denoted G (0 to 2π radians), the quantized value Qq (q = 0, ..., D−1) of the gradient direction can be obtained by, for example, Expression (1) or Expression (2), although it is not limited to these.

 Qq = floor(G × D / 2π)       ... (1)
 Qq = round(G × D / 2π) mod D ... (2)

Here, floor() is a function that truncates the fractional part, round() is a function that rounds to the nearest integer, and mod is the remainder operation. When generating the gradient histogram, the sub-region feature vector generation unit 1114 may total the gradient magnitudes instead of totaling simple frequencies. When totaling the gradient histogram, the sub-region feature vector generation unit 1114 may add weight values not only to the sub-region to which a pixel belongs, but also to nearby sub-regions (such as adjacent blocks) according to the distance between sub-regions. The sub-region feature vector generation unit 1114 may also add weight values to the gradient directions before and after the quantized gradient direction. The feature vector of a sub-region is not limited to a gradient direction histogram; it may be anything that has a plurality of dimensions (elements), such as color information. In the present embodiment, a gradient direction histogram is used as the feature vector of the sub-regions.
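Expressions (1) and (2) can be written directly in code. The sketch below assumes D = 6 directions by default and G given in radians in [0, 2π):

```python
import math

def quantize_floor(G, D=6):
    # Expression (1): Qq = floor(G * D / 2*pi)
    return math.floor(G * D / (2 * math.pi))

def quantize_round(G, D=6):
    # Expression (2): Qq = round(G * D / 2*pi) mod D
    # The "mod D" wraps angles near 2*pi back to bin 0.
    return round(G * D / (2 * math.pi)) % D
```

The two expressions differ in how bin boundaries fall: (1) bins [0, 2π/D), [2π/D, 4π/D), ..., while (2) centers each bin on a multiple of 2π/D.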
(Dimension selection unit)
Next, the processing of the dimension selection unit 1115 in the local feature generation unit 602 is described with reference to FIGS. 11C to 11F.
The dimension selection unit 1115 selects (thins out) the dimensions (elements) to be output as the local feature, based on the positional relationship of the sub-regions, so that the correlation between the feature vectors of nearby sub-regions is low. More specifically, the dimension selection unit 1115 selects dimensions such that, for example, at least one gradient direction differs between adjacent sub-regions. In the present embodiment, the dimension selection unit 1115 mainly uses adjacent sub-regions as the nearby sub-regions, but the nearby sub-regions are not limited to adjacent ones; for example, sub-regions within a predetermined distance of the target sub-region may also be treated as nearby sub-regions.
FIG. 11C shows an example of selecting dimensions from the feature vector 1131 of a 150-dimensional gradient histogram generated by dividing the local region into 5 × 5 block sub-regions and quantizing the gradient directions into six directions 1131a. In the example of FIG. 11C, dimensions are selected from a feature vector of 150 dimensions (5 × 5 = 25 sub-region blocks × 6 directions).
(Dimension selection in the local region)
FIG. 11C is a diagram showing the process of selecting the number of dimensions of the feature vector in the local feature generation unit 602.
As shown in FIG. 11C, the dimension selection unit 1115 selects the feature vector 1132 of a 75-dimensional gradient histogram, half of the 150-dimensional gradient histogram feature vector 1131. In this case, the dimensions can be selected such that the same gradient direction is not selected in horizontally or vertically adjacent sub-region blocks.
In this example, with the quantized gradient direction in the gradient direction histogram denoted q (q = 0, 1, 2, 3, 4, 5), blocks in which the elements q = 0, 2, 4 are selected alternate with sub-region blocks in which the elements q = 1, 3, 5 are selected. In the example of FIG. 11C, the gradient directions selected in adjacent sub-region blocks together cover all six directions.
The dimension selection unit 1115 also selects the feature vector 1133 of a 50-dimensional gradient histogram from the 75-dimensional gradient histogram feature vector 1132. In this case, the dimensions can be selected such that only one direction is the same (the remaining direction differs) between sub-region blocks positioned diagonally at 45 degrees.
When selecting the feature vector 1134 of a 25-dimensional gradient histogram from the 50-dimensional gradient histogram feature vector 1133, the dimension selection unit 1115 can select the dimensions such that the selected gradient directions do not match between sub-region blocks positioned diagonally at 45 degrees. In the example shown in FIG. 11C, the dimension selection unit 1115 selects one gradient direction from each sub-region for dimensions 1 to 25, two gradient directions for dimensions 26 to 50, and three gradient directions for dimensions 51 to 75.
In this way, it is desirable that the gradient directions do not overlap between adjacent sub-region blocks and that all gradient directions are selected evenly. At the same time, as in the example shown in FIG. 11C, it is desirable that the dimensions be selected evenly from the whole local region. The dimension selection method shown in FIG. 11C is one example, and the selection method is not limited to it.
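As one concrete realization of the 150 → 75 selection described above, a checkerboard over the 5 × 5 grid keeps q ∈ {0, 2, 4} in even-parity blocks and q ∈ {1, 3, 5} in odd-parity blocks, so that adjacent blocks never share a selected direction. The parity rule is an assumption illustrating the alternation; FIG. 11C itself may use a different assignment. Element numbers follow the 6 × p + q scheme, with p the raster-scan block number:

```python
def select_75_dims():
    # Select 75 of 150 element numbers (6 * p + q) so that horizontally
    # and vertically adjacent blocks select disjoint direction sets.
    selected = []
    for p in range(25):                  # raster-scan block number
        row, col = divmod(p, 5)
        dirs = (0, 2, 4) if (row + col) % 2 == 0 else (1, 3, 5)
        selected.extend(6 * p + q for q in dirs)
    return selected
```

Since adjacent blocks differ in parity, their three-direction sets are disjoint and together cover all six directions, matching the property stated for the 75-dimensional selection.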
(Priority order in the local region)
FIG. 11D is a diagram showing an example of the selection order of feature vectors from the sub-regions in the local feature generation unit 602.
The dimension selection unit 1115 can not only select dimensions but also determine a selection priority so that dimensions are selected in order of their contribution to the feature of the feature point. That is, the dimension selection unit 1115 can, for example, select dimensions with priorities such that the same gradient direction is not selected in adjacent sub-region blocks. The dimension selection unit 1115 then outputs a feature vector composed of the selected dimensions as the local feature. The dimension selection unit 1115 can output the local feature with its dimensions rearranged according to the priority order.
That is, for dimensions 1 to 25, 26 to 50, and 51 to 75, the dimension selection unit 1115 may add dimensions in the order of the sub-region blocks shown, for example, in the matrix 1141 of FIG. 11D. When using the priority order shown in the matrix 1141 of FIG. 11D, the dimension selection unit 1115 can select gradient directions giving higher priority to the sub-region blocks close to the center.
The matrix 1151 of FIG. 11E shows an example of the element numbers of the 150-dimensional feature vector according to the selection order of FIG. 11D. In this example, with the 5 × 5 = 25 blocks numbered p (p = 0, 1, ..., 24) in raster scan order and the quantized gradient direction denoted q (q = 0, 1, 2, 3, 4, 5), the element number of the feature vector is 6 × p + q.
The matrix 1161 of FIG. 11F shows that the 150-dimensional order according to the selection order of FIG. 11E is hierarchized in units of 25 dimensions. That is, the matrix 1161 of FIG. 11F shows a configuration example of the local feature obtained by selecting the elements shown in FIG. 11E according to the priority order shown in the matrix 1141 of FIG. 11D. The dimension selection unit 1115 can output the dimension elements in the order shown in FIG. 11F. Specifically, when outputting a 150-dimensional local feature, the dimension selection unit 1115 can output all 150 elements in the order shown in FIG. 11F. When outputting, for example, a 25-dimensional local feature, it can output the elements 1171 of the first row shown in FIG. 11F (the 76th, 45th, 83rd, ..., 120th) in the order shown in FIG. 11F (left to right). When outputting, for example, a 50-dimensional local feature, it can output, in addition to the first row, the elements 1172 of the second row shown in FIG. 11F in the order shown in FIG. 11F (left to right).
In the example shown in FIG. 11F, the local feature has a hierarchical structure. That is, for example, in a 25-dimensional local feature and a 150-dimensional local feature, the arrangement of the elements 1171 to 1176 in the first 25 dimensions is the same. By selecting dimensions hierarchically (progressively) in this way, the dimension selection unit 1115 can extract and output a local feature of any number of dimensions, that is, of any size, according to the application, the communication capacity, the terminal specifications, and so on. Furthermore, because the dimension selection unit 1115 selects dimensions hierarchically and outputs them rearranged according to the priority order, images can be collated using local features with different numbers of dimensions. For example, when images are collated using a 75-dimensional local feature and a 50-dimensional local feature, the distance between the local features can be calculated using only the first 50 dimensions.
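This prefix property lets features of different lengths be compared by truncating both to the shared leading dimensions. A minimal sketch follows; the use of Euclidean distance is an assumption for illustration, since the distance measure is not specified here:

```python
def prefix_distance(fa, fb):
    # Compare two local features of possibly different lengths by using
    # only the shared leading dimensions. This is valid only because the
    # elements are emitted in a fixed priority order (cf. FIG. 11F), so
    # a shorter feature is a prefix of a longer one.
    n = min(len(fa), len(fb))
    return sum((a - b) ** 2 for a, b in zip(fa[:n], fb[:n])) ** 0.5
```

A terminal that sends a 50-dimensional feature can thus still be matched against a server DB holding 150-dimensional features, at some cost in discrimination.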
 なお、図11Dのマトリクス1141から図11Fに示す優先順位は一例であり、次元を選定する際の順序はこれに限られない。例えば、ブロックの順番に関しては、図11Dのマトリクス1141の例の他に、図11Dのマトリクス1142や図11Dのマトリクス1143に示すような順番でもよい。また、例えば、全てのサブ領域からまんべんなく次元が選定されるように優先順位が定められることとしてもよい。また、局所領域の中央付近が重要として、中央付近のサブ領域の選定頻度が高くなるように優先順位が定められることとしてもよい。また、次元の選定順序を示す情報は、例えば、プログラムにおいて規定されていてもよいし、プログラムの実行時に参照されるテーブル等(選定順序記憶部)に記憶されていてもよい。 Note that the priority orders shown in the matrix 1141 of FIG. 11D through FIG. 11F are merely examples, and the order in which dimensions are selected is not limited to these. For example, as to the order of blocks, the orders shown in the matrix 1142 or the matrix 1143 of FIG. 11D may be used in addition to the example of the matrix 1141 of FIG. 11D. Further, for example, the priority order may be determined so that dimensions are selected evenly from all the sub-regions. Alternatively, regarding the vicinity of the center of the local region as important, the priority order may be determined so that sub-regions near the center are selected more frequently. The information indicating the dimension selection order may, for example, be defined in a program, or may be stored in a table or the like (a selection order storage unit) referred to when the program is executed.
 また、次元選定部1115は、サブ領域ブロックを1つ飛びに選択して、次元の選定を行ってもよい。すなわち、あるサブ領域では6次元が選定され、当該サブ領域に近接する他のサブ領域では0次元が選定される。このような場合においても、近接するサブ領域間の相関が低くなるようにサブ領域ごとに次元が選定されていると言うことができる。 The dimension selection unit 1115 may also select dimensions by taking every other sub-region block. That is, six dimensions are selected in a certain sub-region, and zero dimensions are selected in the other sub-regions adjacent to that sub-region. Even in such a case, it can be said that dimensions are selected for each sub-region so that the correlation between neighboring sub-regions is low.
 また、局所領域やサブ領域の形状は、正方形に限られず、任意の形状とすることができる。例えば、局所領域取得部1112が、円状の局所領域を取得することとしてもよい。この場合、サブ領域分割部1113は、円状の局所領域を例えば複数の局所領域を有する同心円に9分割や17分割のサブ領域に分割することができる。この場合においても、次元選定部1115は、各サブ領域において、次元を選定することができる。 The shapes of the local region and the sub-regions are not limited to squares and can be arbitrary. For example, the local region acquisition unit 1112 may acquire a circular local region. In this case, the sub-region dividing unit 1113 can divide the circular local region into, for example, nine or seventeen concentric sub-regions. Even in this case, the dimension selection unit 1115 can select dimensions in each sub-region.
 以上、図11B~図11Fに示したように、本実施形態の局所特徴量生成部602によれば、局所特徴量の情報量を維持しながら生成された特徴ベクトルの次元が階層的に選定される。この処理により、認識精度を維持しながらリアルタイムでの学習対象物認識と認識結果の表示が可能となる。なお、局所特徴量生成部602の構成および処理は本例に限定されない。認識精度を維持しながらリアルタイムでの学習対象物認識と認識結果の表示が可能となる他の処理が当然に適用できる。 As described above and shown in FIGS. 11B to 11F, according to the local feature generation unit 602 of the present embodiment, the dimensions of the generated feature vector are selected hierarchically while the information content of the local feature is maintained. This processing enables real-time learning object recognition and display of the recognition result while maintaining recognition accuracy. Note that the configuration and processing of the local feature generation unit 602 are not limited to this example; other processing that enables real-time learning object recognition and display of the recognition result while maintaining recognition accuracy can naturally be applied.
 (符号化部)
 図11Gは、本実施形態に係る符号化部603aを示すブロック図である。なお、符号化部は本例に限定されず、他の符号化処理も適用可能である。
(Encoding Unit)
FIG. 11G is a block diagram showing the encoding unit 603a according to the present embodiment. Note that the encoding unit is not limited to this example, and other encoding processes can be applied.
 符号化部603aは、局所特徴量生成部602の特徴点検出部1111から特徴点の座標を入力して、座標値を走査する座標値走査部1181を有する。座標値走査部1181は、画像をある特定の走査方法にしたがって走査し、特徴点の2次元座標値(X座標値とY座標値)を1次元のインデックス値に変換する。このインデックス値は、走査に従った原点からの走査距離である。なお、走査方向については、制限はない。 The encoding unit 603a has a coordinate value scanning unit 1181 that inputs the coordinates of feature points from the feature point detection unit 1111 of the local feature quantity generation unit 602 and scans the coordinate values. The coordinate value scanning unit 1181 scans the image according to a specific scanning method, and converts the two-dimensional coordinate values (X coordinate value and Y coordinate value) of the feature points into one-dimensional index values. This index value is a scanning distance from the origin according to scanning. There is no restriction on the scanning direction.
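The conversion of a two-dimensional coordinate into a one-dimensional index value can be sketched as follows (Python; raster order is assumed here as one concrete choice — as noted, the scanning direction is not restricted):

```python
def scan_index(x, y, width):
    """Convert a 2-D feature point coordinate into a 1-D index value.

    With row-by-row (raster) scanning, the index equals the scanning
    distance from the origin; `width` is the image width in pixels."""
    return y * width + x

# Feature point at (3, 2) in a 640-pixel-wide image:
assert scan_index(0, 0, 640) == 0      # origin
assert scan_index(3, 2, 640) == 1283   # 2 * 640 + 3
```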
 また、特徴点のインデックス値をソートし、ソート後の順列の情報を出力するソート部1182を有する。ここでソート部1182は、例えば昇順にソートする。また降順にソートしてもよい。 The encoding unit 603a also has a sorting unit 1182 that sorts the index values of the feature points and outputs the information of the sorted permutation. The sorting unit 1182 sorts, for example, in ascending order; it may also sort in descending order.
 また、ソートされたインデックス値における、隣接する2つのインデックス値の差分値を算出し、差分値の系列を出力する差分算出部1183を有する。 Also, a difference calculation unit 1183 that calculates a difference value between two adjacent index values in the sorted index value and outputs a series of difference values is provided.
 そして、差分値の系列を系列順に符号化する差分符号化部1184を有する。差分値の系列の符号化は、例えば固定ビット長の符号化でもよい。固定ビット長で符号化する場合、そのビット長はあらかじめ規定されていてもよいが、これでは考えられうる差分値の最大値を表現するのに必要なビット数を要するため、符号化サイズは小さくならない。そこで、差分符号化部1184は、固定ビット長で符号化する場合、入力された差分値の系列に基づいてビット長を決定することができる。具体的には、例えば、差分符号化部1184は、入力された差分値の系列から差分値の最大値を求め、その最大値を表現するのに必要なビット数(表現ビット数)を求め、求められた表現ビット数で差分値の系列を符号化することができる。 The encoding unit 603a further has a differential encoding unit 1184 that encodes the sequence of difference values in sequence order. The sequence of difference values may be encoded with, for example, a fixed bit length. When encoding with a fixed bit length, the bit length may be specified in advance; however, this requires the number of bits needed to express the largest conceivable difference value, so the encoded size does not become small. Therefore, when encoding with a fixed bit length, the differential encoding unit 1184 can determine the bit length based on the input sequence of difference values. Specifically, for example, the differential encoding unit 1184 can obtain the maximum of the input difference values, obtain the number of bits needed to express that maximum (the representation bit count), and encode the sequence of difference values with the obtained representation bit count.
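The data-dependent choice of representation bit count can be sketched as follows (Python; the bit-string packing format is an illustrative assumption, not the embodiment's wire format):

```python
def representation_bits(diffs):
    """Bits needed to express the largest difference value in the sequence."""
    return max(max(diffs).bit_length(), 1)  # at least one bit even if all zeros

def encode_fixed_length(diffs):
    """Encode every difference value with the same, data-dependent bit length."""
    bits = representation_bits(diffs)
    return bits, ''.join(format(d, f'0{bits}b') for d in diffs)

bits, stream = encode_fixed_length([3, 1, 6, 2])
assert bits == 3                   # the maximum value 6 needs 3 bits
assert stream == '011001110010'   # 3 -> 011, 1 -> 001, 6 -> 110, 2 -> 010
```

Determining the bit length from the actual maximum, rather than from the worst conceivable difference, is what keeps the encoded size small.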
 一方、ソートされた特徴点のインデックス値と同じ順列で、対応する特徴点の局所特徴量を符号化する局所特徴量符号化部1185を有する。ソートされたインデックス値と同じ順列で符号化することで、差分符号化部1184で符号化された座標値と、それに対応する局所特徴量とを1対1で対応付けることが可能となる。局所特徴量符号化部1185は、本実施形態においては、1つの特徴点に対する150次元の局所特徴量から次元選定された局所特徴量を、例えば1次元を1バイトで符号化し、次元数のバイトで符号化することができる。 On the other hand, the encoding unit 603a has a local feature encoding unit 1185 that encodes the local features of the corresponding feature points in the same permutation as the sorted index values of the feature points. Encoding in the same permutation as the sorted index values makes it possible to associate the coordinate values encoded by the differential encoding unit 1184 with the corresponding local features on a one-to-one basis. In the present embodiment, the local feature encoding unit 1185 can encode the local feature whose dimensions were selected from the 150-dimensional local feature of one feature point using, for example, one byte per dimension, that is, as many bytes as there are dimensions.
 (学習対象物認識部)
 図11Hは、本実施形態に係る学習対象物認識部703の処理を示す図である。
(Learning object recognition unit)
FIG. 11H is a diagram illustrating processing of the learning object recognition unit 703 according to the present embodiment.
 図11Hは、図3において、表示画面310中の通信端末210が撮像した映像311から生成した局所特徴量を、あらかじめ局所特徴量DB221に格納された局所特徴量1191~1194と照合する様子を示す図である。 FIG. 11H shows how the local features generated from the video 311 captured by the communication terminal 210 on the display screen 310 in FIG. 3 are collated with the local features 1191 to 1194 stored in advance in the local feature DB 221.
 図11Hの左図の通信端末210で撮像された映像311からは、本実施形態に従い局所特徴量が生成される。そして、局所特徴量DB221に各学習対象物に対応して格納された局所特徴量1191~1194が、映像311から生成された局所特徴量中にあるか否かが照合される。 From the video 311 captured by the communication terminal 210 in the left diagram of FIG. 11H, local feature amounts are generated according to the present embodiment. Then, it is verified whether or not the local feature amounts 1191 to 1194 stored in the local feature amount DB 221 corresponding to each learning object are in the local feature amounts generated from the video 311.
 図11Hに示すように、学習対象物認識部703は、局所特徴量DB221に格納されている局所特徴量と局所特徴量が合致する各特徴点を細線のように関連付ける。なお、学習対象物認識部703は、局所特徴量の所定割合以上が一致する場合を特徴点の合致とする。そして、学習対象物認識部703は、関連付けられた特徴点の集合間の位置関係が線形関係であれば、対象の学習対象物であると認識する。このような認識を行なえば、サイズの大小や向きの違い(視点の違い)、あるいは反転などによっても認識が可能である。また、所定数以上の関連付けられた特徴点があれば認識精度が得られるので、一部が視界から隠れていても学習対象物の認識が可能である。 As shown in FIG. 11H, the learning object recognition unit 703 associates, with thin lines, each pair of feature points whose local features match the local features stored in the local feature DB 221. The learning object recognition unit 703 regards feature points as matching when a predetermined proportion or more of their local features agree. Then, if the positional relationship between the sets of associated feature points is a linear relationship, the learning object recognition unit 703 recognizes the target learning object. With this kind of recognition, recognition remains possible under differences in size, differences in orientation (differences in viewpoint), or inversion. Moreover, since sufficient recognition accuracy is obtained as long as there are at least a predetermined number of associated feature points, the learning object can be recognized even when part of it is hidden from view.
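The per-feature-point matching criterion can be sketched as follows (Python; the exact-agreement test and the 80% threshold are illustrative assumptions — the embodiment only requires that a predetermined proportion or more of the local feature agree):

```python
def feature_points_match(f1, f2, ratio=0.8):
    """Regard two feature points as matching when at least `ratio` of
    their shared dimensions agree (exact agreement is assumed here; a
    per-dimension tolerance could be used instead)."""
    k = min(len(f1), len(f2))
    agree = sum(1 for a, b in zip(f1[:k], f2[:k]) if a == b)
    return agree >= ratio * k

assert feature_points_match([1, 2, 3, 4, 5], [1, 2, 3, 4, 9])       # 4/5 agree
assert not feature_points_match([1, 2, 3, 4, 5], [9, 9, 3, 4, 5])   # 3/5 agree
```

Candidate matches found this way would then be checked for a linear positional relationship between the associated point sets, as described above.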
 図11Hにおいては、局所特徴量DB221の4つの学習対象物の局所特徴量1191~1194に合致する、景観内の異なる4つの学習対象物が局所特徴量の精度に対応する精密さを持って認識される。 In FIG. 11H, four different learning objects in the landscape that match the local features 1191 to 1194 of the four learning objects in the local feature DB 221 are recognized with a precision corresponding to the accuracy of the local features.
 《通信端末のハードウェア構成》
 図12Aは、本実施形態に係る通信端末210のハードウェア構成を示すブロック図である。
<< Hardware configuration of communication terminal >>
FIG. 12A is a block diagram illustrating a hardware configuration of the communication terminal 210 according to the present embodiment.
 図12Aで、CPU1210は演算制御用のプロセッサであり、プログラムを実行することで通信端末210の各機能構成部を実現する。ROM1220は、初期データおよびプログラムなどの固定データおよびプログラムを記憶する。また、通信制御部604は通信制御部であり、本実施形態においては、ネットワークを介して学習対象物認識サーバ220や関連情報提供サーバ230と通信する。なお、CPU1210は1つに限定されず、複数のCPUであっても、あるいは画像処理用のGPU(Graphics Processing Unit)を含んでもよい。 In FIG. 12A, the CPU 1210 is a processor for arithmetic control and implements each functional component of the communication terminal 210 by executing programs. The ROM 1220 stores fixed data and programs such as initial data and programs. The communication control unit 604 is a communication controller, and in the present embodiment communicates with the learning object recognition server 220 and the related information providing server 230 via the network. The number of CPUs 1210 is not limited to one; there may be a plurality of CPUs, and a GPU (Graphics Processing Unit) for image processing may be included.
 RAM1240は、CPU1210が一時記憶のワークエリアとして使用するランダムアクセスメモリである。RAM1240には、本実施形態の実現に必要なデータを記憶する領域が確保されている。入力映像1241は、撮像部601が撮像して入力した入力映像を示す。特徴点データ1242は、入力映像1241から検出した特徴点座標、スケール、角度を含む特徴点データを示す。局所特徴量生成テーブル1243は、局所特徴量を生成するまでのデータを保持する局所特徴量生成テーブルを示す(図12B参照)。局所特徴量1244は、局所特徴量生成テーブル1243を使って生成され、通信制御部604を介して学習対象物認識サーバ220に送る局所特徴量を示す。学習対象物認識結果1245は、通信制御部604を介して学習対象物認識サーバ220から返信された学習対象物認識結果を示す。関連情報/リンク情報1246は、学習対象物認識サーバ220から返信された関連情報やリンク情報、あるいは関連情報提供サーバ230から返信された関連情報を示す。表示画面データ1247は、ユーザに学習対象物認識結果1245や関連情報/リンク情報1246を含む情報を報知するための表示画面データを示す。なお、音声出力をする場合には、音声データが含まれてもよい。入出力データ1248は、入出力インタフェース1260を介して入出力される入出力データを示す。送受信データ1249は、通信制御部604を介して送受信される送受信データを示す。 The RAM 1240 is a random access memory that the CPU 1210 uses as a work area for temporary storage. An area for storing the data necessary for realizing the present embodiment is secured in the RAM 1240. The input video 1241 indicates the input video captured and input by the imaging unit 601. The feature point data 1242 indicates feature point data including the feature point coordinates, scale, and angle detected from the input video 1241. The local feature generation table 1243 indicates a local feature generation table that holds the data up to generation of a local feature (see FIG. 12B). The local feature 1244 indicates the local feature generated using the local feature generation table 1243 and sent to the learning object recognition server 220 via the communication control unit 604. The learning object recognition result 1245 indicates the learning object recognition result returned from the learning object recognition server 220 via the communication control unit 604. The related information/link information 1246 indicates the related information and link information returned from the learning object recognition server 220, or the related information returned from the related information providing server 230. The display screen data 1247 indicates display screen data for notifying the user of information including the learning object recognition result 1245 and the related information/link information 1246; when audio output is performed, audio data may also be included. The input/output data 1248 indicates the input/output data input and output via the input/output interface 1260. The transmission/reception data 1249 indicates the data transmitted and received via the communication control unit 604.
 ストレージ1250には、データベースや各種のパラメータ、あるいは本実施形態の実現に必要な以下のデータまたはプログラムが記憶されている。表示フォーマット1251は、学習対象物認識結果1245や関連情報/リンク情報1246を含む情報を表示するための表示フォーマットを示す。 The storage 1250 stores a database, various parameters, or the following data or programs necessary for realizing the present embodiment. A display format 1251 indicates a display format for displaying information including the learning object recognition result 1245 and related information / link information 1246.
 ストレージ1250には、以下のプログラムが格納される。通信端末制御プログラム1252は、本通信端末210の全体を制御する通信端末制御プログラムを示す。通信端末制御プログラム1252には、以下のモジュールが含まれている。局所特徴量生成モジュール1253は、通信端末制御プログラム1252において、入力映像から図11B~図11Fにしたがって局所特徴量を生成するモジュールを示す。なお、局所特徴量生成モジュール1253は、図示のモジュール群から構成されるが、ここでは詳説は省略する。符号化モジュール1254は、局所特徴量生成モジュール1253により生成された局所特徴量を送信のために符号化するモジュールを示す。情報受信報知モジュール1255は、学習対象物認識結果1245や関連情報/リンク情報1246を受信して表示または音声によりユーザに報知するためのモジュールを示す。リンク先アクセスモジュール1256は、受信して報知したリンク情報へのユーザ指示に基づいて、リンク先をアクセスするモジュールである。 The storage 1250 stores the following programs. The communication terminal control program 1252 indicates a communication terminal control program that controls the entire communication terminal 210. The communication terminal control program 1252 includes the following modules. The local feature value generation module 1253 indicates a module that generates a local feature value from the input video according to FIGS. 11B to 11F in the communication terminal control program 1252. The local feature quantity generation module 1253 is composed of the illustrated module group, but detailed description thereof is omitted here. The encoding module 1254 indicates a module that encodes the local feature generated by the local feature generating module 1253 for transmission. The information reception notification module 1255 is a module for receiving the learning object recognition result 1245 and the related information / link information 1246 and notifying the user by display or voice. The link destination access module 1256 is a module that accesses a link destination based on a user instruction to the link information received and notified.
 入出力インタフェース1260は、入出力機器との入出力データをインタフェースする。入出力インタフェース1260には、表示部1261、操作部1262であるタッチパネルやキーボード、スピーカ1263、マイク1264、撮像部601が接続される。入出力機器は上記例に限定されない。また、GPS(Global Positioning System)位置生成部1265が搭載され、GPS衛星からの信号に基づいて現在位置を取得する。 The input / output interface 1260 interfaces input / output data with input / output devices. The input / output interface 1260 is connected to a display unit 1261, a touch panel or keyboard as the operation unit 1262, a speaker 1263, a microphone 1264, and an imaging unit 601. The input / output device is not limited to the above example. In addition, a GPS (Global Positioning System) position generation unit 1265 is mounted, and acquires the current position based on a signal from a GPS satellite.
 なお、図12Aには、本実施形態に必須なデータやプログラムのみが示されており、本実施形態に関連しないデータやプログラムは図示されていない。 In FIG. 12A, only data and programs essential to the present embodiment are shown, and data and programs not related to the present embodiment are not shown.
 (局所特徴量生成テーブル)
 図12Bは、本実施形態に係る通信端末210における局所特徴量生成テーブル1243を示す図である。
(Local feature generation table)
FIG. 12B is a diagram showing a local feature generation table 1243 in the communication terminal 210 according to the present embodiment.
 局所特徴量生成テーブル1243には、入力画像ID1201に対応付けて、複数の検出された検出特徴点1202,特徴点座標1203および特徴点に対応する局所領域情報1204が記憶される。そして、各検出特徴点1202,特徴点座標1203および局所領域情報1204に対応付けて、複数のサブ領域ID1205,サブ領域情報1206,各サブ領域に対応する特徴ベクトル1207および優先順位を含む選定次元1208が記憶される。 The local feature generation table 1243 stores, in association with an input image ID 1201, a plurality of detected feature points 1202, feature point coordinates 1203, and local region information 1204 corresponding to the feature points. Then, in association with each detected feature point 1202, its feature point coordinates 1203, and its local region information 1204, a plurality of sub-region IDs 1205, sub-region information 1206, a feature vector 1207 corresponding to each sub-region, and a selected dimension 1208 including the priority order are stored.
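The row structure of this table can be sketched as follows (Python; the field names and types are illustrative only — the actual storage layout is not limited to this):

```python
from dataclasses import dataclass, field

@dataclass
class SubRegionRow:
    sub_region_id: int       # sub-region ID 1205
    sub_region_info: tuple   # sub-region information 1206
    feature_vector: list     # feature vector 1207 for this sub-region
    selected_dims: list      # selected dimension 1208, in priority order

@dataclass
class FeaturePointRow:
    detected_point: int      # detected feature point 1202
    coordinates: tuple       # feature point coordinates 1203 (x, y)
    local_region: tuple      # local region information 1204
    sub_regions: list = field(default_factory=list)  # SubRegionRow entries

# One feature point with one sub-region entry, keyed under an input image ID:
row = FeaturePointRow(1, (120, 45), (0, 0, 40, 40))
row.sub_regions.append(SubRegionRow(1, (0, 0), [0.1] * 8, [76, 45, 83]))
assert len(row.sub_regions) == 1
```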
 以上のデータから各検出特徴点1202に対して局所特徴量1209が生成される。これらを特徴点座標と組みにして集めたデータが、撮像した景観から生成した学習対象物認識サーバ220に送信される局所特徴量1244である。 From the above data, a local feature 1209 is generated for each detected feature point 1202. The data collected by pairing these with the feature point coordinates constitutes the local feature 1244, generated from the captured landscape, that is transmitted to the learning object recognition server 220.
 《通信端末の処理手順》
 図13は、本実施形態に係る通信端末210の処理手順を示すフローチャートである。このフローチャートは、図12AのCPU1210によってRAM1240を用いて実行され、図6の各機能構成部を実現する。
<< Processing procedure of communication terminal >>
FIG. 13 is a flowchart illustrating a processing procedure of the communication terminal 210 according to the present embodiment. This flowchart is executed by the CPU 1210 of FIG. 12A using the RAM 1240, and implements each functional component of FIG.
 まず、ステップS1311において、学習対象物の認識を行なうための映像入力があったか否かを判定する。また、ステップS1321においては、データ受信を判定する。また、ステップS1331においては、ユーザによるリンク先の指示かを判定する。いずれでもなければ、ステップS1341においてその他の処理を行なう。なお、通常の送信処理については説明を省略する。 First, in step S1311, it is determined whether there has been a video input for recognizing a learning object. In step S1321, data reception is determined. In step S1331, it is determined whether the user has designated a link destination. If none of these applies, other processing is performed in step S1341. Description of normal transmission processing is omitted.
 映像入力があればステップS1313に進んで、入力映像に基づいて局所特徴量生成処理を実行する(図14A参照)。次に、ステップS1315において、局所特徴量および特徴点座標を符号化する(図14Bおよび図14C参照)。ステップS1317においては、符号化されたデータを学習対象物認識サーバ220に送信する。 If there is video input, the process proceeds to step S1313, and local feature generation processing is executed based on the input video (see FIG. 14A). Next, in step S1315, local feature quantities and feature point coordinates are encoded (see FIGS. 14B and 14C). In step S1317, the encoded data is transmitted to the learning object recognition server 220.
 データ受信の場合はステップS1323に進んで、学習対象物認識サーバ220からの学習対象物認識結果や関連情報の受信か、または関連情報提供サーバ230からの関連情報の受信か否かを判定する。学習対象物認識サーバ220からの受信であればステップS1325に進んで、受信した学習対象物認識結果、関連情報、リンク情報を表示や音声出力で報知する。一方、関連情報提供サーバ230からの受信であればステップS1327に進んで、受信した関連情報を表示や音声出力で報知する。 In the case of data reception, the process proceeds to step S1323, where it is determined whether the data is a learning object recognition result and related information received from the learning object recognition server 220, or related information received from the related information providing server 230. If received from the learning object recognition server 220, the process proceeds to step S1325, and the received learning object recognition result, related information, and link information are reported by display or audio output. On the other hand, if received from the related information providing server 230, the process proceeds to step S1327, and the received related information is reported by display or audio output.
 (局所特徴量生成処理)
 図14Aは、本実施形態に係る局所特徴量生成処理S1313の処理手順を示すフローチャートである。
(Local feature generation processing)
FIG. 14A is a flowchart illustrating a processing procedure of local feature generation processing S1313 according to the present embodiment.
 まず、ステップS1411において、入力映像から特徴点の位置座標、スケール、角度を検出する。ステップS1413において、ステップS1411で検出された特徴点の1つに対して局所領域を取得する。次に、ステップS1415において、局所領域をサブ領域に分割する。ステップS1417においては、各サブ領域の特徴ベクトルを生成して局所領域の特徴ベクトルを生成する。ステップS1411からS1417の処理は図11Bに図示されている。 First, in step S1411, the position coordinates, scale, and angle of the feature points are detected from the input video. In step S1413, a local region is acquired for one of the feature points detected in step S1411. Next, in step S1415, the local area is divided into sub-areas. In step S1417, a feature vector for each sub-region is generated to generate a feature vector for the local region. The processing of steps S1411 to S1417 is illustrated in FIG. 11B.
 次に、ステップS1419において、ステップS1417において生成された局所領域の特徴ベクトルに対して次元選定を実行する。次元選定については、図11D~図11Fに図示されている。 Next, in step S1419, dimension selection is performed on the feature vector of the local region generated in step S1417. The dimension selection is illustrated in FIGS. 11D to 11F.
 ステップS1421においては、ステップS1411で検出した全特徴点について局所特徴量の生成と次元選定とが終了したかを判定する。終了していない場合はステップS1413に戻って、次の1つの特徴点について処理を繰り返す。 In step S1421, it is determined whether the generation of local features and dimension selection have been completed for all feature points detected in step S1411. If not completed, the process returns to step S1413 to repeat the process for the next one feature point.
 (符号化処理)
 図14Bは、本実施形態に係る符号化処理S1315の処理手順を示すフローチャートである。
(Encoding process)
FIG. 14B is a flowchart illustrating a processing procedure of the encoding processing S1315 according to the present embodiment.
 まず、ステップS1431において、特徴点の座標値を所望の順序で走査する。次に、ステップS1433において、走査した座標値をソートする。ステップS1435において、ソートした順に座標値の差分値を算出する。ステップS1437においては、差分値を符号化する(図14C参照)。そして、ステップS1439において、座標値のソート順に局所特徴量を符号化する。なお、差分値の符号化と局所特徴量の符号化とは並列に行なってもよい。 First, in step S1431, the coordinate values of feature points are scanned in a desired order. Next, in step S1433, the scanned coordinate values are sorted. In step S1435, a difference value of coordinate values is calculated in the sorted order. In step S1437, the difference value is encoded (see FIG. 14C). In step S1439, local feature amounts are encoded in the coordinate value sorting order. The difference value encoding and the local feature amount encoding may be performed in parallel.
 (差分値の符号化処理)
 図14Cは、本実施形態に係る差分値の符号化処理S1437の処理手順を示すフローチャートである。
(Difference Value Encoding Processing)
FIG. 14C is a flowchart illustrating a processing procedure of difference value encoding processing S1437 according to the present embodiment.
 まず、ステップS1441において、差分値が符号化可能な値域内であるか否かを判定する。符号化可能な値域内であればステップS1447に進んで、差分値を符号化する。そして、ステップS1449へ移行する。符号化可能な値域内でない場合(値域外)はステップS1443に進んで、エスケープコードを符号化する。そしてステップS1445において、ステップS1447の符号化とは異なる符号化方法で差分値を符号化する。そして、ステップS1449へ移行する。ステップS1449では、処理された差分値が差分値の系列の最後の要素であるかを判定する。最後である場合は、処理が終了する。最後でない場合は、再度ステップS1441に戻って、差分値の系列の次の差分値に対する処理が実行される。 First, in step S1441, it is determined whether the difference value is within the encodable range. If it is within the encodable range, the process proceeds to step S1447 and the difference value is encoded; the process then moves to step S1449. If it is not within the encodable range (out of range), the process proceeds to step S1443 and an escape code is encoded. Then, in step S1445, the difference value is encoded by an encoding method different from that of step S1447, and the process moves to step S1449. In step S1449, it is determined whether the processed difference value is the last element of the sequence of difference values. If it is the last, the processing ends. If it is not the last, the process returns to step S1441, and the processing is executed for the next difference value in the sequence.
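The escape-code branching of FIG. 14C can be sketched as follows (Python; the one-byte in-range limit, the 0xFF escape marker, and the 4-byte fallback are illustrative assumptions, not values fixed by the embodiment):

```python
ESCAPE = 0xFF        # assumed escape marker
MAX_IN_RANGE = 0xFE  # assumed largest value encodable by the normal method

def encode_difference_values(diffs):
    """Encode each difference value; out-of-range values are preceded by
    an escape code and then encoded with a wider, different method."""
    out = bytearray()
    for d in diffs:
        if 0 <= d <= MAX_IN_RANGE:
            out.append(d)                # step S1447: normal encoding
        else:
            out.append(ESCAPE)           # step S1443: escape code
            out += d.to_bytes(4, 'big')  # step S1445: different encoding
    return bytes(out)

encoded = encode_difference_values([5, 300, 7])
assert encoded[0] == 5
assert encoded[1] == ESCAPE and int.from_bytes(encoded[2:6], 'big') == 300
assert encoded[6] == 7
```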
 《学習対象物認識サーバのハードウェア構成》
 図15は、本実施形態に係る学習対象物認識サーバ220のハードウェア構成を示すブロック図である。
<< Hardware configuration of learning object recognition server >>
FIG. 15 is a block diagram illustrating a hardware configuration of the learning object recognition server 220 according to the present embodiment.
 図15で、CPU1510は演算制御用のプロセッサであり、プログラムを実行することで図7の学習対象物認識サーバ220の各機能構成部を実現する。ROM1520は、初期データおよびプログラムなどの固定データおよびプログラムを記憶する。また、通信制御部701は通信制御部であり、本実施形態においては、ネットワークを介して通信端末210あるいは関連情報提供サーバ230と通信する。なお、CPU1510は1つに限定されず、複数のCPUであっても、あるいは画像処理用のGPUを含んでもよい。 In FIG. 15, the CPU 1510 is a processor for arithmetic control and implements each functional component of the learning object recognition server 220 in FIG. 7 by executing programs. The ROM 1520 stores fixed data and programs such as initial data and programs. The communication control unit 701 is a communication controller, and in the present embodiment communicates with the communication terminal 210 or the related information providing server 230 via the network. The number of CPUs 1510 is not limited to one; there may be a plurality of CPUs, and a GPU for image processing may be included.
 RAM1540は、CPU1510が一時記憶のワークエリアとして使用するランダムアクセスメモリである。RAM1540には、本実施形態の実現に必要なデータを記憶する領域が確保されている。受信した局所特徴量1541は、通信端末210から受信した特徴点座標を含む局所特徴量を示す。読出した局所特徴量1542は、局所特徴量DB221から読み出した特徴点座標を含む局所特徴量を示す。学習対象物認識結果1543は、受信した局所特徴量と局所特徴量DB221に格納された局所特徴量との照合から認識された、学習対象物認識結果を示す。関連情報1544は、学習対象物認識結果1543の学習対象物に対応して関連情報DB222から検索された関連情報を示す。リンク情報1545は、学習対象物認識結果1543の学習対象物に対応してリンク情報DB223から検索されたリンク情報を示す。送受信データ1546は、通信制御部701を介して送受信される送受信データを示す。 The RAM 1540 is a random access memory that the CPU 1510 uses as a work area for temporary storage. An area for storing the data necessary for realizing the present embodiment is secured in the RAM 1540. The received local feature 1541 indicates the local feature, including the feature point coordinates, received from the communication terminal 210. The read local feature 1542 indicates the local feature, including the feature point coordinates, read from the local feature DB 221. The learning object recognition result 1543 indicates the learning object recognition result recognized by collating the received local feature with the local features stored in the local feature DB 221. The related information 1544 indicates the related information retrieved from the related information DB 222 in correspondence with the learning object of the learning object recognition result 1543. The link information 1545 indicates the link information retrieved from the link information DB 223 in correspondence with the learning object of the learning object recognition result 1543. The transmission/reception data 1546 indicates the data transmitted and received via the communication control unit 701.
 ストレージ1550には、データベースや各種のパラメータ、あるいは本実施形態の実現に必要な以下のデータまたはプログラムが記憶されている。局所特徴量DB221は、図8に示したと同様の局所特徴量DBを示す。関連情報DB222は、図9に示したと同様の関連情報DBを示す。リンク情報DB223は、図10に示したと同様のリンク情報DBを示す。 The storage 1550 stores a database, various parameters, or the following data or programs necessary for realizing the present embodiment. The local feature DB 221 is a local feature DB similar to that shown in FIG. The related information DB 222 is a related information DB similar to that shown in FIG. The link information DB 223 shows the same link information DB as shown in FIG.
 ストレージ1550には、以下のプログラムが格納される。学習対象物認識サーバ制御プログラム1551は、本学習対象物認識サーバ220の全体を制御する学習対象物認識サーバ制御プログラムを示す。局所特徴量DB作成モジュール1552は、学習対象物認識サーバ制御プログラム1551において、学習対象物の画像から局所特徴量を生成して局所特徴量DB221に格納するモジュールを示す。学習対象物認識モジュール1553は、学習対象物認識サーバ制御プログラム1551において、受信した局所特徴量と局所特徴量DB221に格納された局所特徴量とを照合して学習対象物を認識するモジュールを示す。関連情報/リンク情報取得モジュール1554は、認識した学習対象物に対応して関連情報DB222やリンク情報DB223から関連情報やリンク情報を取得するモジュールを示す。認識結果/情報送信モジュール1555は、認識した学習対象物名、取得した関連情報やリンク情報を送信するモジュールを示す。 The storage 1550 stores the following programs. The learning target object recognition server control program 1551 indicates a learning target object recognition server control program that controls the entire learning target object recognition server 220. The local feature DB creation module 1552 indicates a module that generates a local feature from a learning target image and stores it in the local feature DB 221 in the learning target recognition server control program 1551. The learning target object recognition module 1553 is a module that recognizes the learning target object in the learning target object recognition server control program 1551 by comparing the received local feature value with the local feature value stored in the local feature value DB 221. The related information / link information acquisition module 1554 indicates a module that acquires related information and link information from the related information DB 222 and the link information DB 223 corresponding to the recognized learning object. The recognition result / information transmission module 1555 indicates a module that transmits a recognized learning object name, acquired related information, and link information.
 なお、図15には、本実施形態に必須なデータやプログラムのみが示されており、本実施形態に関連しないデータやプログラムは図示されていない。 Note that FIG. 15 shows only data and programs essential to the present embodiment, and does not illustrate data and programs not related to the present embodiment.
 《学習対象物認識サーバの処理手順》
 図16は、本実施形態に係る学習対象物認識サーバ220の処理手順を示すフローチャートである。このフローチャートは、図15のCPU1510によりRAM1540を使用して実行され、図7の学習対象物認識サーバ220の各機能構成部を実現する。
<< Processing procedure of the learning object recognition server >>
FIG. 16 is a flowchart showing a processing procedure of the learning object recognition server 220 according to the present embodiment. This flowchart is executed by the CPU 1510 of FIG. 15 using the RAM 1540, and implements each functional component of the learning object recognition server 220 of FIG.
 まず、ステップS1611において、局所特徴量DBの生成か否かを判定する。また、ステップS1621において、通信端末からの局所特徴量受信かを判定する。いずれでもなければ、ステップS1641において他の処理を行なう。 First, in step S1611, it is determined whether or not a local feature DB is generated. In step S1621, it is determined whether a local feature amount is received from the communication terminal. Otherwise, other processing is performed in step S1641.
 局所特徴量DBの生成であればステップS1613に進んで、局所特徴量DB生成処理を実行する(図17参照)。また、局所特徴量の受信であればステップS1623に進んで、学習対象物認識処理を行なう(図18Aおよび図18B参照)。 If the local feature DB is generated, the process advances to step S1613 to execute a local feature DB generation process (see FIG. 17). If a local feature is received, the process proceeds to step S1623 to perform learning object recognition processing (see FIGS. 18A and 18B).
 次に、ステップS1625において、認識した学習対象物に対応する関連情報やリンク情報を取得する。そして、認識した学習対象物名、関連情報、リンク情報を通信端末210に送信する。 Next, in step S1625, related information and link information corresponding to the recognized learning object are acquired. Then, the recognized learning object name, related information, and link information are transmitted to the communication terminal 210.
 (局所特徴量DB生成処理)
 図17は、本実施形態に係る局所特徴量DB生成処理S1613の処理手順を示すフローチャートである。
(Local feature DB generation processing)
FIG. 17 is a flowchart showing a processing procedure of local feature DB generation processing S1613 according to the present embodiment.
 まず、ステップS1701において、学習対象物の画像を取得する。ステップS1703においては、特徴点の位置座標、スケール、角度を検出する。ステップS1705において、ステップS1703で検出された特徴点の1つに対して局所領域を取得する。次に、ステップS1707において、局所領域をサブ領域に分割する。ステップS1709においては、各サブ領域の特徴ベクトルを生成して局所領域の特徴ベクトルを生成する。ステップS1705からS1709の処理は図11Bに図示されている。 First, in step S1701, an image of a learning object is acquired. In step S1703, the position coordinates, scale, and angle of the feature points are detected. In step S1705, a local region is acquired for one of the feature points detected in step S1703. Next, in step S1707, the local area is divided into sub-areas. In step S1709, a feature vector for each sub-region is generated to generate a local region feature vector. The processing from step S1705 to S1709 is illustrated in FIG. 11B.
Next, in step S1711, dimension selection is performed on the feature vector of the local region generated in step S1709. Dimension selection is illustrated in FIGS. 11D to 11F. Note that, although the generation of the local feature DB 221 performs the hierarchization used in dimension selection, it is desirable to store all of the generated feature vectors.
In step S1713, it is determined whether the generation of local features and the dimension selection have been completed for all feature points detected in step S1703. If not, the process returns to step S1705 and repeats for the next feature point. When all feature points have been processed, the process advances to step S1715, and the local features and feature point coordinates are registered in the local feature DB 221 in association with the learning object.
In step S1717, it is determined whether there is an image of another learning object. If there is, the process returns to step S1701, acquires that image, and repeats the processing.
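The DB generation flow above (steps S1701 through S1717) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the detector and descriptor are trivial stand-ins (names such as `detect_feature_points` and `build_local_feature_db` are ours), whereas a real system would detect scale- and rotation-invariant feature points and derive per-sub-region gradient histogram feature vectors as in FIG. 11B.

```python
# Sketch of local feature DB generation (S1701-S1717).
# All function names and the descriptor scheme are illustrative assumptions.

def detect_feature_points(image):
    """Stand-in for S1703: return (x, y, scale, angle) tuples."""
    h, w = len(image), len(image[0])
    # Hypothetical detector: sample every 4th pixel as a feature point.
    return [(x, y, 1.0, 0.0) for y in range(0, h, 4) for x in range(0, w, 4)]

def local_descriptor(image, point, dims=150):
    """Stand-in for S1705-S1711: local region -> sub-region feature vector.
    The full dimension count is kept, since the text recommends storing
    all generated feature vectors in the DB."""
    x, y, _scale, _angle = point
    return [(image[y][x] * (i + 1)) % 256 for i in range(dims)]

def build_local_feature_db(training_images):
    """S1713-S1717: per learning object, register (descriptor, coordinates)."""
    db = {}
    for object_id, image in training_images.items():
        db[object_id] = [(local_descriptor(image, pt), (pt[0], pt[1]))
                         for pt in detect_feature_points(image)]
    return db
```

Registering one test image under an object ID then yields one (descriptor, coordinates) record per detected feature point, mirroring the per-object records of the local feature DB 221.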
(Learning object recognition processing)
FIG. 18A is a flowchart showing the procedure of the learning object recognition processing S1623 according to the present embodiment.
First, in step S1811, the local features of one learning object are acquired from the local feature DB 221. Then, in step S1813, the local features of the learning object are collated with the local features received from the communication terminal 210 (see FIG. 18B).
In step S1815, it is determined whether they match. If they match, the process advances to step S1821, and the matched learning object is stored as being present in the video captured by the communication terminal 210.
In step S1817, it is determined whether all learning objects registered in the local feature DB 221 have been collated; if any remain, the process returns to step S1811 and the collation is repeated for the next learning object. Note that, in this collation, the field may be limited in advance, either to achieve real-time processing through improved processing speed or to reduce the load on the learning object recognition server.
(Collation processing)
FIG. 18B is a flowchart showing the procedure of the collation processing S1813 according to the present embodiment.
First, in step S1831, the parameters p = 1 and q = 0 are set as initialization. Next, in step S1833, the smaller of the dimension count i of the local features in the local feature DB 221 and the dimension count j of the received local features is selected.
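Because the dimension selection of FIGS. 11D to 11F is hierarchical, the first dimensions of a longer descriptor form a valid shorter descriptor, so step S1833 reduces to truncating both descriptors to the smaller dimension count. A sketch (the function name is ours):

```python
def select_common_dimensions(stored, received):
    """Step S1833: compare descriptors at the smaller dimension count.
    Hierarchical dimension selection means truncation preserves
    comparability between the two descriptors."""
    d = min(len(stored), len(received))
    return stored[:d], received[:d]
```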
In the loop of steps S1835 to S1845, the collation of each local feature is repeated until p > m (where m is the number of feature points of the learning object). First, in step S1835, the data of the selected number of dimensions of the p-th local feature of the learning object stored in the local feature DB 221 is acquired; that is, the selected number of dimensions is taken starting from the first dimension. Next, in step S1837, the p-th local feature acquired in step S1835 is collated in turn with the local features of all feature points generated from the input video to determine whether they are similar. In step S1839, it is determined from the result of this collation whether the similarity exceeds a threshold α; if it does, then in step S1841 the local feature and the positional relationship of the matched feature points in the input video and the learning object are stored as a pair, and q, the count of matched feature points, is incremented by one. In step S1843, the process advances to the next feature point of the learning object (p ← p + 1), and if not all feature points of the learning object have been collated (p ≤ m), the process returns to step S1835 and the collation is repeated.
Note that the threshold α can be changed according to the recognition accuracy required for the learning object. For a learning object whose correlation with other learning objects is low, accurate recognition is possible even if the recognition accuracy is lowered.
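The matching loop of steps S1835 to S1845 can be sketched as below. The similarity measure (a normalized inverse L1 distance) and the function names are assumptions for illustration; the text fixes only the structure of the loop and the threshold α, not a particular measure.

```python
def similarity(a, b):
    """Descriptor similarity at the common dimension count (S1833/S1837)."""
    d = min(len(a), len(b))
    dist = sum(abs(a[i] - b[i]) for i in range(d))
    return 1.0 / (1.0 + dist / d)   # assumed measure: 1.0 means identical

def match_object(object_feats, video_feats, alpha=0.5):
    """S1835-S1845: return matched coordinate pairs; q is their count."""
    matches = []
    for desc_p, xy_p in object_feats:          # p = 1 .. m
        for desc_v, xy_v in video_feats:       # all input-video features
            if similarity(desc_p, desc_v) > alpha:
                matches.append((xy_p, xy_v))   # S1841: store pair, q += 1
                break
    return matches
```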
When the collation with all feature points of the learning object is completed, the process advances from step S1845 to S1847, and in steps S1847 to S1853 it is determined whether the learning object is present in the input video. First, in step S1847, it is determined whether the proportion of the number q of feature points that matched local features of the input video, among the number p of feature points of the learning object, exceeds a threshold β. If it does, the process advances to step S1849, and it is further determined, for this learning object candidate, whether the positional relationship between the feature points of the input video and those of the learning object is one that a linear transformation can produce. That is, it is determined whether the positional relationship between the feature points of the input video and those of the learning object, stored in step S1841 as having matching local features, is one that remains possible under changes such as rotation, inversion, or a change of viewpoint position, or one that cannot arise from such changes. Since such determination methods are geometrically well known, a detailed description is omitted.
If it is determined in step S1851 that a linear transformation is possible, the process advances to step S1853, and it is determined that the collated learning object is present in the input video. Note that the threshold β can be changed according to the recognition accuracy required for the learning object. For a learning object whose correlation with other learning objects is low, or whose features can be determined even from a part of it, accurate recognition is possible even with few matched feature points. That is, the learning object can be recognized even if part of it is hidden, as long as a characteristic part is visible.
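The decision of steps S1847 to S1853 combines the ratio test against β with the linear-transformation check. As one concrete instance of the geometrically well-known method the text alludes to, the sketch below fits a single affine map to the matched pairs by least squares and accepts when the residual is negligible; a production system might instead use RANSAC with a homography. Names and tolerances are assumptions.

```python
import numpy as np

def fits_linear_transform(pairs, tol=1e-6):
    """S1849/S1851: are the matched pairs consistent with one affine map
    video_xy ~ A @ object_xy + t (rotation, reflection, viewpoint change)?"""
    if len(pairs) < 3:                         # an affine map needs >= 3 pairs
        return False
    src = np.array([p for p, _ in pairs], dtype=float)
    dst = np.array([q for _, q in pairs], dtype=float)
    A = np.hstack([src, np.ones((len(src), 1))])   # homogeneous [x y 1] rows
    coef, *_ = np.linalg.lstsq(A, dst, rcond=None)
    residual = float(((A @ coef - dst) ** 2).sum())
    return residual < tol

def object_present(m, q, pairs, beta=0.3):
    """S1847-S1853: ratio of matched points over beta, then geometry check."""
    return (q / m) > beta and fits_linear_transform(pairs)
```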
In step S1855, it is determined whether any uncollated learning objects remain in the local feature DB 221. If one remains, the next learning object is set in step S1857, the parameters are reinitialized to p = 1 and q = 0, and the process returns to step S1835 to repeat the collation.
As is clear from this description of the collation processing, storing every possible learning object in the local feature DB 221 and collating against all of them imposes a very heavy load. Therefore, for example, the user may select a range of learning objects from a menu before recognition is performed on the input video, so that only that range is retrieved from the local feature DB 221 and collated. The load can also be reduced by storing in the local feature DB 221 only the local features of the range the user actually uses.
[Third Embodiment]
Next, an information processing system according to the third embodiment of the present invention will be described. The information processing system according to this embodiment differs from the second embodiment in that related information is automatically acquired from the link destination without the user performing a link access operation. The other configurations and operations are the same as in the second embodiment; the same reference numerals are given to the same configurations and operations, and their detailed description is omitted.
According to this embodiment, related information from the link destination can be reported in real time, in association with a learning object in an image of the video, without any user operation.
<< Operation procedure of the information processing system >>
FIG. 19 is a sequence diagram showing the operation procedure of the information processing system according to this embodiment. In FIG. 19, operations identical to those in FIG. 5 of the second embodiment are given the same step numbers, and their description is omitted.
In steps S400 and S401, although the applications and data may differ, download, activation, and initialization are performed as in FIGS. 4 and 5.
The learning object recognition server 220, having recognized a learning object in the video from the local features of the video received from the communication terminal 210 in step S411, refers to the link information DB 223 in step S513 and acquires the link information corresponding to the recognized learning object.
If a plurality of pieces of link information are acquired, a link destination is selected in step S1915. The selection may be based, for example, on an instruction from the user of the communication terminal 210 or on user recognition by the learning object recognition server 220; a detailed description is omitted here. In step S1917, the related information providing server 230 of the link destination is accessed, based on the link information, with the recognized learning object ID. In the operation procedure of FIG. 19, the ID of the communication terminal that transmitted the local features of the video is also sent with this link destination access.
The related information providing server 230 acquires from the related information DB 231 the learning-object-related information (including document data and audio data) corresponding to the learning object ID accompanying the access. Then, in step S525, the related information is returned to the communication terminal 210 that originated the access; the communication terminal ID transmitted in step S1917 is used here.
In step S527, the communication terminal 210 that has received the related information displays it or outputs it as audio.
FIG. 19 describes the case where the response to the link destination access by the learning object recognition server 220 is made directly to the communication terminal 210. However, the learning object recognition server 220 may instead receive the reply from the link destination and relay it to the communication terminal 210. Alternatively, the communication terminal 210 may, on receiving the link information, automatically access the link destination and report the reply from it.
[Fourth Embodiment]
Next, an information processing system according to the fourth embodiment of the present invention will be described. The information processing system according to this embodiment applies the second and third embodiments to learning objects that include language. The other configurations and operations are the same as in the second or third embodiment; the same reference numerals are given to the same configurations and operations, and their detailed description is omitted.
According to this embodiment, a learning object that includes language can be recognized, and its related information, in particular its reading, can be learned.
<< Operation procedure of the information processing system >>
Several examples of the operation procedure of the information processing system according to this embodiment are described below with reference to sequence diagrams. Application to learning objects that include language is not limited to these examples.
(Book recognition)
FIG. 20A is a sequence diagram showing the operation procedure for book recognition in the information processing system according to this embodiment. Operation steps identical to those in FIG. 4 of the second embodiment are given the same step numbers, and their description is omitted.
In steps S400 and S401, although the applications and data may differ, download, activation, and initialization are performed as in FIG. 4.
First, in step S2013, the communication terminal 210 captures an image of the spine of a book or of an advertisement for a book. Although the video is represented here by a spine or an advertisement image, it may instead be a slipcase, a cover, a colophon, a table of contents, or any other book-related image.
The learning object recognition server 220 receives the local features from the communication terminal 210 and, in step S2021, recognizes the book by referring to the local feature DB 221. If the response is to be a book title or the like, the recognition result is transmitted from the learning object recognition server 220 to the communication terminal 210 in step S2023. If the contents of the book are to be introduced by display or audio, content introduction data corresponding to the recognized book is acquired in step S2025 by referring to the content introduction DB 2022, and in step S2027 the recognition result and the content introduction data are transmitted from the learning object recognition server 220 to the communication terminal 210.
The communication terminal 210 receives the recognition result or the content introduction from the learning object recognition server 220 and, in step S2029, reports the recognition result or the contents to the user by display and/or audio.
When the content introduction is to be acquired from the related information providing server 230 at the link destination, the link destination corresponding to the recognized book is acquired in step S2031 by referring to the link information DB 223. Then, in step S2033, the learning object recognition server 220 accesses the link destination.
In step S2035, the related information providing server 230 at the link destination refers to the content introduction DB 2023 and acquires the content introduction data corresponding to the book. Then, in step S2037, the related information providing server 230 transmits the recognition result and the content introduction data to the communication terminal 210.
The communication terminal 210 receives the recognition result or the content introduction from the related information providing server 230 and, in step S2039, reports it to the user by display and/or audio. The address of the communication terminal 210 is assumed to be obtained from the learning object recognition server 220 at the time of the access in step S2033.
Although FIG. 20A shows the procedure in which the learning object recognition server 220 accesses the link destination automatically, the learning object recognition server 220 may instead return the link destination to the communication terminal 210 and wait for a link instruction from the communication terminal 210.
(Page recognition)
FIG. 20B is a sequence diagram showing the operation procedure for page recognition in the information processing system according to this embodiment. Operation steps identical to those in FIG. 4 of the second embodiment are given the same step numbers, and their description is omitted.
In steps S400 and S401, although the applications and data may differ, download, activation, and initialization are performed as in FIG. 4.
First, in step S2043, the communication terminal 210 captures an image of a page of an opened book. The image may be a two-page spread, a single page, part of a page, or a photograph, figure, or table within a page.
The learning object recognition server 220 receives the local features from the communication terminal 210 and, in step S2051, recognizes the page by referring to the local feature DB 221. Next, in step S2053, it refers to the page information DB 2024 and acquires page information in the form of read-aloud audio corresponding to the recognized page. Then, in step S2055, the page data of the read-aloud audio is transmitted from the learning object recognition server 220 to the communication terminal 210.
The communication terminal 210 receives the page data from the learning object recognition server 220 and, in step S2057, reports the page contents to the user by playing the read-aloud audio.
When the page information is to be acquired from the related information providing server 230 at the link destination, the link destination corresponding to the recognized page is acquired in step S2061 by referring to the link information DB 223. Then, in step S2063, the learning object recognition server 220 accesses the link destination.
In step S2065, the related information providing server 230 at the link destination refers to the page information DB 2025 and acquires the page information corresponding to the page. Then, in step S2067, the page data of the read-aloud audio is transmitted from the related information providing server 230 to the communication terminal 210.
The communication terminal 210 receives the page data from the related information providing server 230 and, in step S2069, reports the page contents to the user by playing the read-aloud audio. The address of the communication terminal 210 is assumed to be obtained from the learning object recognition server 220 at the time of the access in step S2063.
As in FIG. 20A, in FIG. 20B the learning object recognition server 220 may instead return the link destination to the communication terminal 210 and wait for a link instruction from the communication terminal 210.
(Kanji recognition)
FIG. 20C is a sequence diagram showing the operation procedure for kanji recognition in the information processing system according to this embodiment. Operation steps identical to those in FIG. 4 of the second embodiment are given the same step numbers, and their description is omitted.
In steps S400 and S401, although the applications and data may differ, download, activation, and initialization are performed as in FIG. 4.
First, in step S2073, the communication terminal 210 captures an image of a kanji character, an idiom, or a sentence in a book. The captured image may also be of a cover as in FIG. 20A or of a page as in FIG. 20B.
The learning object recognition server 220 receives the local features from the communication terminal 210 and, in step S2081, recognizes the kanji, idiom, or sentence by referring to the local feature DB 221. Next, in step S2083, it refers to the dictionary DB 2026 and acquires the reading and meaning, as display or audio data, corresponding to the recognized kanji, idiom, or sentence. Then, in step S2085, the display/audio data indicating the reading and meaning is transmitted from the learning object recognition server 220 to the communication terminal 210.
The communication terminal 210 receives the display/audio data indicating the reading and meaning from the learning object recognition server 220 and, in step S2087, reports the reading and meaning to the user by display and audio playback.
When the reading and meaning are to be acquired from the related information providing server 230 at the link destination, the link destination corresponding to the recognized kanji, idiom, or sentence is acquired in step S2091 by referring to the link information DB 223. Then, in step S2093, the learning object recognition server 220 accesses the link destination.
In step S2095, the related information providing server 230 at the link destination refers to the dictionary DB 2027 and acquires the reading and meaning corresponding to the kanji, idiom, or sentence. Then, in step S2097, the display/audio data indicating the reading and meaning is transmitted from the related information providing server 230 to the communication terminal 210.
The communication terminal 210 receives the data from the related information providing server 230 and, in step S2099, reports the reading and meaning of the kanji, idiom, or sentence to the user by display and by playing the audio data. The address of the communication terminal 210 is assumed to be obtained from the learning object recognition server 220 at the time of the access in step S2093.
Regarding the display and audio reporting in FIG. 20C, reporting by display is desirable for a video containing a plurality of kanji, idioms, or sentences, while reporting by audio is desirable for a video of a single kanji, idiom, or sentence, or a part thereof.
As in FIGS. 20A and 20B, in FIG. 20C the learning object recognition server 220 may instead return the link destination to the communication terminal 210 and wait for a link instruction from the communication terminal 210.
(Other recognition)
Although not illustrated in this embodiment, the result of translating a word, phrase, or sentence into another language can likewise be reported by display or audio.
<< Databases >>
Next, the configuration of each database used in the operations of FIGS. 20A to 20C will be described with reference to FIGS. 21A to 21C; the configuration of the database used for the translation example is shown in FIG. 21D. Although these databases are described separately from the local feature DB 221, they may instead be provided integrally, associated with the local features.
(Content introduction DB)
FIG. 21A is a diagram showing the configuration of the content introduction DB 2022 or 2023 according to this embodiment. The content introduction DBs 2022 and 2023 have basically the same configuration, but given its storage capacity, the content introduction DB 2023 can hold more detailed contents or more items than the content introduction DB 2022.
The content introduction DB 2022 or 2023 stores, in association with a book ID 2111, a work name 2112, an author 2113, a publisher 2114, a publication date 2115, and content introduction information 2116 comprising display data and audio data. All of these may alternatively be included in the content introduction information.
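One possible in-memory reading of the FIG. 21A record layout, keyed by book ID since the recognition result is a book: the field set follows the reference numerals in the text, while the concrete types and names are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class ContentIntroduction:
    book_id: str            # 2111
    work_name: str          # 2112
    author: str             # 2113
    publisher: str          # 2114
    publication_date: str   # 2115
    intro_display: str      # 2116: display data
    intro_audio: bytes      # 2116: audio data

content_db = {}  # stands in for content introduction DB 2022/2023

def register(rec):
    """Add one record, keyed by its book ID."""
    content_db[rec.book_id] = rec

def lookup(book_id):
    """Return the content introduction for a recognized book, if any."""
    return content_db.get(book_id)
```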
(Page information DB)
FIG. 21B is a diagram showing the configuration of the page information DB 2024 or 2025 according to this embodiment. The page information DBs 2024 and 2025 have basically the same configuration, but given its storage capacity, the page information DB 2025 can hold more detailed contents or more items than the page information DB 2024.
The page information DB 2024 or 2025 stores, in association with a book ID 2121, and further with a page number 2122 and chapter/part information 2123, first read-aloud data/speaker 2124 and second read-aloud data/speaker 2125.
(Dictionary DB)
FIG. 21C is a diagram showing the configuration of the dictionary DB 2026 or 2027 according to this embodiment. The dictionary DBs 2026 and 2027 have basically the same configuration, but given its storage capacity, the dictionary DB 2027 can hold more detailed contents or more items than the dictionary DB 2026.
The dictionary DB 2026 or 2027 has, for example, three parts: a kanji DB 2130, an idiom DB 2140, and a sentence DB 2150. These may all be integrated into one.
The kanji DB 2130 stores, in association with a kanji ID 2131, kun-reading data 2132 and on-reading data 2133, each comprising display and audio, and explanation data (meaning/usage) 2134.
The idiom DB 2140 stores, in association with an idiom ID 2141, reading data 2142 comprising display and audio, and explanation data (meaning/usage) 2143.
 文章用DB2150は、文章ID2151に対応付けて、表示と音声とからなる読み方データ2152と解説データ(意味/使い方)2153を記憶する。文章用DB2150は、ことわざや俳句、和歌などを含んでよい。 The text DB 2150 stores reading data 2152 and comment data (meaning / usage) 2153 composed of display and voice in association with the text ID 2151. The sentence DB 2150 may include proverbs, haiku, waka, and the like.
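The three-part layout can be sketched with an in-memory SQLite database as below; table and column names are invented for illustration, and the voice portion of each entry is omitted:

```python
import sqlite3

# Sketch of the three-part dictionary DB 2026/2027.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE kanji (              -- kanji DB 2130
    kanji_id    TEXT PRIMARY KEY, -- kanji ID 2131
    kun_reading TEXT,             -- kun-reading data 2132 (display part)
    on_reading  TEXT,             -- on-reading data 2133 (display part)
    explanation TEXT              -- explanation data (meaning/usage) 2134
);
CREATE TABLE idiom (              -- idiom DB 2140
    idiom_id    TEXT PRIMARY KEY, -- idiom ID 2141
    reading     TEXT,             -- reading data 2142 (display part)
    explanation TEXT              -- explanation data (meaning/usage) 2143
);
CREATE TABLE sentence (           -- sentence DB 2150: proverbs, haiku, waka, ...
    sentence_id TEXT PRIMARY KEY, -- sentence ID 2151
    reading     TEXT,             -- reading data 2152 (display part)
    explanation TEXT              -- explanation data (meaning/usage) 2153
);
""")
conn.execute("INSERT INTO kanji VALUES (?, ?, ?, ?)",
             ("K0001", "yama", "san", "mountain"))
row = conn.execute("SELECT kun_reading FROM kanji WHERE kanji_id = ?",
                   ("K0001",)).fetchone()
```

Keeping the three tables in one database file is also consistent with the remark that all parts may be integrated.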
(Dictionary DB for translation)
FIG. 21D is a diagram showing the configuration of the translation dictionary DB 2100 according to the present embodiment. FIG. 21D illustrates the configuration of a Japanese-to-foreign-language translation dictionary, but other translation dictionaries are configured similarly.
The translation dictionary DB 2100 includes, for example, three parts: a word DB 2160, a phrase DB 2170, and a sentence DB 2180. All of them may be integrated into one.
The word DB 2160 stores, in association with a Japanese word ID 2161, English word data 2162 consisting of notation and voice, other-language data 2163, and explanation data (meaning/usage) 2164.
The phrase DB 2170 stores, in association with a Japanese phrase ID 2171, English phrase data 2172 consisting of notation and voice, other-language phrase data 2173, and explanation data (meaning/usage) 2174.
The sentence DB 2180 stores, in association with a Japanese sentence ID 2181, English sentence data 2182 consisting of notation and voice, other-language sentence data 2183, and explanation data (meaning/usage) 2184.
The phrase DB 2170 and the sentence DB 2180 may include proverbs, haiku, waka, poetry, and the like.
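A lookup across the word, phrase, and sentence parts can be sketched as follows; the IDs, the sample entries, and the `translate` helper are hypothetical:

```python
# Each entry: (English data, other-language data, explanation data).
word_db = {      # word DB 2160, keyed by Japanese word ID 2161
    "JW001": ("mountain", {"fr": "montagne"}, "noun"),
}
phrase_db = {    # phrase DB 2170, keyed by Japanese phrase ID 2171
    "JP001": ("good morning", {"fr": "bonjour"}, "greeting"),
}
sentence_db = {  # sentence DB 2180, keyed by Japanese sentence ID 2181
    "JS001": ("Time flies.", {"fr": "Le temps passe vite."}, "proverb"),
}

def translate(entry_id: str, lang: str = "en"):
    """Resolve an ID against the word, phrase, then sentence DBs."""
    for db in (word_db, phrase_db, sentence_db):
        if entry_id in db:
            english, others, _explanation = db[entry_id]
            return english if lang == "en" else others.get(lang)
    return None
```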
[Fifth Embodiment]
Next, an information processing system according to a fifth embodiment of the present invention will be described. The information processing system according to the present embodiment applies the second and third embodiments to learning objects that include sound. The other configurations and operations are the same as those of the second or third embodiment; therefore, the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted.
According to the present embodiment, a learning object that includes sound can be recognized, and its related information, in particular a performance, can be learned.
<< Operation procedure of the information processing system >>
Hereinafter, several examples of the operation procedure of the information processing system according to the present embodiment will be described with reference to sequence diagrams. Note that application to learning objects that include sound is not limited to these examples.
(Music recognition)
FIG. 22A is a sequence diagram showing the operation procedure of music recognition in the information processing system according to the present embodiment. Operation steps similar to those of FIG. 4 of the second embodiment are given the same step numbers, and description thereof is omitted.
In steps S400 and S401, although the applications and data may differ, downloading, activation, and initialization are performed as in FIG. 4.
First, in step S2213, the communication terminal 210 captures an image of a music jacket, a CD, or a concert advertisement. Although a jacket or an advertisement image is used as a representative example here, other music-related images may be used.
The learning object recognition server 220 receives the local feature amounts from the communication terminal 210 and, in step S2221, recognizes the music with reference to the local feature DB 221. If the response is an album name, a performer, concert information, or the like, the learning object recognition server 220 transmits the recognition result to the communication terminal 210 in step S2223. When the contents of the music are to be introduced by display or voice, the content introduction DB 2222 is referred to in step S2225 to acquire content introduction data corresponding to the recognized music. Then, in step S2227, the learning object recognition server 220 transmits the recognition result and the content introduction data to the communication terminal 210.
The communication terminal 210 receives the recognition result or the content introduction from the learning object recognition server 220 and, in step S2229, notifies the user of the recognition result or the contents by display and/or voice.
When the content introduction is to be acquired from the linked related information providing server 230, the link information DB 223 is referred to in step S2231 to acquire the link destination corresponding to the recognized music. Then, in step S2233, the learning object recognition server 220 accesses the link destination.
The linked related information providing server 230 refers to the content introduction DB 2223 in step S2235 and acquires the content introduction data corresponding to the music. Then, in step S2237, the related information providing server 230 transmits the recognition result and the content introduction data to the communication terminal 210.
The communication terminal 210 receives the recognition result or the content introduction from the related information providing server 230 and, in step S2239, notifies the user of the recognition result or the contents by display and/or voice. Here, the address of the communication terminal 210 is assumed to be obtained from the learning object recognition server 220 through the access in step S2233.
Although FIG. 22A shows a procedure in which the learning object recognition server 220 automatically accesses the link destination, the learning object recognition server 220 may instead return the link destination to the communication terminal 210 and wait for a link instruction from the communication terminal 210.
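The server-side branch of FIG. 22A (recognition in S2221, bare result in S2223, or result plus content introduction in S2225 to S2227) could look roughly as follows. This is a sketch only: the local-feature matching is reduced to a placeholder set-intersection score, and all names are assumptions:

```python
def match_local_features(query_features, feature_db):
    """Return the ID whose stored feature set shares the most features
    with the query (stand-in for real local-feature matching)."""
    best_id, best_score = None, 0
    for obj_id, stored in feature_db.items():
        score = len(set(query_features) & set(stored))
        if score > best_score:
            best_id, best_score = obj_id, score
    return best_id

def handle_query(query_features, feature_db, content_intro_db, with_intro):
    obj_id = match_local_features(query_features, feature_db)   # S2221
    if obj_id is None:
        return None
    if not with_intro:
        return {"result": obj_id}                               # S2223
    intro = content_intro_db.get(obj_id)                        # S2225
    return {"result": obj_id, "introduction": intro}            # S2227
```

The linked-server branch (S2231 to S2239) would replace the `content_intro_db.get` call with a request to the related information providing server 230.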
(Song recognition)
FIG. 22B is a sequence diagram showing the operation procedure of song recognition in the information processing system according to the present embodiment. Operation steps similar to those of FIG. 4 of the second embodiment are given the same step numbers, and description thereof is omitted.
In steps S400 and S401, although the applications and data may differ, downloading, activation, and initialization are performed as in FIG. 4.
First, in step S2243, the communication terminal 210 captures an image of the cover or a page of a musical score. The captured image may be a two-page spread, a single page, or part of a page.
The learning object recognition server 220 receives the local feature amounts from the communication terminal 210 and, in step S2251, recognizes the song with reference to the local feature DB 221. Next, in step S2253, the performance information DB 2224 is referred to, and performance information, which is song performance data corresponding to the recognized song, is acquired. Then, in step S2255, the learning object recognition server 220 transmits the song audio data to the communication terminal 210.
The communication terminal 210 receives the song performance data from the learning object recognition server 220 and, in step S2257, reproduces the song to notify the user.
When the performance information is to be acquired from the linked related information providing server 230, the link information DB 223 is referred to in step S2261 to acquire the link destination corresponding to the recognized song. Then, in step S2263, the learning object recognition server 220 accesses the link destination.
The linked related information providing server 230 refers to the performance information DB 2225 in step S2265 and acquires the performance information corresponding to the song. Then, in step S2267, the related information providing server 230 transmits the song performance data to the communication terminal 210.
The communication terminal 210 receives the song performance data from the related information providing server 230 and, in step S2269, reproduces the song to notify the user. Here, the address of the communication terminal 210 is assumed to be obtained from the learning object recognition server 220 through the access in step S2263.
As in FIG. 22A, the configuration in FIG. 22B may also be such that the learning object recognition server 220 returns the link destination to the communication terminal 210 and waits for a link instruction from the communication terminal 210.
(Sound recognition)
FIG. 22C is a sequence diagram showing the operation procedure of sound recognition in the information processing system according to the present embodiment. Operation steps similar to those of FIG. 4 of the second embodiment are given the same step numbers, and description thereof is omitted.
In steps S400 and S401, although the applications and data may differ, downloading, activation, and initialization are performed as in FIG. 4.
First, in step S2273, the communication terminal 210 captures an image of notes or measures in a musical score.
The learning object recognition server 220 receives the local feature amounts from the communication terminal 210 and, in step S2281, recognizes the notes or measures with reference to the local feature DB 221. Next, in step S2283, the sound information DB 2226 is referred to, and the sound or sound sequence corresponding to the recognized notes or measures is acquired. Then, in step S2285, the learning object recognition server 220 transmits the sound data of the sound or sound sequence to the communication terminal 210.
The communication terminal 210 receives the sound data from the learning object recognition server 220 and, in step S2287, notifies the user by reproducing the sound.
When the sound information is to be acquired from the linked related information providing server 230, the link information DB 223 is referred to in step S2291 to acquire the link destination corresponding to the recognized sound or sound sequence. Then, in step S2293, the learning object recognition server 220 accesses the link destination.
In step S2295, the linked related information providing server 230 refers to the sound information DB 2227 and acquires the sound data corresponding to the sound. Then, in step S2297, the related information providing server 230 transmits the sound data to the communication terminal 210.
The communication terminal 210 receives the sound data from the related information providing server 230 and, in step S2299, notifies the user by reproducing the sound data. Here, the address of the communication terminal 210 is assumed to be obtained from the learning object recognition server 220 through the access in step S2293.
As in FIGS. 22A and 22B, the configuration in FIG. 22C may also be such that the learning object recognition server 220 returns the link destination to the communication terminal 210 and waits for a link instruction from the communication terminal 210.
<< Databases >>
Next, the configuration of each database used in the operations of FIGS. 22A to 22C will be described with reference to FIGS. 23A to 23C. Although these databases are described separately from the local feature DB 221, they may be provided integrally in association with the local feature amounts.
(Content introduction DB)
FIG. 23A is a diagram showing the configuration of the content introduction DB 2222 or 2223 according to the present embodiment. The content introduction DB 2222 and the content introduction DB 2223 have basically the same configuration; however, in view of storage capacity, the content introduction DB 2223 can hold more detailed contents or more items than the content introduction DB 2222.
The content introduction DB 2222 or 2223 stores, in association with a CD/DVD/record jacket ID 2311, a performer/singer 2312, a recording location 2313, a recording date/release date 2314, and content introduction information 2315 including display data and audio data. The CD/DVD/record jacket ID 2311 may be a concert ID. Each CD/DVD/record jacket ID 2311 is associated with a plurality of song IDs 2316 and song introductions 2317. All of these may be stored as content introduction information.
(Performance information DB)
FIG. 23B is a diagram showing the configuration of the performance information DB 2224 or 2225 according to the present embodiment. The performance information DB 2224 and the performance information DB 2225 have basically the same configuration; however, in view of storage capacity, the performance information DB 2225 can hold more detailed contents or more items than the performance information DB 2224.
The performance information DB 2224 or 2225 stores, in association with a song ID 2321, a song name 2322, first song reproduction data 2323 by a first performer, and second song reproduction data 2324 by a second performer. The performer may be replaced by a conductor or a singer.
(Sound information DB)
FIG. 23C is a diagram showing the configuration of the sound information DB 2226 or 2227 according to the present embodiment. The sound information DB 2226 and the sound information DB 2227 have basically the same configuration; however, in view of storage capacity, the sound information DB 2227 can hold more detailed contents or more items than the sound information DB 2226.
The sound information DB 2226 or 2227 includes a measure DB 2330 that stores reproduction data in units of measures and a sound DB 2340 that stores reproduction data in units of sounds. The measure DB 2330 stores, in association with a measure ID 2331, the song name (or song ID) 2332 of the song that includes the measure, and measure reproduction data 2333. The sound DB 2340 stores, in association with a sound ID 2341, a pitch name/solfège name 2342, first sound reproduction data 2343 for piano, second sound reproduction data 2344 for violin, and third sound reproduction data 2345 for flute. The types of musical instruments are not limited to this example; the voice of a singer may also be used.
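Selecting one instrument's reproduction data out of the sound DB 2340 can be sketched as below; the sound ID, byte payloads, and instrument keys are illustrative assumptions:

```python
sound_db = {
    "N0440": {                    # sound ID 2341
        "name": "A / la",         # pitch name / solfège name 2342
        "piano":  b"piano-a4",    # first sound reproduction data 2343
        "violin": b"violin-a4",   # second sound reproduction data 2344
        "flute":  b"flute-a4",    # third sound reproduction data 2345
    },
}

def reproduction_data(sound_id: str, instrument: str = "piano") -> bytes:
    """Pick the reproduction data for one instrument (or a singer's voice)."""
    return sound_db[sound_id][instrument]
```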
[Sixth Embodiment]
Next, an information processing system according to a sixth embodiment of the present invention will be described. The information processing system according to the present embodiment applies the second and third embodiments to exhibits. The other configurations and operations are the same as those of the second or third embodiment; therefore, the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted. Exhibits include materials in museums and ethnological museums, paintings and sculptures in art museums, and exhibits at expositions and exhibitions.
According to the present embodiment, an exhibit can be recognized and its related information can be learned.
<< Operation procedure of the information processing system >>
FIG. 24A is a sequence diagram showing the operation procedure of exhibit recognition in the information processing system according to the present embodiment. Operation steps similar to those of FIG. 4 of the second embodiment are given the same step numbers, and description thereof is omitted.
In steps S400 and S401, although the applications and data may differ, downloading, activation, and initialization are performed as in FIG. 4.
First, in step S2413, the communication terminal 210 captures an image of an exhibit.
The learning object recognition server 220 receives the local feature amounts from the communication terminal 210 and, in step S2421, recognizes the exhibit with reference to the local feature DB 221. If the response is the name of the exhibit or the like, the learning object recognition server 220 transmits the recognition result to the communication terminal 210 in step S2423. When the contents of the exhibit are to be introduced by display or voice, the content introduction DB 2422 is referred to in step S2425 to acquire content introduction data corresponding to the recognized exhibit. Then, in step S2227, the learning object recognition server 220 transmits the recognition result and the content introduction data to the communication terminal 210.
The communication terminal 210 receives the recognition result or the content introduction from the learning object recognition server 220 and, in step S2229, notifies the user of the recognition result or the contents by display and/or voice.
When the content introduction is to be acquired from the linked related information providing server 230, the link information DB 223 is referred to in step S2231 to acquire the link destination corresponding to the recognized exhibit. Then, in step S2233, the learning object recognition server 220 accesses the link destination.
In step S2235, the linked related information providing server 230 refers to the content introduction DB 2423 and acquires the content introduction data corresponding to the exhibit. Then, in step S2237, the related information providing server 230 transmits the recognition result and the content introduction data to the communication terminal 210.
The communication terminal 210 receives the recognition result or the content introduction from the related information providing server 230 and, in step S2239, notifies the user of the recognition result or the contents by display and/or voice. Here, the address of the communication terminal 210 is assumed to be obtained from the learning object recognition server 220 through the access in step S2233.
Although FIG. 24A shows a procedure in which the learning object recognition server 220 automatically accesses the link destination, the learning object recognition server 220 may instead return the link destination to the communication terminal 210 and wait for a link instruction from the communication terminal 210.
(Content introduction DB)
FIG. 24B is a diagram showing the configuration of the content introduction DB 2422 or 2423 according to the present embodiment. The content introduction DB 2422 and the content introduction DB 2423 have basically the same configuration; however, in view of storage capacity, the content introduction DB 2423 can hold more detailed contents or more items than the content introduction DB 2422. Although the content introduction DB 2422 or 2423 is described separately from the local feature DB 221, it may be provided integrally in association with the local feature amounts.
The content introduction DB 2422 or 2423 stores, in association with an exhibit ID 2401, a name (artist, period) 2402, related display data 2403, and related audio data 2404.
[Seventh Embodiment]
Next, an information processing system according to a seventh embodiment of the present invention will be described. The information processing system according to the present embodiment applies the second and third embodiments to mathematical formulas. The other configurations and operations are the same as those of the second or third embodiment; therefore, the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted.
According to the present embodiment, a mathematical formula can be recognized, and its calculation process and calculation result can be learned.
<< Operation procedure of the information processing system >>
FIG. 25 is a sequence diagram showing the operation procedure of formula recognition in the information processing system according to the present embodiment. Operation steps similar to those of FIG. 4 of the second embodiment are given the same step numbers, and description thereof is omitted.
In steps S400 and S401, although the applications and data may differ, downloading, activation, and initialization are performed as in FIG. 4.
First, in step S2503, the communication terminal 210 captures an image of a mathematical formula. Although a formula is used as a representative example here, an image of a graph or of straight lines/curves, for example, may also be captured.
The learning object recognition server 220 receives the local feature amounts from the communication terminal 210 and, in step S2511, recognizes the formula or the like with reference to the local feature DB 221. Next, in step S2513, the formula DB 2522 is referred to, and formula-related data, including the formula with its variables and calculation examples corresponding to the recognized formula, is acquired. Then, in step S2517, the learning object recognition server 220 transmits the formula and calculation example data to the communication terminal 210.
The communication terminal 210 receives the formula and calculation example data from the learning object recognition server 220 and, in step S2519, displays the formula and calculation example data to notify the user. In step S2519, the communication terminal 210 also determines whether the user has input values for the variables in the formula. If there is no variable input, the process ends.
On the other hand, if variable values are input, the process advances to step S2521, where the variables are substituted into the received formula and the formula is evaluated. Then, in step S2523, the calculation result is displayed. If necessary, the calculation result is transmitted to the learning object recognition server 220 in step S2525.
In step S2527, the learning object recognition server 220 can accumulate the calculation results from the communication terminal 210 and use them for information gathering and the like.
When the formula and calculation example data are to be acquired from the linked related information providing server 230, the link information DB 223 is referred to in step S2531 to acquire the link destination corresponding to the recognized formula or the like. Then, in step S2533, the learning object recognition server 220 accesses the link destination.
In step S2535, the linked related information providing server 230 refers to the formula DB 2523 and acquires the formula-related data. Then, in step S2537, the related information providing server 230 transmits the formula-related data to the communication terminal 210.
The communication terminal 210 receives the formula-related data from the related information providing server 230 and, in step S2539, displays the formula and calculation example data to notify the user. Here, the address of the communication terminal 210 is assumed to be obtained from the learning object recognition server 220 through the access in step S2533.
Although FIG. 25 shows a procedure in which the learning object recognition server 220 automatically accesses the link destination, the learning object recognition server 220 may instead return the link destination to the communication terminal 210 and wait for a link instruction from the communication terminal 210.
(Formula DB)
FIG. 26A is a diagram showing the configuration of the formula DB 2522 or 2523 according to the present embodiment. The formula DB 2522 and the formula DB 2523 have basically the same configuration; however, in view of storage capacity, the formula DB 2523 can hold more detailed contents or more items than the formula DB 2522. Although the formula DB 2522 is described separately from the local feature DB 221, it may be provided integrally in association with the local feature amounts.
The formula DB 2522 or 2523 stores, in association with a formula ID 2611, a formula name 2612, formula data 2613 representing the formula with symbols, variables 2614 used in the formula, and constants 2615 in the formula.
 (演算パラメータテーブル)
 図26Bは、本実施形態に係る演算パラメータテーブル2600の構成を示す図である。なお、演算パラメータテーブル2600は、数式に変数や定数を代入して演算を実行する場合に、通信端末やサーバのRAM内に作成されるテーブルである。
(Calculation parameter table)
FIG. 26B is a diagram showing a configuration of a calculation parameter table 2600 according to the present embodiment. Note that the calculation parameter table 2600 is a table created in the RAM of the communication terminal or the server when a calculation is executed by substituting variables or constants into mathematical expressions.
 演算パラメータテーブル2600は、数式ID2621に対応付けて、数式で使用される各々の変数値2622、数式で使用される各々の定数値2623、変数値2622および定数値2623を使用した演算結果値2624を記憶する。 The calculation parameter table 2600 stores, in association with the formula ID 2621, each variable value 2622 used in the formula, each constant value 2623 used in the formula, and the calculation result value 2624 obtained using the variable values 2622 and constant values 2623.
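As a concrete illustration, the formula DB of FIG. 26A and the calculation parameter table of FIG. 26B might be realized as follows. This is only a sketch: the identifiers (`formula_db`, `evaluate_formula`) and the sample formula are illustrative assumptions, not part of the embodiment.

```python
# Minimal sketch of the formula DB (Fig. 26A) and one row of the
# calculation parameter table (Fig. 26B). All identifiers are
# illustrative assumptions.

formula_db = {
    # formula ID 2611 -> name 2612, symbolic formula data 2613,
    # variables 2614, constants 2615
    "F001": {
        "name": "circle area",
        "expression": "pi * r ** 2",
        "variables": ["r"],
        "constants": {"pi": 3.14159},
    },
}

def evaluate_formula(formula_id, variable_values):
    """Substitute variables and constants into the stored formula and
    build one calculation-parameter-table row (created in RAM at run
    time, as described for table 2600)."""
    entry = formula_db[formula_id]
    env = dict(entry["constants"])
    env.update(variable_values)
    # evaluate the symbolic expression with builtins disabled (sketch only)
    result = eval(entry["expression"], {"__builtins__": {}}, env)
    # row: formula ID 2621, variable values 2622, constant values 2623,
    # calculation result value 2624
    return {
        "formula_id": formula_id,
        "variable_values": variable_values,
        "constant_values": entry["constants"],
        "result": result,
    }

row = evaluate_formula("F001", {"r": 2.0})
print(row["result"])  # 3.14159 * 2.0 ** 2 = 12.56636
```

The calculation could run either on the communication terminal or on the server; only the table row, not the DB, is transient.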
 [第8実施形態]
 次に、本発明の第8実施形態に係る情報処理システムについて説明する。本実施形態に係る情報処理システムは、上記第2実施形態乃至第7実施形態と比べると、ユーザが捜索したい学習対象物を登録すれば、その学習対象物を捜索してユーザに報知する点で異なる。その他の構成および動作は、第2実施形態乃至第7実施形態と同様であるため、同じ構成および動作については同じ符号を付してその詳しい説明を省略する。
[Eighth Embodiment]
Next, an information processing system according to an eighth embodiment of the present invention will be described. Compared with the second to seventh embodiments, the information processing system according to the present embodiment searches for the learning object and notifies the user if the learning object to be searched is registered. Different. Since other configurations and operations are the same as those of the second to seventh embodiments, the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted.
 本実施形態によれば、ユーザがリアルタイムに所望の学習対象物を捜索できる。 According to this embodiment, the user can search for a desired learning object in real time.
 《情報処理システムにおける通信端末の表示画面例》
 図27Aおよび図27Bは、本実施形態に係る情報処理システムにおける通信端末の表示画面例を示す図である。
<< Display screen example of communication terminal in information processing system >>
27A and 27B are diagrams illustrating examples of display screens of the communication terminal in the information processing system according to the present embodiment.
 (作品捜索)
 図27Aの上段は、子どもが作った作品を展示場所や置き場所から探す例を示した図である。
(Work search)
The upper part of FIG. 27A is a diagram showing an example of searching for a work created by a child from an exhibition place or a placement place.
 まず、左図のように、通信端末2710により、子どもの作った作品の写真などを撮影して、映像2720を取得する。この作品の映像2720に基づいて局所特徴量を生成して、通信端末2710の局所特徴量DBに登録する。 First, as shown in the left figure, the communication terminal 2710 takes a picture of a work made by a child and obtains a video 2720. A local feature amount is generated based on the video 2720 of this work, and is registered in the local feature amount DB of the communication terminal 2710.
 次に、右図のように、通信端末2710により、作品の展示場所や置き場所の映像2730を撮像する。通信端末2710は、この映像2730に基づいて局所特徴量を生成する。そして、先ほど登録した作品の局所特徴量と、作品の展示場所や置き場所の映像2730の局所特徴量とを照合して、子どもの作った作品の位置を認識する。認識した位置の作品には、捜索結果として“ここにありますよ”などのコメント2731が重畳表示される。 Next, as shown in the right figure, the communication terminal 2710 captures a video 2730 of the place where the work is exhibited or stored. The communication terminal 2710 generates local features based on this video 2730. Then, the local features of the previously registered work are collated with the local features of the video 2730 of the exhibition or storage place, and the position of the work created by the child is recognized. A comment 2731 such as "It's here" is superimposed on the work at the recognized position as the search result.
 図27Aのように、たとえ、作品が写真などと同じ向きや同じサイズでなくても、次元選定された局所特徴量によりリアルタイムに子どもの作品の位置を特定できる。 As shown in FIG. 27A, even if the work does not appear in the same orientation or at the same size as in the photograph, the position of the child's work can be located in real time using the dimension-selected local features.
 (子ども捜索)
 図27Aの下段は、学芸会や演奏会などに参加した子どもがどこにいるかを探す例を示した図である。
(Child search)
The lower part of FIG. 27A is a diagram showing an example of finding where a child who participated in a school performance or a concert is.
 まず、左図のように、通信端末2710により、子どもの写真などを撮影して、映像2740を取得する。この映像2740に基づいて局所特徴量を生成して、通信端末2710の局所特徴量DBに登録する。 First, as shown in the left figure, a picture of a child is taken by the communication terminal 2710 to obtain a video 2740. A local feature value is generated based on the video 2740 and registered in the local feature value DB of the communication terminal 2710.
 次に、右図のように、通信端末2710により、学芸会や演奏会の映像2750を撮像する。通信端末2710は、この映像2750に基づいて局所特徴量を生成する。そして、先ほど登録した子どもの写真からの局所特徴量と、学芸会や演奏会の映像2750の局所特徴量とを照合して、子どもの位置を認識する。認識した位置の子どもには、捜索結果として“ここに居ますよ”などのコメント2751が重畳表示される。 Next, as shown in the right figure, the communication terminal 2710 captures a video 2750 of the school play or concert. The communication terminal 2710 generates local features based on this video 2750. Then, the local features from the previously registered photograph of the child are collated with the local features of the video 2750 of the school play or concert, and the position of the child is recognized. A comment 2751 such as "I'm here" is superimposed on the child at the recognized position as the search result.
 図27Aのように、たとえ、子どもの写真などと同じ向きや同じサイズでなくても、あるいは服や姿勢などの違いがあっても、次元選定された局所特徴量によりリアルタイムに子どもの位置を特定できる。 As shown in FIG. 27A, even if the child does not appear in the same orientation or at the same size as in the photograph, or even if clothes or posture differ, the position of the child can be located in real time using the dimension-selected local features.
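The register-then-collate flow of FIG. 27A can be sketched as follows. The feature values, the distance threshold, and the predetermined match ratio here are illustrative assumptions; an actual implementation would use the dimension-selected local features described earlier. Comparing only the leading dimensions mirrors the selection of the smaller dimension number when the registered and query descriptors differ in size (i vs. j).

```python
import math

# Local feature DB of the communication terminal: registered object ->
# list of per-keypoint feature vectors (1st through i-th dimension).
local_feature_db = {}

def register(object_id, features):
    """Register the local features generated from the object's image."""
    local_feature_db[object_id] = features

def match(scene_features, object_id, dim=None, min_ratio=0.5, max_dist=0.25):
    """Collate scene features against a registered search object.

    Only the leading `dim` dimensions of each feature vector are
    compared, so descriptors with differing dimension counts can still
    be matched on the smaller prefix. The object is recognized when a
    predetermined ratio of its registered features find a close match.
    (Threshold values are illustrative assumptions.)
    """
    registered = local_feature_db[object_id]
    if dim is None:
        dim = min(len(registered[0]), len(scene_features[0]))
    matched = 0
    for ref in registered:
        best = min(math.dist(ref[:dim], cand[:dim]) for cand in scene_features)
        if best <= max_dist:
            matched += 1
    return matched / len(registered) >= min_ratio

# left figure: register the work (or child) photographed beforehand
register("artwork", [[0.1, 0.9, 0.3], [0.7, 0.2, 0.5]])
# right figure: collate against the video of the exhibition place
scene = [[0.11, 0.88, 0.31], [0.9, 0.9, 0.1], [0.69, 0.21, 0.52]]
print(match(scene, "artwork"))  # True: both registered features are found
```

Because matching is per-keypoint, the registered object is still found when it appears at a different position, orientation, or scale within the scene frame.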
 (対象物へのズームイン)
 図27Bは、図27Aの下段の処理をさらに改善した処理である。
(Zoom in on the object)
FIG. 27B is a process in which the process in the lower part of FIG. 27A is further improved.
 図27Bの左図および中央図は、図27Aの下段の左図および右図と同じである。図27Bにおいては、子どもの位置が判明したので、通信端末2710が子どもの位置にズームインすることにより、子どもの拡大映像を取得することができる。 The left diagram and the central diagram in FIG. 27B are the same as the left diagram and the right diagram in the lower part of FIG. 27A. In FIG. 27B, since the position of the child is found, the communication terminal 2710 zooms in on the position of the child, so that an enlarged image of the child can be acquired.
 《通信端末の機能構成》
 図28は、本実施形態に係る通信端末2710の機能構成を示すブロック図である。なお、図28において、第2実施形態の図6と同様の機能構成部には同じ参照番号を付して、説明を省略する。
<Functional configuration of communication terminal>
FIG. 28 is a block diagram showing a functional configuration of a communication terminal 2710 according to this embodiment. In FIG. 28, the same functional components as those in FIG. 6 of the second embodiment are denoted by the same reference numerals, and description thereof is omitted.
 登録/捜索判定部2801は、通信端末2710が撮像部601で撮像した映像が、局所特徴量DB2821に捜索対象物として登録する映像か、捜索物を捜索するための映像かを判定する。かかる登録/捜索判定部2801の判定は、ユーザによる操作であってもよいし、映像中の物の映像画面における面積比率などにより自動的に判定してもよい。例えば、捜索対象物の登録時には、捜索対象物を画面全体で撮像するので所定閾値以上の面積比率を登録画面とする。 The registration/search determination unit 2801 determines whether the video captured by the imaging unit 601 of the communication terminal 2710 is a video to be registered in the local feature DB 2821 as a search object or a video for searching for a registered search object. This determination by the registration/search determination unit 2801 may follow a user operation, or may be made automatically based on, for example, the area ratio of the object on the video screen. For example, since a search object is imaged so as to fill the entire screen at registration time, a screen in which the object's area ratio is at or above a predetermined threshold is determined to be a registration screen.
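A minimal sketch of the automatic determination performed by the registration/search determination unit 2801, assuming a hypothetical threshold value and function name:

```python
# Automatic registration/search determination: when the imaged object
# fills the screen beyond a predetermined area ratio, the frame is
# treated as a registration image; otherwise it is a search image.
# The threshold value is an illustrative assumption.

REGISTRATION_AREA_RATIO = 0.6  # predetermined threshold

def classify_frame(object_area, screen_area, threshold=REGISTRATION_AREA_RATIO):
    """Return 'register' when the object occupies at least the
    threshold fraction of the video screen, otherwise 'search'."""
    ratio = object_area / screen_area
    return "register" if ratio >= threshold else "search"

print(classify_frame(object_area=250_000, screen_area=640 * 480))  # register
print(classify_frame(object_area=80_000, screen_area=640 * 480))   # search
```

A user operation, when present, would simply override this automatic decision.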
 局所特徴量登録部2802は、捜索物の登録であると判断した場合に、局所特徴量生成部602が生成した局所特徴量を局所特徴量DB2821に登録する。一方、登録された捜索物の捜索と判断した場合は、捜索物認識部2803において、局所特徴量生成部602が生成した局所特徴量と、局所特徴量DB2821に登録した捜索物の局所特徴量とが照合される。 When the local feature registration unit 2802 determines that a search object is being registered, it registers the local features generated by the local feature generation unit 602 in the local feature DB 2821. On the other hand, when it is determined that a registered search object is being searched for, the search object recognition unit 2803 collates the local features generated by the local feature generation unit 602 with the local features of the search objects registered in the local feature DB 2821.
 照合すれば、捜索物DB2822を参照して、捜索物発見情報報知部2804が、捜索物に対応した捜索物発見情報を報知する。また、ズーム制御部2805は、捜索物を拡大撮像するために、捜索物の位置にズームインするように撮像部601を制御する。 If they match, the search object discovery information notification unit 2804 refers to the search object DB 2822 and notifies the user of the search object discovery information corresponding to the search object. The zoom control unit 2805 also controls the imaging unit 601 to zoom in on the position of the search object so as to capture an enlarged image of it.
 なお、局所特徴量DB2821の構成は、新たな捜索物の局所特徴量が登録されることを除いて、第2実施形態の図8に示した構成と同様であるので、説明を省略する。また、捜索物DB2822は、捜索物についてのユーザが入力した情報を蓄積しておくものであって、必須ではなく局所特徴量DB2821内に設けてもよい。 Note that the configuration of the local feature DB 2821 is the same as that shown in FIG. 8 of the second embodiment, except that local features of new search objects are registered, and thus its description is omitted. The search object DB 2822 stores information about search objects entered by the user; it is not essential and may instead be provided within the local feature DB 2821.
 《通信端末の処理手順》
 図29は、本実施形態に係る通信端末の処理手順を示すフローチャートである。このフローチャートも、図12AのCPU1210がRAM1240を使用して実行し、図28の機能構成部を実現する。なお、図29中、局所特徴量生成処理は、図14と同様であるので、同じステップ番号S1313を付して、説明は省略する。
<< Processing procedure of communication terminal >>
FIG. 29 is a flowchart showing a processing procedure of the communication terminal according to the present embodiment. This flowchart is also executed by the CPU 1210 of FIG. 12A using the RAM 1240, and implements the functional configuration unit of FIG. In FIG. 29, the local feature quantity generation processing is the same as that in FIG. 14, and therefore the same step number S1313 is assigned and description thereof is omitted.
 まず、ステップS2911において、捜索物(図27Aの作品や子ども)の登録処理か否かを判定する。また、ステップS2921において、捜索物の捜索処理か否かを判定する。いずれでもない場合は、ステップS2941において他の処理を実行する。 First, in step S2911, it is determined whether or not this is registration processing for a search object (the work or the child in FIG. 27A). Next, in step S2921, it is determined whether or not this is search processing for a search object. If it is neither, other processing is executed in step S2941.
 登録処理であればステップS2913に進んで、捜索物の画像を取得する。ステップS1313において、取得した捜索物の画像の局所特徴量を生成する。そして、ステップS2917において、生成した局所特徴量を捜索物と対応付けて、局所特徴量DB2821に登録する。同時に、捜索物DB2822に、必要な捜索物の情報を登録する。 If it is registration processing, the process advances to step S2913, and an image of the search object is acquired. In step S1313, local features of the acquired image of the search object are generated. Then, in step S2917, the generated local features are registered in the local feature DB 2821 in association with the search object. At the same time, the necessary information about the search object is registered in the search object DB 2822.
 捜索処理であればステップS2923に進んで、捜索物を捜索する領域の映像を取得する。ステップS1313において、取得した映像の局所特徴量を生成する。次に、ステップS2927において、映像の局所特徴量の少なくとも一部に捜索物の局所特徴量が合致するかを照合して、捜索物の認識を行なう。捜索物が見付からなければ、ステップS2929からステップS2923に戻って、別の映像を取得して(実際には、通信端末2710の撮像方向/領域を変える)、捜索物の捜索を繰り返す。 If it is search processing, the process advances to step S2923, and a video of the area to be searched for the search object is acquired. In step S1313, local features of the acquired video are generated. Next, in step S2927, the search object is recognized by checking whether its local features match at least part of the local features of the video. If the search object is not found, the process returns from step S2929 to step S2923 to acquire another video (in practice, by changing the imaging direction/area of the communication terminal 2710), and the search for the search object is repeated.
 捜索物があればステップS2931に進んで、ズーム処理をするか否かが判定される。かかる判定は、ユーザによる設定であってもよい。ズーム処理をする場合は、ステップS2933において、ズームインした捜索物の拡大映像を取得する。ズーム処理があってもなくても、ステップS2935において、捜索物の位置に捜索物の存在を指示するコメントを表示する(図27A参照)。 If the search object is found, the process advances to step S2931, and it is determined whether or not to perform zoom processing. This determination may follow a setting made by the user. When zoom processing is performed, an enlarged video of the zoomed-in search object is acquired in step S2933. With or without zoom processing, in step S2935, a comment indicating the presence of the search object is displayed at the position of the search object (see FIG. 27A).
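The search branch of FIG. 29 (steps S2923 through S2935) can be sketched as the following control loop. The callback names and the toy stand-ins are hypothetical, not part of the embodiment; they take the place of the imaging unit 601, the local feature generation unit 602, and the search object recognition unit 2803.

```python
# Control-flow sketch of the search procedure of Fig. 29: keep
# acquiring frames (changing the imaging direction between attempts),
# generate local features, and stop once the search object is
# recognized; optionally zoom in before reporting its position.

def search_loop(acquire_frame, generate_features, recognize, zoom=False,
                max_frames=10):
    for _ in range(max_frames):              # S2923: acquire next frame
        frame = acquire_frame()
        features = generate_features(frame)  # S1313: local features
        position = recognize(features)       # S2927: collate with DB
        if position is None:
            continue                         # S2929: not found, retry
        if zoom:                             # S2931: zoom setting
            frame = acquire_frame(zoom_to=position)  # S2933: zoom in
        return position                      # S2935: annotate position
    return None

# toy stand-ins: the object becomes recognizable in the third frame
results = iter([None, None, (120, 80)])
pos = search_loop(
    acquire_frame=lambda zoom_to=None: None,
    generate_features=lambda frame: frame,
    recognize=lambda features: next(results),
)
print(pos)  # (120, 80)
```

In the real terminal, `recognize` returning a position would trigger the superimposed "It's here" comment at that position on the display.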
 [第9実施形態]
 次に、本発明の第9実施形態に係る情報処理システムについて説明する。本実施形態に係る情報処理システムは、上記第1実施形態乃至第8実施形態と比べると、通信端末が学習対象物認識を含む全ての処理を行なう点で異なる。その他の構成および動作は、第2実施形態と同様であるため、同じ構成および動作については同じ符号を付してその詳しい説明を省略する。
[Ninth Embodiment]
Next, an information processing system according to the ninth embodiment of the present invention will be described. The information processing system according to the present embodiment is different from the first to eighth embodiments in that the communication terminal performs all processes including learning object recognition. Since other configurations and operations are the same as those of the second embodiment, the same configurations and operations are denoted by the same reference numerals, and detailed description thereof is omitted.
 本実施形態によれば、映像中の画像内の局所特徴量に基づいて、通信端末のみで全ての処理を行なうことができる。 According to the present embodiment, all processing can be performed only by the communication terminal based on the local feature amount in the image in the video.
 《通信端末の機能構成》
 図30は、本実施形態に係る通信端末3010の機能構成を示すブロック図である。なお、図30において、第2実施形態の図6と同様の機能構成部については同じ参照番号を付して、説明を省略する。
<Functional configuration of communication terminal>
FIG. 30 is a block diagram illustrating a functional configuration of the communication terminal 3010 according to the present embodiment. In FIG. 30, the same reference numerals are assigned to the same functional components as those in FIG.
 学習対象物認識部3003は、局所特徴量生成部602の生成した局所特徴量と、局所特徴量DB3021に格納された局所特徴量を照合して、学習対象物を認識する。そして、学習対象認識結果報知部3004から認識結果を報知する。なお、学習対象物認識部3003および局所特徴量DB3021は、いずれも、学習対象物認識サーバ220が有した機能構成部を通信端末3010に配置したものであり、その機能は同様であるので説明は省略する。また、学習対象認識結果報知部3004も、図6の表示画面生成部606と音声生成部608を含む処理を報知情報に基づいて示したものであり、その処理は同様であるので、説明は省略する。 The learning object recognition unit 3003 recognizes the learning object by collating the local features generated by the local feature generation unit 602 with the local features stored in the local feature DB 3021. The recognition result is then reported by the learning object recognition result notification unit 3004. Note that the learning object recognition unit 3003 and the local feature DB 3021 are both functional components of the learning object recognition server 220 relocated to the communication terminal 3010; since their functions are the same, their description is omitted. The learning object recognition result notification unit 3004 likewise represents, based on the notification information, the processing including the display screen generation unit 606 and the sound generation unit 608 of FIG. 6; since that processing is the same, its description is omitted.
 関連情報取得部3005は、認識した学習対象物に対応して、関連情報DB3022から関連情報を取得する。また、関連情報報知部3006は、関連情報をユーザに報知する。 リンク情報取得部3007は、認識した学習対象物に対応して、リンク情報DB3023からリンク情報を取得する。また、リンク情報報知部3008は、リンク情報をユーザに報知する。これらの機能構成部も、学習対象物認識サーバ220が有した機能構成部を通信端末3010に配置したものであり、その機能は同様であるので説明は省略する。 The related information acquisition unit 3005 acquires related information from the related information DB 3022 in accordance with the recognized learning object, and the related information notification unit 3006 notifies the user of the related information. The link information acquisition unit 3007 acquires link information from the link information DB 3023 in accordance with the recognized learning object, and the link information notification unit 3008 notifies the user of the link information. These functional components are also functional components of the learning object recognition server 220 relocated to the communication terminal 3010; since their functions are the same, their description is omitted.
 リンク先アクセス部3009は、取得したリンク情報を使用してリンク先の関連情報提供サーバ230にアクセスする。 The link destination access unit 3009 accesses the link destination related information providing server 230 using the acquired link information.
 [他の実施形態]
 以上、実施形態を参照して本発明を説明したが、本発明は上記実施形態に限定されるものではない。本発明の構成や詳細には、本発明のスコープ内で当業者が理解し得る様々な変更をすることができる。また、それぞれの実施形態に含まれる別々の特徴を如何様に組み合わせたシステムまたは装置も、本発明の範疇に含まれる。
[Other Embodiments]
Although the present invention has been described with reference to the embodiments, the present invention is not limited to the above embodiments. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention. In addition, a system or an apparatus in which different features included in each embodiment are combined in any way is also included in the scope of the present invention.
 また、本発明は、複数の機器から構成されるシステムに適用されてもよいし、単体の装置に適用されてもよい。さらに、本発明は、実施形態の機能を実現する制御プログラムが、システムあるいは装置に直接あるいは遠隔から供給される場合にも適用可能である。したがって、本発明の機能をコンピュータで実現するために、コンピュータにインストールされる制御プログラム、あるいはその制御プログラムを格納した媒体、その制御プログラムをダウンロードさせるWWW(World Wide Web)サーバも、本発明の範疇に含まれる。 The present invention may be applied to a system composed of a plurality of devices, or to a single apparatus. Furthermore, the present invention is also applicable to a case where a control program that realizes the functions of the embodiments is supplied to a system or apparatus directly or remotely. Therefore, a control program installed in a computer to realize the functions of the present invention, a medium storing that control program, and a WWW (World Wide Web) server from which that control program is downloaded are also included in the scope of the present invention.
 この出願は、2012年1月30日に出願された日本出願特願2012-017385を基礎とする優先権を主張し、その開示の全てをここに取り込む。 This application claims priority based on Japanese Patent Application No. 2012-017385 filed on January 30, 2012, the entire disclosure of which is incorporated herein.
 本実施形態の一部又は全部は、以下の付記のようにも記載されうるが、以下には限られない。 Some or all of this embodiment can be described as in the following supplementary notes, but is not limited to the following.
 (付記1)
 学習対象物と、前記学習対象物の画像のm個の特徴点のそれぞれを含むm個の局所領域のそれぞれについて生成された、それぞれ1次元からi次元までの特徴ベクトルからなるm個の第1局所特徴量とを、対応付けて記憶する第1局所特徴量記憶手段と、
 撮像手段が撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を生成する第2局所特徴量生成手段と、
 前記第1局所特徴量の特徴ベクトルの次元数iおよび前記第2局所特徴量の特徴ベクトルの次元数jのうち、より少ない次元数を選択し、選択された前記次元数までの特徴ベクトルからなる前記n個の第2局所特徴量に、選択された前記次元数までの特徴ベクトルからなる前記m個の第1局所特徴量の所定割合以上が対応すると判定した場合に、前記映像中の前記画像に前記学習対象物が存在すると認識する学習対象物認識手段と、
 を備えることを特徴とする情報処理システム。
 (付記2)
 前記第1局所特徴量記憶手段は、複数の学習対象物にそれぞれ対応付けて各学習対象物の画像から生成した前記m個の第1局所特徴量を記憶し、
 前記学習対象物認識手段は、前記撮像手段が撮像した前記画像に含まれる複数の学習対象物を認識することを特徴とする付記1に記載の情報処理システム。
 (付記3)
 前記学習対象物認識手段の認識結果を報知する報知手段をさらに備えることを特徴とする付記1または2に記載の情報処理システム。
 (付記4)
 前記報知手段は、さらに、前記認識結果に関連する情報を報知することを特徴とする付記3に記載の情報処理システム。
 (付記5)
 前記報知手段は、さらに、前記認識結果に関連する情報を取得するためのリンク情報を報知することを特徴とする付記3または4に記載の情報処理システム。
 (付記6)
 前記報知手段は、前記認識結果に関連する情報をリンク情報にしたがって取得する関連情報取得手段を有し、
 リンク情報にしたがって取得した前記関連情報を報知することを特徴とする付記3に記載の情報処理システム。
 (付記7)
 前記第1局所特徴量記憶手段に、捜索する学習対象物の局所特徴量を登録する登録手段をさらに備え、
 前記報知手段は、前記学習対象物認識手段が認識した学習対象物を捜索結果として報知することを特徴とする付記3乃至6のいずれか1つに記載の情報処理システム。
 (付記8)
 前記学習対象物は、文字を含む学習対象物であって、
 前記報知手段は、前記学習対象物の内容を報知することを特徴とする付記3乃至6のいずれか1つに記載の情報処理システム。
 (付記9)
 前記学習対象物は、音に関連する学習対象物であって、
 前記報知手段は、前記学習対象物の内容を音の演奏で報知することを特徴とする付記3乃至6のいずれか1つに記載の情報処理システム。
 (付記10)
 前記学習対象物は、展示物であって、
 前記報知手段は、前記学習対象物の説明を報知することを特徴とする付記3乃至6のいずれか1つに記載の情報処理システム。
 (付記11)
 前記学習対象物は、数式を含む学習対象物であって、
 前記報知手段は、前記学習対象物の数式を演算して演算結果を報知することを特徴とする付記3乃至6のいずれか1つに記載の情報処理システム。
 (付記12)
 前記第1局所特徴量および前記第2局所特徴量は、画像から抽出した特徴点を含む局所領域を複数のサブ領域に分割し、前記複数のサブ領域内の勾配方向のヒストグラムからなる複数の次元の特徴ベクトルを生成することにより生成されることを特徴とする付記1乃至11のいずれか1つに記載の情報処理システム。
 (付記13)
 前記第1局所特徴量および前記第2局所特徴量は、前記生成した複数の次元の特徴ベクトルから、隣接するサブ領域間の相関がより大きな次元を選定することにより生成されることを特徴とする付記12に記載の情報処理システム。
 (付記14)
 前記特徴ベクトルの複数の次元は、前記特徴点の特徴に寄与する次元から順に、かつ、前記局所特徴量に対して求められる精度の向上に応じて第1次元から順に選択できるよう、所定の次元数ごとに前記局所領域をひと回りするよう配列することを特徴とする付記12または13に記載の情報処理システム。
 (付記15)
 前記第2局所特徴量生成手段は、前記学習対象物の相関に対応して、他の学習対象物とより低い前記相関を有する学習対象物については次元数のより少ない前記第2局所特徴量を生成することを特徴とする付記14に記載の情報処理システム。
 (付記16)
 前記第1局所特徴量記憶手段は、前記学習対象物の相関に対応して、他の学習対象物とより低い前記相関を有する学習対象物については次元数のより少ない前記第1局所特徴量を記憶することを特徴とする付記14または15に記載の情報処理システム。
 (付記17)
 学習対象物と、前記学習対象物の画像のm個の特徴点のそれぞれを含むm個の局所領域のそれぞれについて生成された、それぞれ1次元からi次元までの特徴ベクトルからなるm個の第1局所特徴量とを、対応付けて記憶する第1局所特徴量記憶手段を備えた情報処理システムを利用した情報処理方法であって、
 撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を生成する第2局所特徴量生成ステップと、
 前記第1局所特徴量の特徴ベクトルの次元数iおよび前記第2局所特徴量の特徴ベクトルの次元数jのうち、より少ない次元数を選択し、選択された前記次元数までの特徴ベクトルからなる前記n個の第2局所特徴量に、選択された前記次元数までの特徴ベクトルからなる前記m個の第1局所特徴量の所定割合以上が対応すると判定した場合に、前記映像中の前記画像に前記学習対象物が存在すると認識する認識ステップと、
 を含むことを特徴とする情報処理方法。
 (付記18)
 撮像手段が撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を生成する第2局所特徴量生成手段と、
 前記n個の第2局所特徴量を、局所特徴量の照合に基づいて撮像した前記画像に含まれる学習対象物を認識する情報処理装置に送信する第1送信手段と、
 前記情報処理装置から、撮像した前記画像に含まれる学習対象物を示す情報を受信する第1受信手段と、
 を備えたことを特徴とする通信端末。
 (付記19)
 撮像手段が撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を生成する第2局所特徴量生成ステップと、
 前記n個の第2局所特徴量を、局所特徴量の照合に基づいて撮像した前記画像に含まれる学習対象物を認識する情報処理装置に送信する第1送信ステップと、
 前記情報処理装置から、撮像した前記画像に含まれる学習対象物を示す情報を受信する第1受信ステップと、
 を含むことを特徴とする通信端末の制御方法。
 (付記20)
 撮像手段が撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を生成する第2局所特徴量生成ステップと、
 前記n個の第2局所特徴量を、局所特徴量の照合に基づいて撮像した前記画像に含まれる学習対象物を認識する情報処理装置に送信する第1送信ステップと、
 前記情報処理装置から、撮像した前記画像に含まれる学習対象物を示す情報を受信する第1受信ステップと、
 をコンピュータに実行させることを特徴とする通信端末の制御プログラム。
 (付記21)
 学習対象物と、前記学習対象物の画像のm個の特徴点のそれぞれを含むm個の局所領域のそれぞれについて生成された、それぞれ1次元からi次元までの特徴ベクトルからなるm個の第1局所特徴量とを、対応付けて記憶する第1局所特徴量記憶手段と、
 通信端末が撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を、前記通信端末から受信する第2受信手段と、
 前記第1局所特徴量の特徴ベクトルの次元数iおよび前記第2局所特徴量の特徴ベクトルの次元数jのうち、より少ない次元数を選択し、選択された前記次元数までの特徴ベクトルからなる前記n個の第2局所特徴量に、選択された前記次元数までの特徴ベクトルからなる前記m個の第1局所特徴量の所定割合以上が対応すると判定した場合に、前記映像中の前記画像に前記学習対象物が存在すると認識する認識手段と、
 認識した前記学習対象物を示す情報を前記通信端末に送信する第2送信手段と、
 を備えることを特徴とする情報処理装置。
 (付記22)
 学習対象物と、前記学習対象物の画像のm個の特徴点のそれぞれを含むm個の局所領域のそれぞれについて生成された、それぞれ1次元からi次元までの特徴ベクトルからなるm個の第1局所特徴量とを、対応付けて記憶する第1局所特徴量記憶手段を備えた情報処理装置の制御方法であって、
 通信端末が撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を、前記通信端末から受信する第2受信ステップと、
 前記第1局所特徴量の特徴ベクトルの次元数iおよび前記第2局所特徴量の特徴ベクトルの次元数jのうち、より少ない次元数を選択し、選択された前記次元数までの特徴ベクトルからなる前記n個の第2局所特徴量に、選択された前記次元数までの特徴ベクトルからなる前記m個の第1局所特徴量の所定割合以上が対応すると判定した場合に、前記映像中の前記画像に前記学習対象物が存在すると認識する認識ステップと、
 認識した前記学習対象物を示す情報を前記通信端末に送信する第2送信ステップと、
 を含むことを特徴とする情報処理装置の制御方法。
 (付記23)
 学習対象物と、前記学習対象物の画像のm個の特徴点のそれぞれを含むm個の局所領域のそれぞれについて生成された、それぞれ1次元からi次元までの特徴ベクトルからなるm個の第1局所特徴量とを、対応付けて記憶する第1局所特徴量記憶手段を備えた情報処理装置の制御プログラムであって、
 通信端末が撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を、前記通信端末から受信する第2受信ステップと、
 前記第1局所特徴量の特徴ベクトルの次元数iおよび前記第2局所特徴量の特徴ベクトルの次元数jのうち、より少ない次元数を選択し、選択された前記次元数までの特徴ベクトルからなる前記n個の第2局所特徴量に、選択された前記次元数までの特徴ベクトルからなる前記m個の第1局所特徴量の所定割合以上が対応すると判定した場合に、前記映像中の前記画像に前記学習対象物が存在すると認識する認識ステップと、
 認識した前記学習対象物を示す情報を前記通信端末に送信する第2送信ステップと、
 をコンピュータに実行させることを特徴とする情報処理装置の制御プログラム。
(Appendix 1)
Each of m first vectors each consisting of a 1-dimensional to i-dimensional feature vector generated for each of the learning object and m local regions including each of the m feature points of the image of the learning object. First local feature quantity storage means for storing the local feature quantity in association with each other;
N feature points are extracted from the image captured by the imaging means, and n local feature regions each including the n feature points are each composed of feature vectors of 1 to j dimensions. Second local feature quantity generating means for generating the second local feature quantity of
A smaller dimension number is selected from among the dimension number i of the feature vector of the first local feature quantity and the dimension number j of the feature vector of the second local feature quantity, and the feature vector includes up to the selected dimension number. When it is determined that the n second local feature amounts correspond to a predetermined ratio or more of the m first local feature amounts including feature vectors up to the selected number of dimensions, the image in the video Learning object recognition means for recognizing that the learning object exists in
An information processing system comprising:
(Appendix 2)
The first local feature amount storage means stores the m first local feature amounts generated from the images of the learning objects in association with the plurality of learning objects, respectively.
The information processing system according to appendix 1, wherein the learning object recognition unit recognizes a plurality of learning objects included in the image captured by the imaging unit.
(Appendix 3)
The information processing system according to appendix 1 or 2, further comprising notification means for notifying a recognition result of the learning object recognition means.
(Appendix 4)
The information processing system according to supplementary note 3, wherein the notification unit further notifies information related to the recognition result.
(Appendix 5)
The information processing system according to appendix 3 or 4, wherein the notifying unit further notifies link information for acquiring information related to the recognition result.
(Appendix 6)
The notification means includes related information acquisition means for acquiring information related to the recognition result according to link information,
The information processing system according to appendix 3, wherein the related information acquired according to link information is notified.
(Appendix 7)
The first local feature quantity storage means further comprises a registration means for registering a local feature quantity of the learning object to be searched,
The information processing system according to any one of supplementary notes 3 to 6, wherein the notification means notifies the learning object recognized by the learning object recognition means as a search result.
(Appendix 8)
The learning object is a learning object including characters,
The information processing system according to any one of supplementary notes 3 to 6, wherein the notification means notifies the contents of the learning object.
(Appendix 9)
The learning object is a learning object related to sound,
The information processing system according to any one of appendices 3 to 6, wherein the notification unit notifies the contents of the learning object by playing a sound.
(Appendix 10)
The learning object is an exhibit,
The information processing system according to any one of supplementary notes 3 to 6, wherein the notification means notifies the description of the learning object.
(Appendix 11)
The learning object is a learning object including a mathematical formula,
The information processing system according to any one of appendices 3 to 6, wherein the notification unit calculates a mathematical expression of the learning object and notifies a calculation result.
(Appendix 12)
The first local feature amount and the second local feature amount are a plurality of dimensions formed by dividing a local region including a feature point extracted from an image into a plurality of sub-regions, and comprising histograms of gradient directions in the plurality of sub-regions. The information processing system according to any one of appendices 1 to 11, wherein the information processing system is generated by generating a feature vector of
(Appendix 13)
The first local feature quantity and the second local feature quantity are generated by selecting a dimension having a larger correlation between adjacent sub-regions from the generated feature vectors of a plurality of dimensions. The information processing system according to attachment 12.
(Appendix 14)
The plurality of dimensions of the feature vector is a predetermined dimension so that it can be selected in order from the dimension that contributes to the feature of the feature point and from the first dimension in accordance with the improvement in accuracy required for the local feature amount. 14. The information processing system according to appendix 12 or 13, wherein the local region is arranged so as to make a round for each number.
(Appendix 15)
The second local feature quantity generation means corresponds to the correlation of the learning target object, and the second local feature quantity having a smaller number of dimensions for the learning target object having a lower correlation with another learning target object. The information processing system according to attachment 14, wherein the information processing system is generated.
(Appendix 16)
The first local feature quantity storage means corresponds to the correlation of the learning object, and the first local feature quantity having a smaller number of dimensions for the learning object having a lower correlation with another learning object. 16. The information processing system according to appendix 14 or 15, wherein the information processing system is stored.
(Appendix 17)
Each of m first vectors each consisting of a 1-dimensional to i-dimensional feature vector generated for each of the learning object and m local regions including each of the m feature points of the image of the learning object. An information processing method using an information processing system including a first local feature amount storage unit that stores a local feature amount in association with each other,
N feature points are extracted from an image in the captured video, and n second regions each consisting of a feature vector from one dimension to j dimension for each of the n local regions including each of the n feature points. A second local feature generation step for generating a local feature;
A smaller dimension number is selected from among the dimension number i of the feature vector of the first local feature quantity and the dimension number j of the feature vector of the second local feature quantity, and the feature vector includes up to the selected dimension number. When it is determined that the n second local feature amounts correspond to a predetermined ratio or more of the m first local feature amounts including feature vectors up to the selected number of dimensions, the image in the video Recognizing that the learning object exists in
An information processing method comprising:
(Appendix 18)
N feature points are extracted from the image captured by the imaging means, and n local feature regions each including the n feature points are each composed of feature vectors of 1 to j dimensions. Second local feature quantity generating means for generating the second local feature quantity of
First transmitting means for transmitting the m second local feature amounts to an information processing apparatus that recognizes a learning object included in the image captured based on a comparison of local feature amounts;
First receiving means for receiving information indicating a learning object included in the captured image from the information processing apparatus;
A communication terminal comprising:
(Appendix 19)
N feature points are extracted from the image captured by the imaging means, and n local feature regions each including the n feature points are each composed of feature vectors of 1 to j dimensions. A second local feature generation step of generating the second local feature of
A first transmission step of transmitting the m second local feature quantities to an information processing apparatus that recognizes a learning object included in the image captured based on a comparison of local feature quantities;
A first receiving step of receiving information indicating a learning object included in the captured image from the information processing apparatus;
A control method for a communication terminal, comprising:
(Appendix 20)
A second local feature quantity generation step of extracting n feature points from an image in the video captured by the imaging means, and generating n second local feature quantities, each consisting of feature vectors of 1 to j dimensions, for n local regions including each of the n feature points;
A first transmission step of transmitting the n second local feature quantities to an information processing apparatus that recognizes a learning object included in the captured image based on matching of local feature quantities;
A first receiving step of receiving information indicating a learning object included in the captured image from the information processing apparatus;
A control program for a communication terminal, which causes a computer to execute the above steps.
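The terminal-side flow of Appendixes 18 to 20 amounts to: serialize the n second local feature quantities, transmit them to the information processing apparatus, and receive back information indicating the recognized learning object. A hedged sketch follows; the JSON wire format and the `send` callable are assumptions for illustration, since the claims fix neither an encoding nor a transport.

```python
import json

def serialize_features(features):
    # Pack the n second local feature quantities for transmission.
    # JSON is an assumed wire format; the claims do not specify one.
    return json.dumps({"features": features})

def deserialize_result(reply):
    # The reply carries information indicating the recognized object.
    return json.loads(reply)["object"]

def terminal_round_trip(features, send):
    # `send` stands in for the network transport to the information
    # processing apparatus (first transmission / first receiving steps).
    return deserialize_result(send(serialize_features(features)))
```

In a real terminal, `send` would be an HTTP or socket call; here any callable that accepts the payload and returns a reply string works, which keeps the sketch testable without a network.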
(Appendix 21)
First local feature quantity storage means for storing, in association with each other, a learning object and m first local feature quantities, each consisting of feature vectors of 1 to i dimensions, generated for m local regions including each of m feature points of an image of the learning object;
Second receiving means for receiving, from the communication terminal, n second local feature quantities, each consisting of feature vectors of 1 to j dimensions, generated for n local regions including each of n feature points extracted from an image in the video captured by the communication terminal;
Recognition means for selecting the smaller number of dimensions from among the number of dimensions i of the feature vectors of the first local feature quantities and the number of dimensions j of the feature vectors of the second local feature quantities, and for recognizing that the learning object exists in the image in the video when it is determined that a predetermined ratio or more of the m first local feature quantities, each consisting of feature vectors up to the selected number of dimensions, correspond to the n second local feature quantities, each consisting of feature vectors up to the selected number of dimensions;
Second transmission means for transmitting information indicating the recognized learning object to the communication terminal;
An information processing apparatus comprising:
(Appendix 22)
A control method for an information processing apparatus comprising first local feature quantity storage means for storing, in association with each other, a learning object and m first local feature quantities, each consisting of feature vectors of 1 to i dimensions, generated for m local regions including each of m feature points of an image of the learning object, the control method comprising:
A second receiving step of receiving, from the communication terminal, n second local feature quantities, each consisting of feature vectors of 1 to j dimensions, generated for n local regions including each of n feature points extracted from an image in the video captured by the communication terminal;
A recognition step of selecting the smaller number of dimensions from among the number of dimensions i of the feature vectors of the first local feature quantities and the number of dimensions j of the feature vectors of the second local feature quantities, and recognizing that the learning object exists in the image in the video when it is determined that a predetermined ratio or more of the m first local feature quantities, each consisting of feature vectors up to the selected number of dimensions, correspond to the n second local feature quantities, each consisting of feature vectors up to the selected number of dimensions;
A second transmission step of transmitting information indicating the recognized learning object to the communication terminal;
A method for controlling an information processing apparatus, comprising:
(Appendix 23)
A control program for an information processing apparatus comprising first local feature quantity storage means for storing, in association with each other, a learning object and m first local feature quantities, each consisting of feature vectors of 1 to i dimensions, generated for m local regions including each of m feature points of an image of the learning object, the control program comprising:
A second receiving step of receiving, from the communication terminal, n second local feature quantities, each consisting of feature vectors of 1 to j dimensions, generated for n local regions including each of n feature points extracted from an image in the video captured by the communication terminal;
A recognition step of selecting the smaller number of dimensions from among the number of dimensions i of the feature vectors of the first local feature quantities and the number of dimensions j of the feature vectors of the second local feature quantities, and recognizing that the learning object exists in the image in the video when it is determined that a predetermined ratio or more of the m first local feature quantities, each consisting of feature vectors up to the selected number of dimensions, correspond to the n second local feature quantities, each consisting of feature vectors up to the selected number of dimensions;
A second transmission step of transmitting information indicating the recognized learning object to the communication terminal;
A computer-readable storage medium storing a control program for an information processing apparatus, the control program causing a computer to execute the above steps.

Claims (23)

  1.  学習対象物と、前記学習対象物の画像のm個の特徴点のそれぞれを含むm個の局所領域のそれぞれについて生成された、それぞれ1次元からi次元までの特徴ベクトルからなるm個の第1局所特徴量とを、対応付けて記憶する第1局所特徴量記憶手段と、
     撮像手段が撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を生成する第2局所特徴量生成手段と、
     前記第1局所特徴量の特徴ベクトルの次元数iおよび前記第2局所特徴量の特徴ベクトルの次元数jのうち、より少ない次元数を選択し、選択された前記次元数までの特徴ベクトルからなる前記n個の第2局所特徴量に、選択された前記次元数までの特徴ベクトルからなる前記m個の第1局所特徴量の所定割合以上が対応すると判定した場合に、前記映像中の前記画像に前記学習対象物が存在すると認識する学習対象物認識手段と、
     を備えることを特徴とする情報処理システム。
    First local feature quantity storage means for storing, in association with each other, a learning object and m first local feature quantities, each consisting of feature vectors of 1 to i dimensions, generated for m local regions including each of m feature points of an image of the learning object;
    Second local feature quantity generation means for extracting n feature points from an image in the video captured by the imaging means, and for generating n second local feature quantities, each consisting of feature vectors of 1 to j dimensions, for n local regions including each of the n feature points;
    Learning object recognition means for selecting the smaller number of dimensions from among the number of dimensions i of the feature vectors of the first local feature quantities and the number of dimensions j of the feature vectors of the second local feature quantities, and for recognizing that the learning object exists in the image in the video when it is determined that a predetermined ratio or more of the m first local feature quantities, each consisting of feature vectors up to the selected number of dimensions, correspond to the n second local feature quantities, each consisting of feature vectors up to the selected number of dimensions;
    An information processing system comprising:
  2.  前記第1局所特徴量記憶手段は、複数の学習対象物にそれぞれ対応付けて各学習対象物の画像から生成した前記m個の第1局所特徴量を記憶し、
     前記学習対象物認識手段は、前記撮像手段が撮像した前記画像に含まれる複数の学習対象物を認識することを特徴とする請求項1に記載の情報処理システム。
    The first local feature quantity storage means stores the m first local feature quantities generated from an image of each of a plurality of learning objects, in association with the respective learning objects,
    The information processing system according to claim 1, wherein the learning object recognition unit recognizes a plurality of learning objects included in the image captured by the imaging unit.
  3.  前記学習対象物認識手段の認識結果を報知する報知手段をさらに備えることを特徴とする請求項1または2に記載の情報処理システム。 The information processing system according to claim 1 or 2, further comprising notification means for notifying a recognition result of the learning object recognition means.
  4.  前記報知手段は、さらに、前記認識結果に関連する情報を報知することを特徴とする請求項3に記載の情報処理システム。 4. The information processing system according to claim 3, wherein the notifying unit further notifies information related to the recognition result.
  5.  前記報知手段は、さらに、前記認識結果に関連する情報を取得するためのリンク情報を報知することを特徴とする請求項3または4に記載の情報処理システム。 The information processing system according to claim 3 or 4, wherein the notification means further notifies link information for acquiring information related to the recognition result.
  6.  前記報知手段は、前記認識結果に関連する情報をリンク情報にしたがって取得する関連情報取得手段を有し、
     リンク情報にしたがって取得した前記関連情報を報知することを特徴とする請求項3に記載の情報処理システム。
    The notification means includes related information acquisition means for acquiring information related to the recognition result according to link information,
    The information processing system according to claim 3, wherein the notification means notifies the related information acquired according to the link information.
  7.  前記第1局所特徴量記憶手段に、捜索する学習対象物の局所特徴量を登録する登録手段をさらに備え、
     前記報知手段は、前記学習対象物認識手段が認識した学習対象物を捜索結果として報知することを特徴とする請求項3乃至6のいずれか1項に記載の情報処理システム。
    The first local feature quantity storage means further comprises a registration means for registering a local feature quantity of the learning object to be searched,
    The information processing system according to any one of claims 3 to 6, wherein the notification means notifies the learning object recognized by the learning object recognition means as a search result.
  8.  前記学習対象物は、文字を含む学習対象物であって、
     前記報知手段は、前記学習対象物の内容を報知することを特徴とする請求項3乃至6のいずれか1項に記載の情報処理システム。
    The learning object is a learning object including characters,
    The information processing system according to any one of claims 3 to 6, wherein the notification unit notifies the content of the learning object.
  9.  前記学習対象物は、音に関連する学習対象物であって、
     前記報知手段は、前記学習対象物の内容を音の演奏で報知することを特徴とする請求項3乃至6のいずれか1項に記載の情報処理システム。
    The learning object is a learning object related to sound,
    The information processing system according to any one of claims 3 to 6, wherein the notification unit notifies the contents of the learning object by playing a sound.
  10.  前記学習対象物は、展示物であって、
     前記報知手段は、前記学習対象物の説明を報知することを特徴とする請求項3乃至6のいずれか1項に記載の情報処理システム。
    The learning object is an exhibit,
    The information processing system according to any one of claims 3 to 6, wherein the notification means notifies a description of the learning object.
  11.  前記学習対象物は、数式を含む学習対象物であって、
     前記報知手段は、前記学習対象物の数式を演算して演算結果を報知することを特徴とする請求項3乃至6のいずれか1項に記載の情報処理システム。
    The learning object is a learning object including a mathematical formula,
    The information processing system according to any one of claims 3 to 6, wherein the notification means calculates the mathematical formula of the learning object and notifies the calculation result.
  12.  前記第1局所特徴量および前記第2局所特徴量は、画像から抽出した特徴点を含む局所領域を複数のサブ領域に分割し、前記複数のサブ領域内の勾配方向のヒストグラムからなる複数の次元の特徴ベクトルを生成することにより生成されることを特徴とする請求項1乃至11のいずれか1項に記載の情報処理システム。 The information processing system according to any one of claims 1 to 11, wherein the first local feature quantities and the second local feature quantities are generated by dividing a local region including a feature point extracted from an image into a plurality of sub-regions and generating feature vectors of a plurality of dimensions consisting of histograms of gradient directions in the plurality of sub-regions.
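Claim 12 describes a SIFT/HOG-style descriptor: split the local region around a feature point into sub-regions and concatenate per-sub-region histograms of gradient directions. A simplified pure-Python sketch follows; the 2x2 grid, 8 bins, and unweighted magnitude accumulation are illustrative choices, not taken from the patent.

```python
import math

def patch_descriptor(patch, grid=2, bins=8):
    # patch: square 2D list of gray levels around one feature point.
    size = len(patch)
    cell = size // grid
    hist = [[0.0] * bins for _ in range(grid * grid)]
    for y in range(1, size - 1):
        for x in range(1, size - 1):
            # Central-difference gradient at (x, y).
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.atan2(gy, gx) % (2 * math.pi)
            b = min(int(ang / (2 * math.pi) * bins), bins - 1)
            # Which sub-region (cell) this pixel falls in.
            c = min(y // cell, grid - 1) * grid + min(x // cell, grid - 1)
            hist[c][b] += mag  # magnitude-weighted orientation vote
    # Concatenate: grid * grid * bins dimensions.
    return [v for h in hist for v in h]
```

For a patch with a pure horizontal ramp, every vote lands in the 0-radian bin of each cell, so only every eighth dimension of the 32-dimensional output is nonzero.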
  13.  前記第1局所特徴量および前記第2局所特徴量は、前記生成した複数の次元の特徴ベクトルから、隣接するサブ領域間の相関がより大きな次元を選定することにより生成されることを特徴とする請求項12に記載の情報処理システム。 The information processing system according to claim 12, wherein the first local feature quantities and the second local feature quantities are generated by selecting, from the generated feature vectors of a plurality of dimensions, dimensions having a larger correlation between adjacent sub-regions.
  14.  前記特徴ベクトルの複数の次元は、前記特徴点の特徴に寄与する次元から順に、かつ、前記局所特徴量に対して求められる精度の向上に応じて第1次元から順に選択できるよう、所定の次元数ごとに前記局所領域をひと回りするよう配列することを特徴とする請求項12または13に記載の情報処理システム。 The information processing system according to claim 12 or 13, wherein the plurality of dimensions of the feature vectors are arranged so as to cycle around the local region once every predetermined number of dimensions, such that dimensions can be selected in order starting from those contributing to the features of the feature point, and from the first dimension onward in accordance with the accuracy required of the local feature quantities.
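The arrangement of claim 14, where the dimensions cycle once around the local region per block, can be sketched as an interleaving of the cell-major histogram layout; truncating the reordered vector to its first k dimensions then still samples every sub-region. The layout and names here are illustrative, not the patent's exact ordering.

```python
def progressive_order(desc, n_cells, bins):
    # desc is cell-major: bins for cell 0, then bins for cell 1, ...
    # Reorder so each consecutive block of n_cells dimensions takes
    # one bin from every cell, i.e. cycles once around the region.
    return [desc[c * bins + b] for b in range(bins) for c in range(n_cells)]
```

With 2 cells of 4 bins each, the first 2 dimensions of the reordered vector already cover both cells, which is what makes prefix truncation (as in the dimension-selection clause of claim 1) meaningful.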
  15.  前記第2局所特徴量生成手段は、前記学習対象物の相関に対応して、他の学習対象物とより低い前記相関を有する学習対象物については次元数のより少ない前記第2局所特徴量を生成することを特徴とする請求項14に記載の情報処理システム。 The information processing system according to claim 14, wherein the second local feature quantity generation means generates, in accordance with correlations among the learning objects, second local feature quantities having a smaller number of dimensions for a learning object having a lower correlation with other learning objects.
  16.  前記第1局所特徴量記憶手段は、前記学習対象物の相関に対応して、他の学習対象物とより低い前記相関を有する学習対象物については次元数のより少ない前記第1局所特徴量を記憶することを特徴とする請求項14または15に記載の情報処理システム。 The information processing system according to claim 14 or 15, wherein the first local feature quantity storage means stores, in accordance with correlations among the learning objects, first local feature quantities having a smaller number of dimensions for a learning object having a lower correlation with other learning objects.
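Claims 15 and 16 let a learning object that is easy to distinguish (low correlation with every other object) get by with fewer descriptor dimensions. A toy selection rule is sketched below; the thresholds and dimension counts are invented for illustration, not values from the patent.

```python
def choose_dims(correlations_to_others, d_min=32, d_max=128):
    # An object with low worst-case similarity to every other learning
    # object is already distinctive, so fewer dimensions suffice;
    # highly similar objects keep the full descriptor.
    c = max(correlations_to_others)
    if c < 0.3:
        return d_min
    if c < 0.7:
        return (d_min + d_max) // 2
    return d_max
```

Because both generation (claim 15) and storage (claim 16) use the same rule, the per-object descriptor lengths on the terminal and in the storage means stay consistent, and the matching side simply truncates to the shorter prefix.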
  17.  学習対象物と、前記学習対象物の画像のm個の特徴点のそれぞれを含むm個の局所領域のそれぞれについて生成された、それぞれ1次元からi次元までの特徴ベクトルからなるm個の第1局所特徴量とを、対応付けて記憶する第1局所特徴量記憶手段を備えた情報処理システムを利用した情報処理方法であって、
     撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を生成する第2局所特徴量生成ステップと、
     前記第1局所特徴量の特徴ベクトルの次元数iおよび前記第2局所特徴量の特徴ベクトルの次元数jのうち、より少ない次元数を選択し、選択された前記次元数までの特徴ベクトルからなる前記n個の第2局所特徴量に、選択された前記次元数までの特徴ベクトルからなる前記m個の第1局所特徴量の所定割合以上が対応すると判定した場合に、前記映像中の前記画像に前記学習対象物が存在すると認識する認識ステップと、
     を含むことを特徴とする情報処理方法。
    An information processing method using an information processing system comprising first local feature quantity storage means for storing, in association with each other, a learning object and m first local feature quantities, each consisting of feature vectors of 1 to i dimensions, generated for m local regions including each of m feature points of an image of the learning object, the method comprising:
    A second local feature quantity generation step of extracting n feature points from an image in the captured video and generating n second local feature quantities, each consisting of feature vectors of 1 to j dimensions, for n local regions including each of the n feature points;
    A recognition step of selecting the smaller number of dimensions from among the number of dimensions i of the feature vectors of the first local feature quantities and the number of dimensions j of the feature vectors of the second local feature quantities, and recognizing that the learning object exists in the image in the video when it is determined that a predetermined ratio or more of the m first local feature quantities, each consisting of feature vectors up to the selected number of dimensions, correspond to the n second local feature quantities, each consisting of feature vectors up to the selected number of dimensions;
    An information processing method comprising:
  18.  撮像手段が撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を生成する第2局所特徴量生成手段と、
     前記m個の第2局所特徴量を、局所特徴量の照合に基づいて撮像した前記画像に含まれる学習対象物を認識する情報処理装置に送信する第1送信手段と、
     前記情報処理装置から、撮像した前記画像に含まれる学習対象物を示す情報を受信する第1受信手段と、
     を備えたことを特徴とする通信端末。
    Second local feature quantity generation means for extracting n feature points from an image in the video captured by the imaging means, and for generating n second local feature quantities, each consisting of feature vectors of 1 to j dimensions, for n local regions including each of the n feature points;
    First transmitting means for transmitting the n second local feature quantities to an information processing apparatus that recognizes a learning object included in the captured image based on matching of local feature quantities;
    First receiving means for receiving information indicating a learning object included in the captured image from the information processing apparatus;
    A communication terminal comprising:
  19.  撮像手段が撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を生成する第2局所特徴量生成ステップと、
     前記m個の第2局所特徴量を、局所特徴量の照合に基づいて撮像した前記画像に含まれる学習対象物を認識する情報処理装置に送信する第1送信ステップと、
     前記情報処理装置から、撮像した前記画像に含まれる学習対象物を示す情報を受信する第1受信ステップと、
     を含むことを特徴とする通信端末の制御方法。
    A second local feature quantity generation step of extracting n feature points from an image in the video captured by the imaging means, and generating n second local feature quantities, each consisting of feature vectors of 1 to j dimensions, for n local regions including each of the n feature points;
    A first transmission step of transmitting the n second local feature quantities to an information processing apparatus that recognizes a learning object included in the captured image based on matching of local feature quantities;
    A first receiving step of receiving information indicating a learning object included in the captured image from the information processing apparatus;
    A control method for a communication terminal, comprising:
  20.  撮像手段が撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を生成する第2局所特徴量生成ステップと、
     前記m個の第2局所特徴量を、局所特徴量の照合に基づいて撮像した前記画像に含まれる学習対象物を認識する情報処理装置に送信する第1送信ステップと、
     前記情報処理装置から、撮像した前記画像に含まれる学習対象物を示す情報を受信する第1受信ステップと、
     をコンピュータに実行させることを特徴とする通信端末の制御プログラム。
    A second local feature quantity generation step of extracting n feature points from an image in the video captured by the imaging means, and generating n second local feature quantities, each consisting of feature vectors of 1 to j dimensions, for n local regions including each of the n feature points;
    A first transmission step of transmitting the n second local feature quantities to an information processing apparatus that recognizes a learning object included in the captured image based on matching of local feature quantities;
    A first receiving step of receiving information indicating a learning object included in the captured image from the information processing apparatus;
    A control program for a communication terminal, which causes a computer to execute the above steps.
  21.  学習対象物と、前記学習対象物の画像のm個の特徴点のそれぞれを含むm個の局所領域のそれぞれについて生成された、それぞれ1次元からi次元までの特徴ベクトルからなるm個の第1局所特徴量とを、対応付けて記憶する第1局所特徴量記憶手段と、
     通信端末が撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を、前記通信端末から受信する第2受信手段と、
     前記第1局所特徴量の特徴ベクトルの次元数iおよび前記第2局所特徴量の特徴ベクトルの次元数jのうち、より少ない次元数を選択し、選択された前記次元数までの特徴ベクトルからなる前記n個の第2局所特徴量に、選択された前記次元数までの特徴ベクトルからなる前記m個の第1局所特徴量の所定割合以上が対応すると判定した場合に、前記映像中の前記画像に前記学習対象物が存在すると認識する認識手段と、
     認識した前記学習対象物を示す情報を前記通信端末に送信する第2送信手段と、
     を備えることを特徴とする情報処理装置。
    First local feature quantity storage means for storing, in association with each other, a learning object and m first local feature quantities, each consisting of feature vectors of 1 to i dimensions, generated for m local regions including each of m feature points of an image of the learning object;
    Second receiving means for receiving, from the communication terminal, n second local feature quantities, each consisting of feature vectors of 1 to j dimensions, generated for n local regions including each of n feature points extracted from an image in the video captured by the communication terminal;
    Recognition means for selecting the smaller number of dimensions from among the number of dimensions i of the feature vectors of the first local feature quantities and the number of dimensions j of the feature vectors of the second local feature quantities, and for recognizing that the learning object exists in the image in the video when it is determined that a predetermined ratio or more of the m first local feature quantities, each consisting of feature vectors up to the selected number of dimensions, correspond to the n second local feature quantities, each consisting of feature vectors up to the selected number of dimensions;
    Second transmission means for transmitting information indicating the recognized learning object to the communication terminal;
    An information processing apparatus comprising:
  22.  学習対象物と、前記学習対象物の画像のm個の特徴点のそれぞれを含むm個の局所領域のそれぞれについて生成された、それぞれ1次元からi次元までの特徴ベクトルからなるm個の第1局所特徴量とを、対応付けて記憶する第1局所特徴量記憶手段を備えた情報処理装置の制御方法であって、
     通信端末が撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を、前記通信端末から受信する第2受信ステップと、
     前記第1局所特徴量の特徴ベクトルの次元数iおよび前記第2局所特徴量の特徴ベクトルの次元数jのうち、より少ない次元数を選択し、選択された前記次元数までの特徴ベクトルからなる前記n個の第2局所特徴量に、選択された前記次元数までの特徴ベクトルからなる前記m個の第1局所特徴量の所定割合以上が対応すると判定した場合に、前記映像中の前記画像に前記学習対象物が存在すると認識する認識ステップと、
     認識した前記学習対象物を示す情報を前記通信端末に送信する第2送信ステップと、
     を含むことを特徴とする情報処理装置の制御方法。
    A control method for an information processing apparatus comprising first local feature quantity storage means for storing, in association with each other, a learning object and m first local feature quantities, each consisting of feature vectors of 1 to i dimensions, generated for m local regions including each of m feature points of an image of the learning object, the control method comprising:
    A second receiving step of receiving, from the communication terminal, n second local feature quantities, each consisting of feature vectors of 1 to j dimensions, generated for n local regions including each of n feature points extracted from an image in the video captured by the communication terminal;
    A recognition step of selecting the smaller number of dimensions from among the number of dimensions i of the feature vectors of the first local feature quantities and the number of dimensions j of the feature vectors of the second local feature quantities, and recognizing that the learning object exists in the image in the video when it is determined that a predetermined ratio or more of the m first local feature quantities, each consisting of feature vectors up to the selected number of dimensions, correspond to the n second local feature quantities, each consisting of feature vectors up to the selected number of dimensions;
    A second transmission step of transmitting information indicating the recognized learning object to the communication terminal;
    A method for controlling an information processing apparatus, comprising:
  23.  学習対象物と、前記学習対象物の画像のm個の特徴点のそれぞれを含むm個の局所領域のそれぞれについて生成された、それぞれ1次元からi次元までの特徴ベクトルからなるm個の第1局所特徴量とを、対応付けて記憶する第1局所特徴量記憶手段を備えた情報処理装置の制御プログラムであって、
     通信端末が撮像した映像中の画像からn個の特徴点を抽出し、前記n個の特徴点のそれぞれを含むn個の局所領域について、それぞれ1次元からj次元までの特徴ベクトルからなるn個の第2局所特徴量を、前記通信端末から受信する第2受信ステップと、
     前記第1局所特徴量の特徴ベクトルの次元数iおよび前記第2局所特徴量の特徴ベクトルの次元数jのうち、より少ない次元数を選択し、選択された前記次元数までの特徴ベクトルからなる前記n個の第2局所特徴量に、選択された前記次元数までの特徴ベクトルからなる前記m個の第1局所特徴量の所定割合以上が対応すると判定した場合に、前記映像中の前記画像に前記学習対象物が存在すると認識する認識ステップと、
     認識した前記学習対象物を示す情報を前記通信端末に送信する第2送信ステップと、
     をコンピュータに実行させることを特徴とする情報処理装置の制御プログラム。
    A control program for an information processing apparatus comprising first local feature quantity storage means for storing, in association with each other, a learning object and m first local feature quantities, each consisting of feature vectors of 1 to i dimensions, generated for m local regions including each of m feature points of an image of the learning object, the control program comprising:
    A second receiving step of receiving, from the communication terminal, n second local feature quantities, each consisting of feature vectors of 1 to j dimensions, generated for n local regions including each of n feature points extracted from an image in the video captured by the communication terminal;
    A recognition step of selecting the smaller number of dimensions from among the number of dimensions i of the feature vectors of the first local feature quantities and the number of dimensions j of the feature vectors of the second local feature quantities, and recognizing that the learning object exists in the image in the video when it is determined that a predetermined ratio or more of the m first local feature quantities, each consisting of feature vectors up to the selected number of dimensions, correspond to the n second local feature quantities, each consisting of feature vectors up to the selected number of dimensions;
    A second transmission step of transmitting information indicating the recognized learning object to the communication terminal;
    A control program for an information processing apparatus, which causes a computer to execute the above steps.
PCT/JP2013/051954 2012-01-30 2013-01-30 Information processing system, information processing method, information processing device, and control method and control program therefor, and communication terminal, and control method and control program therefor WO2013115203A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2012-017385 2012-01-30
JP2012017385 2012-01-30

Publications (1)

Publication Number Publication Date
WO2013115203A1 true WO2013115203A1 (en) 2013-08-08

Family

ID=48905237

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2013/051954 WO2013115203A1 (en) 2012-01-30 2013-01-30 Information processing system, information processing method, information processing device, and control method and control program therefor, and communication terminal, and control method and control program therefor

Country Status (2)

Country Link
JP (1) JPWO2013115203A1 (en)
WO (1) WO2013115203A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015159775A1 (en) * 2014-04-15 2015-10-22 オリンパス株式会社 Image processing apparatus, communication system, communication method, and image-capturing device
JP2016541052A (en) * 2013-11-14 2016-12-28 シクパ ホルディング ソシエテ アノニムSicpa Holding Sa Image analysis to certify products
US11176402B2 (en) * 2017-05-17 2021-11-16 Samsung Electronics Co., Ltd Method and device for identifying object
WO2022085128A1 (en) * 2020-10-21 2022-04-28 日本電信電話株式会社 Name presentation device, name presentation method, and program

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20150021283A (en) * 2013-08-20 2015-03-02 한국전자통신연구원 System and method for learning foreign language using smart glasses
CN111951616B (en) * 2020-08-18 2022-05-06 怀化学院 Picture-aided object-identifying device for preschool education

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001101191A (en) * 1999-09-27 2001-04-13 Cadix Inc Image identifying device and database system used for image identification
JP2006053622A (en) * 2004-08-10 2006-02-23 Hitachi Omron Terminal Solutions Corp Document link information acquisition system
JP2011008507A (en) * 2009-06-25 2011-01-13 Kddi Corp Image retrieval method and system
JP2011198130A (en) * 2010-03-19 2011-10-06 Fujitsu Ltd Image processing apparatus, and image processing program

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HIRONOBU FUJIYOSHI: "Gradient-Based Feature Extraction -SIFT and HOG", IEICE TECHNICAL REPORT, vol. 107, no. 206, 27 August 2007 (2007-08-27), pages 211 - 224 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016541052A (en) * 2013-11-14 2016-12-28 シクパ ホルディング ソシエテ アノニムSicpa Holding Sa Image analysis to certify products
WO2015159775A1 (en) * 2014-04-15 2015-10-22 オリンパス株式会社 Image processing apparatus, communication system, communication method, and image-capturing device
CN106233283A (en) * 2014-04-15 2016-12-14 奥林巴斯株式会社 Image processing apparatus, communication system and communication means and camera head
US10133932B2 (en) 2014-04-15 2018-11-20 Olympus Corporation Image processing apparatus, communication system, communication method and imaging device
US11176402B2 (en) * 2017-05-17 2021-11-16 Samsung Electronics Co., Ltd Method and device for identifying object
WO2022085128A1 (en) * 2020-10-21 2022-04-28 日本電信電話株式会社 Name presentation device, name presentation method, and program

Also Published As

Publication number Publication date
JPWO2013115203A1 (en) 2015-05-11

Similar Documents

Publication Publication Date Title
US8176054B2 (en) Retrieving electronic documents by converting them to synthetic text
WO2013115203A1 (en) Information processing system, information processing method, information processing device, and control method and control program therefor, and communication terminal, and control method and control program therefor
Cliche et al. Scatteract: Automated extraction of data from scatter plots
US8086038B2 (en) Invisible junction features for patch recognition
Chiang et al. Using historical maps in scientific studies: Applications, challenges, and best practices
US20110153633A1 (en) Searching for handwritten annotations appearing a given distance from document content
TW201712600A (en) Methods and systems for detecting and recognizing text from images
CN101297318A (en) Data organization and access for mixed media document system
EP2015226A1 (en) Information retrieval using invisible junctions and geometric constraints
CN102402593A (en) Multi-modal approach to search query input
KR20130029430A (en) Character recognition device, character recognition method, character recognition system, and character recognition program
US20130057583A1 (en) Providing information services related to multimodal inputs
CN112801099B (en) Image processing method, device, terminal equipment and medium
Araújo et al. A real-world approach on the problem of chart recognition using classification, detection and perspective correction
Lund et al. How well does multiple OCR error correction generalize?
JP2011248596A (en) Searching system and searching method for picture-containing documents
CN112182275A (en) Trademark approximate retrieval system and method based on multi-dimensional feature fusion
JP5433396B2 (en) Manga image analysis device, program, search device and method for extracting text from manga image
JPWO2013088994A1 (en) Video processing system, video processing method, video processing apparatus for portable terminal or server, and control method and control program therefor
Du et al. From plane to hierarchy: Deformable transformer for remote sensing image captioning
JP5484113B2 (en) Document image related information providing apparatus and document image related information acquisition system
Zhao et al. Saliency-constrained semantic learning for airport target recognition of aerial images
CN114281919A (en) Node adding method, device, equipment and storage medium based on directory tree
JP6131859B2 (en) Information processing system, information processing method, information processing apparatus and control method and control program thereof, communication terminal and control method and control program thereof
Pillai et al. Document layout analysis using detection transformers

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13744083

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2013556425

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13744083

Country of ref document: EP

Kind code of ref document: A1