WO2013089004A1 - Video processing system, video processing method, video processing device for portable terminal or for server and method for controlling and program for controlling same - Google Patents

Video processing system, video processing method, video processing device for portable terminal or for server and method for controlling and program for controlling same

Info

Publication number
WO2013089004A1
WO2013089004A1 (PCT/JP2012/081541)
Authority
WO
WIPO (PCT)
Prior art keywords
feature
local
local feature
video processing
recognition
Prior art date
Application number
PCT/JP2012/081541
Other languages
French (fr)
Japanese (ja)
Inventor
Toshiyuki Nomura
Akio Yamada
Kota Iwamoto
Ryota Mase
Original Assignee
NEC Corporation
Priority date
Filing date
Publication date
Application filed by NEC Corporation
Publication of WO2013089004A1 publication Critical patent/WO2013089004A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/46: Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462: Salient features, e.g. scale invariant feature transforms [SIFT]

Definitions

  • The present invention relates to a technique for accurately identifying, in real time, an object present in a video.
  • Patent Document 1 describes a technique in which recognition speed is improved by clustering feature amounts when a query image is recognized using a model dictionary generated in advance from model images.
  • However, the technique described in the above document aims only at improving recognition speed; it does not make it possible to recognize, in real time, the recognition object in a query image of a video while weighing the trade-off between recognition accuracy and recognition speed.
  • An object of the present invention is to provide a technique for solving the above-described problem.
  • In order to achieve the above object, an apparatus according to the present invention comprises: first local feature storage means for storing, in association with a recognition object, m first local feature amounts each consisting of a feature vector of dimensions 1 to i generated for each of m local regions containing the m feature points of an image of the recognition object; second local feature generation means for extracting n feature points from an image in a video and generating n second local feature amounts each consisting of a feature vector of dimensions 1 to j for the n local regions containing the n feature points; recognition means for selecting the smaller of the dimension count i of the feature vectors of the first local feature amounts and the dimension count j of the feature vectors of the second local feature amounts, and recognizing that the recognition object is present in the image in the video when a predetermined ratio or more of the m first local feature amounts, consisting of feature vectors up to the selected dimension count, correspond to the n second local feature amounts, consisting of feature vectors up to the selected dimension count; and accuracy adjustment means for controlling adjustment of the accuracy of the n second local feature amounts generated by the second local feature generation means.
  • In order to achieve the above object, a method according to the present invention is a method for controlling a video processing apparatus that includes a first local feature storage unit storing, in association with a recognition object, m first local feature amounts each consisting of a feature vector of dimensions 1 to i generated for each of m local regions containing the m feature points of an image of the recognition object. The method comprises: a second local feature generation step of extracting n feature points from an image in a video and generating n second local feature amounts each consisting of a feature vector of dimensions 1 to j for the n local regions containing the n feature points; a recognition step of selecting the smaller of the dimension count i of the feature vectors of the first local feature amounts and the dimension count j of the feature vectors of the second local feature amounts, and recognizing that the recognition object is present in the image in the video when a predetermined ratio or more of the m first local feature amounts, consisting of feature vectors up to the selected dimension count, correspond to the n second local feature amounts, consisting of feature vectors up to the selected dimension count; and an accuracy adjustment step of controlling adjustment of the accuracy of the n second local feature amounts generated in the second local feature generation step.
  • In order to achieve the above object, a program according to the present invention is a control program for a video processing apparatus that includes a first local feature storage unit storing, in association with a recognition object, m first local feature amounts each consisting of a feature vector of dimensions 1 to i generated for each of m local regions containing the m feature points of an image of the recognition object. The program causes a computer to execute: a second local feature generation step of extracting n feature points from an image in a video and generating n second local feature amounts each consisting of a feature vector of dimensions 1 to j for the n local regions containing the n feature points; a recognition step of selecting the smaller of the dimension count i of the feature vectors of the first local feature amounts and the dimension count j of the feature vectors of the second local feature amounts, and recognizing that the recognition object is present in the image in the video when a predetermined ratio or more of the m first local feature amounts, consisting of feature vectors up to the selected dimension count, correspond to the n second local feature amounts, consisting of feature vectors up to the selected dimension count; and an accuracy adjustment step of controlling adjustment of the accuracy of the n second local feature amounts generated in the second local feature generation step.
  • In order to achieve the above object, a system according to the present invention is a video processing system having a video processing device for a mobile terminal and a video processing device for a server connected via a network, and comprises: first local feature storage means for storing, in association with a recognition object, m first local feature amounts each consisting of a feature vector of dimensions 1 to i generated for each of m local regions containing the m feature points of an image of the recognition object; second local feature generation means for extracting n feature points from an image in a video and generating n second local feature amounts each consisting of a feature vector of dimensions 1 to j for the n local regions containing the n feature points; recognition means for selecting the smaller of the dimension count i of the feature vectors of the first local feature amounts and the dimension count j of the feature vectors of the second local feature amounts, and recognizing that the recognition object is present in the image in the video when a predetermined ratio or more of the m first local feature amounts, consisting of feature vectors up to the selected dimension count, correspond to the n second local feature amounts, consisting of feature vectors up to the selected dimension count; and accuracy adjustment means for controlling adjustment of the accuracy of the n second local feature amounts generated by the second local feature generation means.
  • In order to achieve the above object, another method according to the present invention is a video processing method in a video processing system in which a video processing device for a mobile terminal and a video processing device for a server are connected via a network, the system including first local feature storage means for storing, in association with a recognition object, m first local feature amounts each consisting of a feature vector of dimensions 1 to i generated for each of m local regions containing the m feature points of an image of the recognition object. The method comprises: a second local feature generation step of extracting n feature points from an image in the video and generating n second local feature amounts each consisting of a feature vector of dimensions 1 to j for the n local regions containing the n feature points; and a recognition step of selecting the smaller of the dimension count i of the feature vectors of the first local feature amounts and the dimension count j of the feature vectors of the second local feature amounts, and recognizing that the recognition object is present in the image in the video when a predetermined ratio or more of the m first local feature amounts, consisting of feature vectors up to the selected dimension count, correspond to the n second local feature amounts, consisting of feature vectors up to the selected dimension count.
  • According to the present invention, it is possible to recognize the recognition object in a query image of a video in real time while dynamically adjusting the accuracy of the local feature amounts.
  • A video processing apparatus 100 as a first embodiment of the present invention will be described with reference to FIG. 1.
  • The video processing apparatus 100 is an apparatus that recognizes a recognition object in an image in a video in real time while maintaining recognition accuracy.
  • As shown in FIG. 1, the video processing apparatus 100 includes a first local feature storage unit 110, a second local feature generation unit 120, a recognition unit 130, and an accuracy adjustment unit 140.
  • The first local feature storage unit 110 stores, in association with the recognition object 111, m first local feature amounts 112, each consisting of a feature vector of dimensions 1 to i generated for each of the m local regions containing the m feature points of an image of the recognition object.
  • The second local feature generation unit 120 extracts n feature points 121 from an image 101 in the video and generates n second local feature amounts 123, each consisting of a feature vector of dimensions 1 to j, for the n local regions 122 containing the n feature points.
  • The recognition unit 130 selects the smaller of the dimension count i of the feature vectors of the first local feature amounts and the dimension count j of the feature vectors of the second local feature amounts.
  • The recognition unit 130 then determines whether a predetermined ratio or more of the m first local feature amounts 112, consisting of feature vectors up to the selected dimension count, correspond to the n second local feature amounts 123, consisting of feature vectors up to the selected dimension count. When it determines that they correspond, the recognition unit 130 recognizes that the recognition object 111 is present in the image 101 in the video.
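  • As an illustration of this decision rule, the following is a minimal sketch (the function name and threshold values are assumptions, not from the patent) of comparing the two feature sets using only the smaller dimension count and applying the predetermined-ratio test:

```python
import numpy as np

def recognize(first_feats: np.ndarray, second_feats: np.ndarray,
              dist_th: float = 0.3, ratio_th: float = 0.5) -> bool:
    """first_feats: (m, i) stored features; second_feats: (n, j) query features."""
    # select the smaller of the two dimension counts i and j
    d = min(first_feats.shape[1], second_feats.shape[1])
    a, b = first_feats[:, :d], second_feats[:, :d]
    # a stored feature "corresponds" if some query feature lies within dist_th
    matched = sum(np.linalg.norm(b - v, axis=1).min() < dist_th for v in a)
    # recognized when a predetermined ratio or more of the m features correspond
    return matched / len(a) >= ratio_th
```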
  • The accuracy adjustment unit 140 controls adjustment of the accuracy of the n second local feature amounts 123 generated by the second local feature generation unit 120 (101a → 101b).
  • According to the present embodiment, it is possible to recognize the recognition object in a query image of a video in real time while dynamically adjusting the accuracy of the local feature amounts.
  • According to the second embodiment, described next, it is possible to recognize the recognition object in a query image of a video in real time, dynamically adjusting the accuracy of the local feature amounts while maintaining reliability.
  • FIG. 2 is a block diagram showing a functional configuration of the video processing apparatus 200 according to the present embodiment.
  • the video processing apparatus 200 includes an imaging unit 210 that acquires video.
  • the captured video is displayed on the display unit 280 and input to the local feature value generation unit 220.
  • the local feature value generation unit 220 generates a local feature value from the captured video (refer to FIG. 4A for details).
  • the local feature DB 230 stores local feature values generated in advance from individual recognition objects by the same algorithm as the local feature value generation unit 220 in association with the recognition objects.
  • the contents of the local feature DB 230 may be received from the outside such as a server.
  • The collation unit 240 checks whether the local feature amounts generated from the captured video by the local feature generation unit 220 contain data corresponding to the local feature amounts stored in the local feature DB 230. If corresponding data exist, it determines that the recognition object is present in the captured video. Note that correspondence of local feature amounts does not merely mean that identical local feature amounts exist; it is also determined whether they can be acquired from the same object in the same order and arrangement (see FIG. 4G).
  • The accuracy adjustment unit 250 receives the collation result from the collation unit 240, has the reliability evaluation unit 260 evaluate the reliability of the collation result, and performs accuracy adjustment in the local feature generation unit 220 (see FIGS. 5A to 5C, FIG. 6, and FIG. 7).
  • The collation result generation unit 270 generates the data to be displayed on the display unit 280 from the collation result of the collation unit 240; such data include the names of recognition objects, related information, recognition errors, and the like.
  • The display unit 280 displays the collation result superimposed on the video captured by the imaging unit 210. The data generated by the collation result generation unit 270 may also be transmitted to the outside, such as a server.
  • The operation unit 290 comprises the keys and touch panel of the video processing device 200 and controls the operation of the video processing device 200, including the imaging unit 210.
  • The video processing apparatus 200 of the present embodiment is not limited to video being captured; it can also be applied to video being played back or video being received as a broadcast.
  • In those cases, the imaging unit 210 may be replaced with a video playback unit or a video reception unit.
  • FIG. 3 is a sequence diagram showing an operation procedure of the video processing apparatus 200 according to the present embodiment.
  • In step S301, an image captured by the imaging unit 210 is transferred to the local feature generation unit 220.
  • The local feature generation unit 220 then sets initial accuracy adjustment parameters that determine the initial accuracy.
  • the accuracy adjustment parameters include the number of feature points, the number of dimensions, the size and shape of the local region, the number of sub-region divisions, the number of feature vector directions, and the like (see FIG. 10).
  • In step S305, the local feature generation unit 220 generates local features using the initial accuracy parameters and transfers the generated local feature amounts to the collation unit 240.
  • The collation unit 240 collates the local feature amounts generated by the local feature generation unit 220 against the local feature amounts stored in advance in the local feature DB 230 in association with recognition objects. If a predetermined ratio or more of the local feature amounts match, and the coordinates of the matched feature points are related by a linear transformation, the collation unit 240 determines that the recognition object is present in the captured video; in that case, in step S311 it notifies the accuracy adjustment unit 250 of the collation result values used for the determination.
  • The collation result values include the number of matched feature points, their ratio among all feature points, the degree of match of each feature point, the degree of match of important feature points, and the like (see FIG. 11).
  • The reliability evaluation unit 260 of the accuracy adjustment unit 250 evaluates the collation result values received from the collation unit 240 and outputs the reliability of the collation result (see FIG. 11). In step S313, it is determined whether the reliability exceeds a predetermined threshold Th. If it does, the collation processing ends and the recognition result is output. If it does not, in step S315 the accuracy adjustment unit 250 sets new accuracy adjustment parameters in the local feature generation unit 220.
  • The local feature generation unit 220 acquires the accuracy adjustment parameters in step S317, returns to step S305, generates local features again, and the collation processing is repeated.
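  • The loop of steps S305 to S317 can be sketched as follows (a non-authoritative outline; the helper callables stand in for the units of FIG. 2 and are assumptions):

```python
from typing import Any, Callable, Optional

def accuracy_adjusted_recognition(frame: Any,
                                  generate: Callable[[Any, dict], Any],   # local feature generation unit
                                  collate: Callable[[Any], Any],          # collation unit
                                  reliability: Callable[[Any], float],    # reliability evaluation unit
                                  adjust: Callable[[dict], dict],         # accuracy adjustment unit
                                  params: dict,
                                  th: float = 0.8,
                                  max_rounds: int = 5) -> Optional[Any]:
    for _ in range(max_rounds):
        feats = generate(frame, params)      # step S305: generate local features
        result = collate(feats)              # collate against the local feature DB
        if reliability(result) > th:         # step S313: reliability exceeds Th?
            return result                    # yes: output the recognition result
        params = adjust(params)              # steps S315-S317: raise accuracy, retry
    return None
```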
  • FIG. 4A is a block diagram illustrating a configuration of the local feature value generation unit 220 according to the present embodiment.
  • the local feature quantity generation unit 220 includes a feature point detection unit 411, a local region acquisition unit 412, a sub-region division unit 413, a sub-region feature vector generation unit 414, and a dimension selection unit 415.
  • the feature point detection unit 411 detects a number of characteristic points (feature points) from the image data, and outputs the coordinate position, scale (size), and angle of each feature point.
  • the local region acquisition unit 412 acquires a local region where feature amount extraction is performed from the coordinate value, scale, and angle of each detected feature point.
  • the sub area dividing unit 413 divides the local area into sub areas.
  • the sub-region dividing unit 413 can divide the local region into 16 blocks (4 ⁇ 4 blocks) or divide the local region into 25 blocks (5 ⁇ 5 blocks).
  • Note that the number of divisions is not limited. In the present embodiment, the case where the local region is divided into 25 blocks (5 × 5 blocks) is described below as representative.
  • the sub-region feature vector generation unit 414 generates a feature vector for each sub-region of the local region.
  • a gradient direction histogram can be used as the feature vector of the sub-region.
  • The dimension selection unit 415 selects (for example, deletes or thins out) the dimensions to be output as the local feature amount, based on the positional relationship between sub-regions, so that the correlation between the feature vectors of neighboring sub-regions becomes low.
  • the dimension selection unit 415 can not only select a dimension but also determine a selection priority. That is, for example, the dimension selection unit 415 can select dimensions with priorities so that dimensions in the same gradient direction are not selected between adjacent sub-regions. Then, the dimension selection unit 415 outputs a feature vector composed of the selected dimensions as a local feature amount. Note that the dimension selection unit 415 can output the local feature amount in a state where the dimensions are rearranged based on the priority order.
  • 4B to 4F are diagrams illustrating processing of the local feature value generation unit 220 according to the present embodiment.
  • FIG. 4B is a diagram showing a series of processes of feature point detection / local area acquisition / sub-area division / feature vector generation in the local feature quantity generation unit 220.
  • Such a series of processes is described in US Patent No. 6,711,293 and in David G. Lowe, "Distinctive image features from scale-invariant keypoints", International Journal of Computer Vision, 60(2), 2004, pp. 91-110.
  • 421 in FIG. 4B illustrates the state in which feature points have been detected from an image in the video by the feature point detection unit 411 of FIG. 4A.
  • Here, generation of a local feature amount is described using one feature point 421a as a representative.
  • The starting point of the arrow at the feature point 421a indicates the coordinate position of the feature point, the length of the arrow indicates the scale (size), and the direction of the arrow indicates the angle.
  • For the scale (size) and direction, quantities such as brightness, saturation, and hue can be selected according to the target image.
  • In FIG. 4B the case of six directions at 60-degree intervals is described, but the present invention is not limited to this.
  • the local region acquisition unit 412 in FIG. 4A generates a Gaussian window 422a around the starting point of the feature point 421a, and generates a local region 422 that substantially includes the Gaussian window 422a.
  • the local region acquisition unit 412 generates the square local region 422, but the local region may be circular or have another shape. This local region is acquired for each feature point.
  • The gradient direction is not limited to quantization into eight directions; it may be quantized into an arbitrary number of directions, such as 4, 8, or 10 directions.
  • The sub-region feature vector generation unit 414 may sum the gradient magnitudes instead of simple frequencies. When aggregating the gradient histogram, the sub-region feature vector generation unit 414 may also assign weight values not only to the sub-region to which a pixel belongs but to neighboring sub-regions (such as adjacent blocks) according to the distance between sub-regions.
  • The sub-region feature vector generation unit 414 may further add weight values to the gradient directions before and after the quantized gradient direction.
  • The feature vector of a sub-region is not limited to the gradient direction histogram; it may be anything having a plurality of dimensions (elements), such as color information. In the present embodiment, a gradient direction histogram is used as the feature vector of the sub-region.
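  • A minimal sketch of the processing described so far (assumptions: 6 direction bins, 5 × 5 sub-region blocks, a square patch whose side is divisible by 5, gradient magnitudes summed per bin):

```python
import numpy as np

def gradient_histogram_descriptor(patch: np.ndarray, blocks: int = 5,
                                  n_dirs: int = 6) -> np.ndarray:
    gy, gx = np.gradient(patch.astype(float))          # pixel gradients
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    bins = (ang / (2 * np.pi) * n_dirs).astype(int) % n_dirs  # quantized direction
    h, w = patch.shape
    bh, bw = h // blocks, w // blocks
    hist = np.zeros((blocks, blocks, n_dirs))
    for by in range(blocks):
        for bx in range(blocks):
            sub_b = bins[by * bh:(by + 1) * bh, bx * bw:(bx + 1) * bw]
            sub_m = mag[by * bh:(by + 1) * bh, bx * bw:(bx + 1) * bw]
            for d in range(n_dirs):
                hist[by, bx, d] = sub_m[sub_b == d].sum()  # magnitude-weighted bin
    return hist.ravel()                                 # 5 * 5 * 6 = 150 dimensions
```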
  • The dimension selection unit 415 selects (thins out) the dimensions (elements) to be output as the local feature amount, based on the positional relationship between sub-regions, so that the correlation between the feature vectors of neighboring sub-regions becomes low. More specifically, for example, the dimension selection unit 415 selects dimensions so that at least one gradient direction differs between adjacent sub-regions.
  • In the present embodiment, the dimension selection unit 415 mainly uses adjacent sub-regions as the neighboring sub-regions, but the neighboring sub-regions are not limited to adjacent ones; for example, sub-regions within a predetermined distance from the target sub-region may also be treated as neighboring sub-regions.
  • FIG. 4C shows an example of selecting a dimension from a feature vector 431 of a 150-dimensional gradient histogram generated by dividing a local region into 5 ⁇ 5 block sub-regions and quantizing gradient directions into six directions 431a.
  • FIG. 4C is a diagram showing a state of the feature vector dimension number selection processing in the local feature quantity generation unit 220.
  • When selecting the 75-dimensional feature vector 432 from the 150-dimensional feature vector 431, the dimension selection unit 415 can select dimensions so that the same gradient-direction dimension is not selected in sub-region blocks adjacent to the left, right, top, or bottom.
  • When the dimension selection unit 415 selects the 50-dimensional feature vector 433 from the 75-dimensional feature vector 432, dimensions can be selected so that only one gradient direction is identical (and the remaining one differs) between sub-region blocks positioned at 45 degrees to each other.
  • When the dimension selection unit 415 selects the 25-dimensional feature vector 434 from the 50-dimensional feature vector 433, dimensions can be selected so that the selected gradient directions do not match between sub-region blocks positioned at 45 degrees to each other.
  • In this example, the dimension selection unit 415 selects one gradient direction from each sub-region for dimensions 1 to 25, two gradient directions for dimensions 26 to 50, and three gradient directions for dimensions 51 to 75.
  • It is desirable that the gradient directions not overlap between neighboring sub-region blocks and that all gradient directions be selected uniformly.
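  • One possible selection rule consistent with FIG. 4C (the checkerboard phase below is an assumption used for illustration): adjacent blocks keep disjoint, alternating halves of the direction bins.

```python
import numpy as np

def select_dimensions(hist_150: np.ndarray, keep: int = 75,
                      blocks: int = 5, n_dirs: int = 6) -> np.ndarray:
    hist = hist_150.reshape(blocks, blocks, n_dirs)
    per_block = keep // (blocks * blocks)        # e.g. 3 directions per block for 75 dims
    selected = []
    for by in range(blocks):
        for bx in range(blocks):
            offset = (by + bx) % 2               # checkerboard phase of the block
            # every other direction bin, phase-shifted so that blocks adjacent
            # on the left/right/top/bottom never keep the same direction
            dirs = [(offset + 2 * k) % n_dirs for k in range(per_block)]
            selected.extend(hist[by, bx, d] for d in dirs)
    return np.asarray(selected)
```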
  • FIG. 4D is a diagram illustrating an example of the selection order of feature vectors from sub-regions in the local feature value generation unit 220.
  • The dimension selection unit 415 can determine selection priorities so that dimensions contributing more strongly to the features of the feature point are selected first. That is, for example, it can select dimensions with priorities so that dimensions of the same gradient direction are not selected between adjacent sub-region blocks. The dimension selection unit 415 then outputs a feature vector composed of the selected dimensions as the local feature amount, and can output the local feature amount with its dimensions rearranged in priority order.
  • For example, the dimension selection unit 415 may select dimensions 1 to 25, 26 to 50, and 51 to 75 by adding dimensions in the order of the sub-region blocks shown at 441 in FIG. 4D.
  • The dimension selection unit 415 can also select gradient directions by giving higher priority to sub-region blocks close to the center.
  • FIG. 4E illustrates an example of the element numbers of the 150-dimensional feature vector in accordance with the selection order of FIG. 4D.
  • When the sub-region blocks are numbered p = 0, 1, ..., 24 and the six quantized gradient directions are numbered q = 0, 1, ..., 5, the element number of a feature vector element is 6 × p + q.
  • 461 in FIG. 4F shows the 150 dimensions, ordered by the selection order of FIG. 4E, hierarchized in units of 25 dimensions; that is, it shows a configuration example of the local feature amount obtained by selecting the elements of FIG. 4E according to the priority order shown at 441 in FIG. 4D.
  • The dimension selection unit 415 can output the dimension elements in the order shown in FIG. 4F; specifically, when outputting a 150-dimensional local feature amount, it can output all 150 elements in that order.
  • When outputting a 25-dimensional local feature amount, for example, the dimension selection unit 415 can output the elements 471 of the first row in FIG. 4F (the 76th, 45th, 83rd, ..., 120th elements) in order from left to right; when outputting a 50-dimensional local feature amount, it additionally outputs the elements 472 of the second row, in the same left-to-right order.
  • The local feature amount thus has a hierarchical structure: in the 25-dimensional local feature amount and the 150-dimensional local feature amount, for example, the arrangement of the leading 25 elements 471 is identical.
  • By selecting dimensions hierarchically (progressively) in this way, the dimension selection unit 415 can extract and output a local feature amount of an arbitrary number of dimensions, that is, of an arbitrary size, according to the application, communication capacity, terminal specifications, and the like. Moreover, because dimensions are selected hierarchically and output rearranged in priority order, images can be matched using local feature amounts of different numbers of dimensions; for example, when images are collated using a 75-dimensional local feature amount and a 50-dimensional local feature amount, the distance between the local feature amounts can be calculated using only the leading 50 dimensions.
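  • Because of this priority-ordered, hierarchical layout, two local feature amounts of different lengths can be compared over their shared leading dimensions. A minimal sketch (the function name is an assumption):

```python
import numpy as np

def prefix_distance(v1: np.ndarray, v2: np.ndarray) -> float:
    d = min(len(v1), len(v2))            # e.g. min(75, 50) = 50 shared dimensions
    return float(np.linalg.norm(v1[:d] - v2[:d]))
```

For example, a 75-dimensional and a 50-dimensional local feature amount are compared over their first 50 dimensions only.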
  • Note that the priority orders shown at 441 in FIG. 4D through FIG. 4F are examples, and the order of selecting dimensions is not limited to them.
  • the order as shown in 442 in FIG. 4D and 443 in FIG. 4D may be used.
  • the priority order may be set so that dimensions are selected from all the sub-regions.
  • the vicinity of the center of the local region may be important, and the priority order may be determined so that the selection frequency of the sub-region near the center is increased.
  • the information indicating the dimension selection order may be defined in the program, for example, or may be stored in a table or the like (selection order storage unit) referred to when the program is executed.
  • The dimension selection unit 415 may also select dimensions while skipping sub-region blocks: that is, six dimensions are selected in a certain sub-region and zero dimensions in other sub-regions close to it. Even in such a case, it can be said that dimensions are selected for each sub-region so that the correlation between neighboring sub-regions becomes low.
  • the shape of the local region and sub-region is not limited to a square, and can be any shape.
  • the local region acquisition unit 412 may acquire a circular local region.
  • In that case, the sub-region division unit 413 can divide the circular local region concentrically into, for example, nine or seventeen sub-regions.
  • the dimension selection unit 415 can select a dimension in each sub-region.
  • As described above, the dimensions of the generated feature vectors are selected hierarchically while the information content of the local feature amount is maintained.
  • This processing enables real-time object recognition and display of recognition results while recognition accuracy is maintained.
  • Note that the configuration and processing of the local feature generation unit 220 are not limited to this example; other processing that enables real-time object recognition and recognition-result display while maintaining recognition accuracy can naturally be applied.
  • FIG. 4G is a diagram illustrating processing of the collation unit 240 according to the present embodiment.
  • FIG. 4G shows an example in which the tower that is the recognition target is collated with three stages of accuracy, but the number of stages of accuracy is not limited to this.
  • FIG. 4G shows a video processing apparatus 200 as a mobile terminal.
  • the video currently being captured by the imaging unit 210 is displayed in the video display area 481.
  • a plurality of instruction buttons are displayed in the instruction button display area 482.
  • FIG. 4G schematically shows, from top to bottom, the data in the local feature DB 230 used for collation at three levels of accuracy.
  • In the schematic diagrams, black circles indicate feature points and their local regions.
  • At the top level the feature points are few, so the collation processing time is short, but the accuracy is low.
  • At the middle level the number of feature points increases and the accuracy of the collation processing rises, but the processing time grows.
  • At the bottom level the number of feature points is increased further; the accuracy of the collation processing is higher still, but the processing time is longer.
  • The collation unit 240 associates the local feature amounts 483 to 485 stored in the local feature DB 230 with the feature points whose local feature amounts match, as indicated by the thin lines. The collation unit 240 determines that feature points match when a predetermined ratio or more of their local feature amounts match, and recognizes the recognition object if the positional relationships between the associated sets of feature points are related by a linear transformation. With such recognition, objects can be recognized despite differences in size or orientation (differences in viewpoint) or inversion. Moreover, since sufficient recognition accuracy is obtained when there are at least a predetermined number of associated feature points, the recognition object can be recognized even if part of it is hidden from view.
  • FIG. 5A is a block diagram showing a first configuration 250-1 of the accuracy adjustment unit 250 according to the present embodiment.
  • In the first configuration, the number of dimensions selected by the dimension selection unit 415 is determined by the dimension number determination unit 511.
  • For example, the dimension number determination unit 511 can determine the number of dimensions by receiving information indicating the number of dimensions from the user. The information indicating the number of dimensions need not indicate the number of dimensions itself; it may, for example, indicate collation accuracy or collation speed. Specifically, when receiving an input requesting higher local feature generation accuracy, communication accuracy, or collation accuracy, the dimension number determination unit 511 determines the number of dimensions so that it increases; when receiving an input requesting higher local feature generation speed, communication speed, or collation speed, it determines the number of dimensions so that it decreases.
  • The dimension number determination unit 511 may determine the same number of dimensions for all feature points detected from an image, or a different number for each feature point. For example, when the importance of feature points is given by external information, it may increase the number of dimensions for feature points of high importance and decrease it for feature points of low importance. In this way, the number of dimensions can be determined in consideration of collation accuracy, local feature generation speed, communication speed, and collation speed.
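  • A hedged sketch of such a policy (the concrete mapping is an assumption; the patent leaves the rule unspecified), choosing among the hierarchical dimension counts:

```python
LEVELS = (25, 50, 75, 150)   # the hierarchical dimension counts of FIG. 4F

def decide_dimensions(prefer_accuracy: bool, importance: float = 0.5) -> int:
    base = 2 if prefer_accuracy else 0        # accuracy request: start higher
    bump = 1 if importance >= 0.8 else 0      # important feature points get more dims
    return LEVELS[min(base + bump, len(LEVELS) - 1)]
```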
  • FIG. 5B is a block diagram showing a second configuration 250-2 of the accuracy adjustment unit 250 according to the present embodiment.
  • In the second configuration, the feature vector extension unit 512 can change the number of dimensions by aggregating the values of a plurality of dimensions.
  • For example, the feature vector extension unit 512 can extend a feature vector by generating dimensions at a larger scale (extended division regions) from the feature vectors output by the sub-region feature vector generation unit 414. The feature vector extension unit 512 can extend feature vectors using only the feature vector information output by the sub-region feature vector generation unit 414; since there is no need to return to the original image to extract features, the processing time for extending a feature vector is negligible compared with the time for generating a feature vector from the original image.
  • the feature vector extending unit 512 may generate a new gradient direction histogram by combining gradient direction histograms of adjacent sub-regions.
  • FIG. 5C is a diagram for explaining processing by the second configuration 250-2 of the accuracy adjustment unit 250 according to the present embodiment.
  • For example, the feature vector extension unit 512 can expand the 5 × 5 × 6-dimensional (150-dimensional) gradient direction histogram 531 to generate a 4 × 4 × 6-dimensional (96-dimensional) gradient direction histogram 541: the four blocks indicated by the bold line 531a are combined into one block 541a, and the four blocks indicated by the broken line 531b are combined into one block 541b.
  • Likewise, by taking sums of the gradient direction histograms of adjacent blocks of the 4 × 4 × 6-dimensional (96-dimensional) gradient direction histogram 541, the feature vector extension unit 512 can generate a 3 × 3 × 6-dimensional (54-dimensional) gradient direction histogram 551: the four blocks indicated by the bold line 541c are combined into one block 551c, and the four blocks indicated by the broken line 541d are combined into one block 551d.
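  • A minimal sketch of this extension (assuming 2 × 2 groups of adjacent blocks are summed, which reproduces the 150 → 96 → 54 dimension counts):

```python
import numpy as np

def extend(hist: np.ndarray) -> np.ndarray:
    """Sum the direction histograms of each 2x2 group of adjacent blocks."""
    b = hist.shape[0] - 1                    # (B, B, dirs) -> (B-1, B-1, dirs)
    return sum(hist[dy:dy + b, dx:dx + b] for dy in (0, 1) for dx in (0, 1))

h150 = np.random.rand(5, 5, 6)               # 5x5x6 = 150-dimensional histogram 531
h96 = extend(h150)                           # 4x4x6 = 96-dimensional histogram 541
h54 = extend(h96)                            # 3x3x6 = 54-dimensional histogram 551
```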
  • Through dimension selection, the dimension selection unit 415 reduces the 5 × 5 × 6-dimensional (150-dimensional) gradient direction histogram 531 to the 5 × 5 × 3-dimensional (75-dimensional) gradient direction histogram 532.
  • Similarly, the 4 × 4 × 6-dimensional (96-dimensional) gradient direction histogram 541 becomes a 4 × 4 × 3-dimensional (48-dimensional) gradient direction histogram 542,
  • and the 3 × 3 × 6-dimensional (54-dimensional) gradient direction histogram 551 becomes a 3 × 3 × 3-dimensional (27-dimensional) gradient direction histogram 552.
  • FIG. 6 is a block diagram showing a third configuration 250-3 of the accuracy adjustment unit 250 according to the present embodiment.
  • In the third configuration, the feature point selection unit 611 can change the data amount of the local feature amounts while maintaining accuracy by changing the number of feature points through feature point selection.
  • The feature point selection unit 611 can, for example, hold in advance designated-number information indicating the "designated number" of feature points to be selected.
  • The designated-number information may indicate the designated number itself, or it may indicate the total size (for example, the number of bytes) of the local feature amounts in the image.
  • When the designated-number information indicates the total size, the feature point selection unit 611 can calculate the designated number by, for example, dividing the total size by the size of the local feature amount at one feature point. Importance can also be assigned to all feature points (for example, at random) and feature points selected in descending order of importance; when the designated number of feature points has been selected, information on the selected feature points is output as the selection result.
  • Alternatively, from the scales of all feature points, only the feature points contained in a specific scale region can be selected.
  • In either case, the feature points can be reduced to the designated number based on importance, and information on the selected feature points can be output as the selection result.
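  • A minimal sketch of this selection (the parameter names and the per-feature byte size are assumptions): the designated number is either given directly or derived from a total byte budget, and the most important feature points are kept.

```python
def select_feature_points(points, importance, designated=None,
                          total_bytes=None, bytes_per_feature=150):
    if designated is None:
        # designated number derived from the total-size budget
        designated = total_bytes // bytes_per_feature
    order = sorted(range(len(points)), key=lambda k: importance[k], reverse=True)
    return [points[k] for k in order[:designated]]   # keep the most important
```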
  • FIG. 7 is a block diagram showing a fourth configuration 250-4 of the accuracy adjustment unit 250 according to the present embodiment.
  • In the fourth configuration, the dimension number determination unit 511 and the feature point selection unit 611 cooperate to change the data amount of the local feature amounts while maintaining accuracy.
  • The feature point selection unit 611 can select feature points based on the number of feature points determined by the dimension number determination unit 511. Conversely, the dimension number determination unit 511 can determine the number of dimensions to select so that the feature amount size becomes a designated size, based on the designated feature amount size and the number of feature points selected by the feature point selection unit 611. The feature point selection unit 611 selects feature points based on the feature point information output from the feature point detection unit 411.
  • Further, the feature point selection unit 611 can output importance information indicating the importance of each selected feature point to the dimension number determination unit 511, and the dimension number determination unit 511 can determine, for each feature point, the number of dimensions to select based on that importance information.
  • FIG. 8 is a diagram showing a configuration of local feature value generation data 800 according to the present embodiment. These data are stored and held in the RAM 1240 of FIG.
  • a plurality of detected feature points 802, feature point coordinates 803, and local region information 804 corresponding to the feature points are stored in association with the input image ID 801. Then, a plurality of sub-region IDs 805, sub-region information 806, a feature vector 807 corresponding to each sub-region, and a selection dimension 808 including a priority order are associated with each detected feature point 802, feature point coordinates 803, and local region information 804. Is memorized.
  • Finally, the local feature amount 809 generated from the above data for each detected feature point 802 is stored.
  • FIG. 9 is a diagram showing a configuration of the local feature DB 230 according to the present embodiment.
  • the local feature DB 230 stores a first local feature 903, a second local feature 904,..., An mth local feature 905 in association with the recognition object ID 901 and the recognition object name 902.
  • Each local feature amount is stored as a feature vector of elements from dimension 1 to dimension 150, hierarchized in units of 25 dimensions corresponding to the 5 × 5 sub-regions of FIG. 4F.
  • Here, m is a positive integer and may differ for each recognition object.
  • In addition, the feature point coordinates used in the collation processing are stored together with the respective local feature amounts.
  • FIG. 10 is a diagram showing a configuration of the accuracy adjustment parameter 1000 according to the present embodiment.
  • The accuracy adjustment parameter 1000 stores, as feature point parameters 1001, a feature point selection threshold for selecting whether to use a feature point, and the like. As local region parameters 1002, an area (size) corresponding to the Gaussian window and a shape such as a rectangle or circle are stored. As sub-region parameters 1003, the division count and the shape of the local region are stored. As feature vector parameters 1004, the number of directions (for example, eight or six), the number of dimensions, the dimension selection method, and the like are stored.
  • FIG. 11 is a diagram showing a configuration of the reliability determination table 1100 according to the present embodiment.
  • The reliability determination table 1100 stores, in association with the recognition object ID 1101 and the recognition object name 1102 of a collation result, the feature point count 1103, the feature point matching rate 1104 of the collation processing, the feature vector dimension count 1105, the feature vector average matching rate 1106, the linear-transformation matching rate 1107, and the like. Based on these data, the reliability 1108 of the recognition result is determined.
  • The reliability determination table 1100 may store the relationship between these data and the reliability in advance.
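  • One hedged way to combine these values into a single reliability score (the linear weighting below is an assumption; the patent only requires that the relationship be stored in advance):

```python
def reliability(feature_match_rate: float, vector_match_rate: float,
                linear_match_rate: float,
                w: tuple = (0.3, 0.3, 0.4)) -> float:
    # weighted combination of the matching rates of FIG. 11,
    # compared against the threshold Th in step S313
    return (w[0] * feature_match_rate + w[1] * vector_match_rate
            + w[2] * linear_match_rate)
```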
  • FIG. 12 is a block diagram showing a hardware configuration of the video processing apparatus 200 according to the present embodiment.
  • a CPU 1210 is a processor for arithmetic control, and implements each functional component of the video processing device 200 that is a portable terminal by executing a program.
  • The ROM 1220 stores fixed data such as initial data, and programs.
  • The communication control unit 1230 communicates with other devices via a network in the present embodiment. Note that the number of CPUs 1210 is not limited to one; there may be a plurality of CPUs, or a GPU (Graphics Processing Unit) for image processing may be included.
  • the RAM 1240 is a random access memory that the CPU 1210 uses as a work area for temporary storage.
  • the RAM 1240 has an area for storing data necessary for realizing the present embodiment.
  • the input video 1241 is an area for storing the input video input by the imaging unit 210.
  • the feature point data 1242 is an area for storing feature point data including feature point coordinates, scales, and angles detected from the input video 1241.
  • the local feature value generation table 800 is an area for storing the local feature value generation table shown in FIG.
  • the accuracy adjustment parameter 1000 is an area for storing the accuracy adjustment parameter shown in FIG.
  • the reliability determination table 1100 is an area for storing the reliability determination table shown in FIG.
  • the matching result 1243 is an area for storing the matching result recognized from the matching between the local feature amount generated from the input video and the local feature amount stored in the local feature amount DB 230.
  • the collation result display data 1244 is an area for storing collation result display data for notifying the user of the collation result 1243. In addition, when outputting a voice, collation result voice data may be included.
  • the input video / collation result superimposition data 1245 is an area for storing input video / collation result superimposition data displayed on the display unit 280 in which the collation result 1243 is superimposed on the input video 1241.
  • the input / output data 1246 is an area for storing input / output data input / output via the input / output interface 1260.
  • Transmission / reception data 1247 is an area for storing transmission / reception data transmitted / received via the communication control unit 1230.
  • the storage 1250 stores a database, various parameters, or the following data or programs necessary for realizing the present embodiment.
  • the local feature DB 230 is an area in which the local feature DB shown in FIG. 9 is stored.
  • the collation result display format 1251 is an area in which a collation result display format used for generating a format for displaying the collation result is stored.
  • the storage 1250 stores the following programs.
  • the mobile terminal control program 1252 is an area in which a mobile terminal control program for controlling the entire video processing apparatus 200 is stored.
  • the local feature value generation module 1253 is an area in which a local feature value generation module that generates a local feature value from an input video according to FIGS. 4B to 4F in the mobile terminal control program 1252 is stored.
  • the collation control module 1254 is an area in which the collation control module that collates the local feature amount generated from the input video and the local feature amount stored in the local feature amount DB 230 in the portable terminal control program 1252 is stored.
  • the collation result notification module 1255 is an area in which a collation result notification module for notifying the user of the collation result by display or voice in the mobile terminal control program 1252 is stored.
  • the accuracy adjustment module 1256 is an area in which an accuracy adjustment module that adjusts the accuracy of the local feature generation unit 220 based on the collation result is stored in the portable terminal control program 1252.
  • the input / output interface 1260 interfaces input / output data with input / output devices.
  • the input / output interface 1260 is connected to a display unit 280, a touch panel and keyboard as the operation unit 290, a speaker 1264, a microphone 1265, and an imaging unit 210.
  • the input / output device is not limited to the above example.
  • the GPS (Global Positioning System) position generation unit 1266 acquires the current position based on a signal from a GPS satellite.
  • FIG. 13 is a flowchart showing a processing procedure of the video processing apparatus 200 according to the present embodiment. This flowchart is executed by the CPU 1210 in FIG. 12 using the RAM 1240, and implements each functional component in FIG.
  • In step S1311, it is determined whether there is video input for performing object recognition. As functions of the portable terminal, reception is determined in step S1331 and transmission is determined in step S1341; otherwise, other processing is performed.
  • In step S1313, initial accuracy adjustment parameters are set.
  • step S1315 local feature generation processing is executed from the input video (see FIG. 14A).
  • step S1317 collation processing is executed (see FIG. 14B).
  • In step S1319, it is determined whether the reliability of the recognition result exceeds the threshold Th. If the reliability is at or below the threshold Th, the process proceeds to step S1321 to update the accuracy adjustment parameters, then returns to step S1315, where higher-accuracy local features are generated using the updated parameters and the collation is repeated.
  • In step S1325, it is determined whether to end the object recognition processing. The processing is ended by, for example, the reset button in the instruction button display area 482 of FIG. 4G. If it is not ended, the process returns to step S1313 and object recognition on the video input is repeated.
  • If reception is determined, local feature DB data is received in step S1333 and stored in the local feature DB in step S1335.
  • Other reception processing is performed in step S1337.
  • If transmission is determined, the local features generated from the input video are transmitted as local feature DB data in step S1343.
  • Other transmission processing is performed in step S1345.
  • Since data transmission and reception by the portable terminal are not a feature of the present embodiment, their detailed description is omitted.
  • FIG. 14A is a flowchart illustrating a processing procedure of local feature generation processing S1315 according to the present embodiment.
  • step S1411 the position coordinates, scale, and angle of the feature points are detected from the input video.
  • step S1413 a local region is acquired for one of the feature points detected in step S1411.
  • step S1415 the local area is divided into sub-areas.
  • step S1417 a feature vector for each sub-region is generated to generate a feature vector for the local region. The processing from step S1411 to S1417 is illustrated in FIG. 4B.
  • step S1419 dimension selection is performed on the feature vector of the local region generated in step S1417.
  • the dimension selection is illustrated in FIGS. 4D to 4F.
  • step S1421 it is determined whether the generation of local features and dimension selection have been completed for all feature points detected in step S1411. If not completed, the process returns to step S1413 to repeat the process for the next one feature point.
  • FIG. 14B is a flowchart showing the processing procedure S1317 of the collation processing according to this embodiment.
  • step S1433 the dimension number j of the local feature amount generated in step S1315 is acquired.
  • In step S1435, the data of dimension count j of the p-th local feature amount of the recognition object stored in the local feature DB 230 is acquired; that is, the dimensions from the first dimension up to the j-th dimension are acquired.
  • step S1437 the p-th local feature value acquired in step S1435 and the local feature values of all feature points generated from the input video are sequentially checked to determine whether or not they are similar.
  • In step S1439, it is determined from the collation of the local feature amounts whether the similarity exceeds the threshold α.
  • If it does, in step S1441 the matched combination of the local feature amount and the positional relationship of its feature points between the input video and the recognition object is stored.
  • Then q, the counter of matched feature points, is incremented by one.
  • The collation then advances to the next feature point of the recognition object (p ← p + 1); if not all feature points of the recognition object have been collated (p ≤ m), the process returns to step S1435 and the collation of matching local feature amounts is repeated.
  • Note that the threshold α can be changed in accordance with the recognition accuracy required for the recognition object. If the recognition object has a low correlation with other recognition objects, accurate recognition is possible even when the collation accuracy is lowered.
  • In step S1445, it is determined whether the ratio of the number q of feature points whose local feature amounts match those of the input video to the number p of feature points of the recognition object exceeds the threshold β. If it does, the process proceeds to step S1449, where it is further determined whether the positional relationship between the feature points of the input video and the feature points of the recognition object is one attainable by a linear transformation.
  • That is, it is determined whether the positional relationships stored in step S1441 as having matching local feature amounts remain attainable under changes such as rotation, inversion, or a change of viewpoint position. Since such determination methods are geometrically known, detailed description is omitted. If it is determined in step S1451 that a linear transformation is possible, the process proceeds to step S1453, where it is determined that the collated recognition object is present in the input video.
  • The threshold β can likewise be changed in accordance with the recognition accuracy required for the recognition object.
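  • As an illustration of the geometric check in steps S1449 to S1453, the following sketch fits a single affine transformation to the matched feature point pairs by least squares (one concrete choice; the patent does not fix the method):

```python
import numpy as np

def linearly_consistent(src_pts, dst_pts, tol: float = 3.0) -> bool:
    """src_pts/dst_pts: matched (x, y) coordinates; needs at least 3 pairs."""
    src = np.hstack([np.asarray(src_pts, float), np.ones((len(src_pts), 1))])
    dst = np.asarray(dst_pts, float)
    A, *_ = np.linalg.lstsq(src, dst, rcond=None)   # 3x2 affine matrix
    residual = np.linalg.norm(src @ A - dst, axis=1)
    return bool(np.all(residual < tol))             # all pairs fit one transform
```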
  • Note that the processing of storing recognition objects of all fields in the local feature DB 230 and collating all of them on the mobile terminal is very heavy. Therefore, for example, the user may select the field of the object from a menu before object recognition from the input video, and only that field may be searched and collated in the local feature DB 230. The load can also be reduced by downloading into the local feature DB 230 only the local feature amounts of the fields the user actually uses (for example, buildings in the example of FIG. 4G).
  • The video processing apparatus according to the present embodiment differs from that of the second embodiment in that it adjusts the data amount of the generated local feature amounts.
  • Other configurations and operations are the same as those of the second embodiment, and thus description of the same configurations and operations is omitted.
  • According to the present embodiment, it is possible to recognize the recognition object in a query image of a video in real time and at higher speed while dynamically adjusting the data amount of the local feature amounts.
  • FIG. 15 is a block diagram illustrating a functional configuration of the video processing device 1500 according to the present embodiment.
  • The video processing apparatus 1500 differs from the second embodiment shown in FIG. 2 only in the accuracy adjustment unit 1550; the other components are the same as in FIG. 2, so their description is omitted.
  • the accuracy adjustment unit 1550 adjusts the accuracy of the local feature amount by the local feature amount generation unit 220.
  • the accuracy adjustment unit 1550 includes a data amount evaluation unit 1560, and adjusts the accuracy based on the data amount of the local feature amount generated based on the information from the local feature amount generation unit 220.
  • FIG. 16 is a diagram showing a configuration of a data amount evaluation table 1600 according to the present embodiment.
  • the data amount evaluation table 1600 stores a feature point number 1602, a local region size 1603, a sub-region division number 1604, and a feature vector dimension number 1605 in association with the captured video ID 1601.
  • a predicted data amount 1606 is calculated from these numerical values.
  • An accuracy adjustment parameter 1607 is set based on the predicted data amount 1606, and the actually generated data amount 1608 is stored.
  • Using this data amount evaluation table 1600, the accuracy adjustment unit 1550 performs accuracy adjustment based on the data amount evaluation by the data amount evaluation unit 1560.
  • FIG. 17 is a flowchart showing the processing procedure of the video processing device 1500 according to the present embodiment. This flowchart is executed by the CPU 1210 of FIG. 12 using the RAM 1240 and implements the functional components of FIG. 15. The processing procedure of the video processing device 1500 replaces the update of the accuracy adjustment parameters based on reliability, shown in FIG. 13 of the second embodiment, with an update based on the data amount; since the other steps are the same, the same step numbers are assigned and their description is omitted.
  • In step S1719, it is determined whether the data amount generated by the local feature generation processing is between the thresholds Dh and Dl. If it is not between the thresholds Dh and Dl, the process proceeds to step S1721, where the accuracy adjustment parameter is updated, and then returns to step S1315 to repeat the processing. If it is between the thresholds Dh and Dl, the process proceeds to step S1317 to perform the collation processing.
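  • The following hedged sketch wires this feedback loop together. The generator, collator, and the way the accuracy parameter maps to the data amount are stand-ins; the patent does not specify them.

```python
def recognize_with_data_budget(frame, generate, collate, d_low, d_high,
                               accuracy=1.0, max_iters=10):
    """Repeat local feature generation until the data amount falls in [d_low, d_high]."""
    for _ in range(max_iters):
        features = generate(frame, accuracy)      # local feature generation (S1315)
        size = sum(len(f) for f in features)      # generated data amount (S1719)
        if size > d_high:
            accuracy *= 0.5                       # too much data: coarsen (S1721)
        elif size < d_low:
            accuracy *= 1.5                       # too little data: refine (S1721)
        else:
            return collate(features)              # within budget: collate (S1317)
    return collate(features)                      # fall back after max_iters
```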
  • The video processing apparatus according to the present embodiment differs from the above embodiments in that it has a configuration that applies the accuracy adjustment of the local feature amounts to select a recognition target from the video and perform collation recognition of it in more detail.
  • Other configurations and operations are the same as those of the second embodiment or the third embodiment, and thus the description of the same configurations and operations is omitted.
  • According to the present embodiment, it is possible to select a recognition target object in the query image in the video and realize more detailed recognition in real time while dynamically adjusting the accuracy of the local feature amounts.
  • FIG. 18 is a diagram for explaining video processing by the video processing apparatus 1800 according to the present embodiment.
  • FIG. 18 illustrates a case where video of an intersection is obtained by the imaging unit of the video processing device 1800, which is a portable terminal, and vehicles that are stopped or traveling at the intersection are recognized and notified to the user.
  • the left display screen 1810 in FIG. 18 has a video display area 1811 for displaying the current intersection state and an instruction button display area 1812.
  • The video display area 1821 of the central display screen 1820 in FIG. 18 is a screen that displays the result of collating and recognizing the local feature amounts with an accuracy that allows the name of the recognition target object to be recognized from the video of the video display area 1811. Here, three cars are recognized and notified.
  • A bicycle or a person in the video display area 1811 is also recognized if its local feature amounts match, but the subsequent higher-accuracy recognition is limited to cars.
  • The video display area 1831 on the display screen 1830 on the right side of FIG. 18 is a screen that displays, for the automobile regions, the result of collation recognition with the accuracy of the local feature amounts increased. Car models such as XX A car BCD 1832, XX B car 1833, and XX H car 1834, which could not be recognized in the central video display area 1821, are now recognized.
  • In this way, a recognition object to be recognized in detail is further selected from the plurality of recognition objects coarsely recognized with low accuracy from the entire captured video, and is then collated with high accuracy. This eliminates unnecessary generation and collation of local feature amounts as much as possible, and enables real-time object recognition with high accuracy.
  • FIG. 19 is a block diagram showing a functional configuration of a video processing apparatus 1800 according to this embodiment.
  • The functional configuration of the video processing device 1800 is obtained by replacing the local feature DB and the accuracy adjusting unit in FIG. 2 of the second embodiment; the other functional components are the same as those in FIG. 2, so the same reference numerals are assigned and the description is omitted.
  • The accuracy adjustment unit 1950 includes a region selection unit 1961 that selects a region to be collated with high accuracy, and a dimension number adjustment unit 1962 that adjusts the number of dimensions of the feature vectors in order to adjust the accuracy. The dimension number adjustment unit 1962 initially limits the number of dimensions in the local feature generation by the local feature generation unit 220 to the extent that the recognition targets can be recognized from the video captured by the imaging unit 210. From the collation result of the collation unit 240, the region selection unit 1961 selects the region of the recognition object for which more detailed recognition is desired. The dimension number adjustment unit 1962 then increases the number of dimensions and performs detailed collation recognition of only the region selected by the region selection unit 1961 instead of the entire video.
  • To support the above processing, the local feature DB 1930 stores the local feature amounts so that the number of dimensions of the local feature amounts corresponds to the recognition accuracy of the recognition target objects.
  • In this embodiment, dimension number adjustment of the feature vectors is shown as the accuracy adjustment by the accuracy adjusting unit 1950, but other accuracy adjustments, such as feature point number adjustment, may also be used.
  • FIG. 20 is a diagram for explaining the dimension number adjustment processing according to the present embodiment.
  • FIG. 20 shows a case where the dimension number of the feature vector is adjusted in three stages.
  • In the first stage, the local region of each feature point is represented by a 25-dimensional feature vector (see FIG. 4F). In the second stage, the local region of each feature point is represented by a 50-dimensional feature vector. In the third stage, the local region of each feature point is represented by a 150-dimensional feature vector.
  • As described above, the local feature amount of the present embodiment has a hierarchical structure. Therefore, even when the number of dimensions is changed, the local feature amount does not need to be generated again from the input video, so the accuracy can be adjusted in real time.
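  • A minimal sketch of this hierarchical property: because the dimensions of the 150-dimensional descriptor are ordered, the 25- and 50-dimensional variants are plain prefixes, so changing the accuracy never re-extracts features. The random descriptors below are placeholders.

```python
import numpy as np

def truncate_descriptors(descriptors_150, dims):
    """Return the `dims`-dimensional variant (dims in {25, 50, 150})."""
    assert dims in (25, 50, 150)
    return descriptors_150[:, :dims]   # a prefix of the ordered feature vector

features = np.random.rand(300, 150)        # 300 feature points at full accuracy
coarse = truncate_descriptors(features, 25)    # first stage
medium = truncate_descriptors(features, 50)    # second stage
print(coarse.shape, medium.shape)              # (300, 25) (300, 50)
```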
  • FIG. 21 is a diagram showing a configuration of the local feature DB 1930 according to the present embodiment. Although FIG. 21 shows only the part used for the vehicle recognition of FIG. 18, the processing of this embodiment can be performed by storing other local feature amounts in the same manner.
  • The local feature DB 1930 stores a plurality of vehicle names 2102 manufactured by each manufacturer in association with the manufacturer 2101. The local feature DB 1930 then stores a plurality of models 2103 in association with each vehicle name 2102, and stores 1-dimensional to 150-dimensional local feature amounts in association with each model 2103.
  • As a result, the manufacturer 2101 can be recognized by collating the 1st to 25th dimensions 2104, the vehicle name 2102 by collating the 1st to 50th dimensions 2104 and 2105, and the model 2103 by collating the 1st to 150th dimensions 2104 to 2106.
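  • The following Python sketch illustrates, under assumed data and an assumed distance metric, how such a hierarchically stored DB supports coarse-to-fine collation; the entries, names, and nearest-neighbour matching are illustrative only.

```python
import numpy as np

db = [
    # (manufacturer, vehicle name, model, 150-dim reference descriptor)
    ("MakerX", "CarA", "BCD", np.random.rand(150)),
    ("MakerX", "CarB", "EFG", np.random.rand(150)),
]

def best_match(query_150, dims):
    """Match using only the first `dims` dimensions of each stored descriptor."""
    q = query_150[:dims]
    return min(db, key=lambda row: np.linalg.norm(row[3][:dims] - q))

query = np.random.rand(150)
print(best_match(query, 25)[0])   # 1st-25th dims: enough to tell the manufacturer
print(best_match(query, 50)[1])   # 1st-50th dims: enough for the vehicle name
print(best_match(query, 150)[2])  # 1st-150th dims: enough for the model
```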
  • FIG. 22 is a flowchart showing a processing procedure of the video processing apparatus 1800 according to the present embodiment. This flowchart is executed by the CPU 1210 of FIG. 12 using the RAM 1240, and implements each functional component of FIG. 19. Note that, in the processing procedure of the video processing apparatus 1800 in FIG. 22, the same steps as those in FIG. 13 of the second embodiment are denoted by the same step numbers, and description thereof is omitted. In FIG. 22, the transmission processing and the reception processing of FIG. 13 are also omitted.
  • If there is a video input, the process proceeds from step S1311 to S2203, and the initial number of dimensions is set. Then, after the local feature generation processing (S1315) and the collation processing (S1317), it is determined in step S2209 whether the currently set number of dimensions is the maximum. If it is not the maximum, the process proceeds to step S2211, and the recognition target region for which detailed recognition is desired is selected from the collation result. In step S2213, the number of dimensions is increased from the initial number of dimensions, and the process returns to step S1315, where the local feature amounts with the increased number of dimensions are read out. If the number of dimensions is the maximum, the process proceeds to step S2215 to display the final collation result superimposed on the video.
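  • As a compact illustration of the FIG. 22 procedure, the following hedged sketch wires the steps together; `generate`, `collate`, and `select_regions` stand in for the corresponding functional units and are not defined by the patent.

```python
def coarse_to_fine(frame, generate, collate, select_regions,
                   dim_schedule=(25, 50, 150)):
    """Recognize the whole frame coarsely, then re-collate selected regions in detail."""
    region = None                                    # None = whole frame (S2203)
    result = None
    for dims in dim_schedule:                        # S2213 raises the dimension count
        features = generate(frame, dims=dims, region=region)  # S1315
        result = collate(features, dims=dims)                 # S1317
        region = select_regions(result)              # S2211: region needing detail
        if region is None:                           # nothing left to refine
            break
    return result                                    # displayed in S2215
```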
  • The video processing apparatus according to the present embodiment differs from the above embodiments in that it has a configuration that applies the accuracy adjustment of the local feature amounts to first identify the entire recognition target from the video, and then recognizes the configuration of the recognition target in detail.
  • Other configurations and operations are the same as those of the second embodiment or the third embodiment, and thus the description of the same configurations and operations is omitted.
  • According to the present embodiment, the recognition target object in the query image in the video is first identified, and then the details of the configuration of the recognition target object are collated and recognized, so the configuration of the object can be recognized in real time. For example, inventory checks and product inspection can be realized without unnecessary collation.
  • FIG. 23 is a diagram for explaining video processing by the video processing apparatus 2300 according to the present embodiment.
  • FIG. 23 shows a case where video of a shelf is obtained by the imaging unit of the video processing device 2300, which is a portable terminal, the entire shelf is identified first, the objects on the shelf are recognized next, and the result is notified to the user.
  • Note that the recognition target of this embodiment is not limited to a shelf. For example, the present invention can also be applied to recognition of a substrate and recognition of the components on the substrate.
  • The left display screen in FIG. 23 has a video display area 2311 for displaying the video of the shelf 2313 and an instruction button display area 2312.
  • the video display area 2321 of the central display screen 2320 in FIG. 23 is a screen that displays a result of collating and recognizing a local feature amount with an accuracy sufficient to identify the shelf 2313 from the video in the video display area 2311.
  • Here, the accuracy sufficient to identify a shelf means not simply identifying the type of the shelf itself, but distinguishing, for example, whether the shelf is a food shelf or a bookshelf; if it is a food shelf, whether it is a beverage shelf or a bread shelf; and if it is a bookshelf, whether it is a paperback shelf or a dictionary shelf.
  • the black circles 2322 are feature points and local areas of the entire shelf, and may or may not be displayed on the video. In the video display area 2321, for example, it is assumed that this shelf is recognized as a dictionary bookcase. It is recognized by collating local feature values of the entire shelf.
  • Black circles 2332 are feature points and local areas of individual dictionaries, and may or may not be displayed on the video.
  • In this way, the entire image is first collated and recognized from the entire captured video with coarse, low accuracy, and then the recognition object is narrowed down and collated and recognized with high accuracy.
  • FIG. 24 is a block diagram illustrating a functional configuration of the video processing apparatus 2300 according to the present embodiment.
  • The functional configuration of the video processing device 2300 is a modification of the local feature DB and the accuracy adjustment unit of FIG. 2 of the second embodiment; the other functional components are the same as those of FIG. 2, so the same reference numerals are assigned and the description is omitted.
  • the accuracy adjustment unit 2450 includes a collation determination unit 2461 that determines the overall collation result, and a feature point number adjustment unit 2462 that adjusts the number of feature points in order to adjust the accuracy.
  • The feature point number adjustment unit 2462 initially limits the number of feature points in the local feature generation by the local feature generation unit 220 to an accuracy that just allows the recognition target object to be recognized from the entire video captured by the imaging unit 210. From the collation result of the collation unit 240, the collation determination unit 2461 determines the recognition target object in the entire video, and selects from the local feature DB 2430 the local feature amounts of the recognition target object corresponding to the determination result. It also notifies the feature point number adjustment unit 2462 of the number of feature points necessary for recognition within that recognition target object. The feature point number adjustment unit 2462 then increases the number of feature points and performs detailed collation recognition focused on the region determined by the collation determination unit 2461 instead of the entire video.
  • To support this, the local feature DB 2430 stores, in an identifiable manner, local feature amounts for recognizing an object from the entire video and local feature amounts for recognizing the individual objects within the recognition object.
  • In this embodiment, feature point number adjustment is shown as the accuracy adjustment by the accuracy adjustment unit 2450, but other accuracy adjustments, such as feature vector dimension number adjustment, may also be used.
  • FIG. 25 is a diagram for explaining the feature point number adjustment processing according to the present embodiment. FIG. 25 shows a case where the number of feature points is adjusted in three stages. Note that although the number of dimensions of the feature vectors is 25 in every stage, the present invention is not limited to this.
  • the first stage 2510 shows a local feature amount in which the number of feature points is limited to 50 at the maximum.
  • the second stage 2520 shows a local feature amount in which the number of feature points is limited to a maximum of 200.
  • the third stage 2530 shows the local feature amount in which the number of feature points is limited to a maximum of 500.
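  • A hedged sketch of such feature point number adjustment: assuming each detected point carries an importance score (for example, a detector response; the scoring itself is not specified by the patent), only the most important points are kept at each stage. The sample data below is made up.

```python
def limit_feature_points(points, max_points):
    """points: iterable of (x, y, score, descriptor); keep the top `max_points` by score."""
    return sorted(points, key=lambda p: p[2], reverse=True)[:max_points]

detected = [(10, 20, 0.9, b"..."), (30, 5, 0.4, b"..."), (7, 7, 0.7, b"...")]
for stage_limit in (50, 200, 500):   # the three stages 2510, 2520, 2530
    subset = limit_feature_points(detected, stage_limit)
    print(stage_limit, len(subset))  # only 3 points were detected in this toy example
```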
  • FIG. 26 is a diagram showing the configuration of the local feature DB 2430 according to this embodiment. Although FIG. 26 shows only the bookshelf and the books/videos displayed on it, the same configuration is similarly applied to other combinations.
  • The local feature DB 2610 in FIG. 26 is a local feature DB that stores local feature amounts generated from the entire shelf.
  • the local feature DB 2610 stores, for example, local feature values and feature point coordinates of 50 feature points in association with the entire shelf 2611.
  • the local feature DB 2620 in FIG. 26 is a local feature DB that stores local features generated from books and videos displayed on the shelf.
  • the local feature DB 2620 stores, for example, local feature values and feature point coordinates of 500 feature points in association with the book / video 2621.
  • Note that the above-mentioned numbers of feature points are examples; the number of feature points necessary for recognizing each target object, or necessary for distinguishing it from others, may be determined individually.
  • FIG. 27 is a flowchart showing a processing procedure of the video processing apparatus 2300 according to the present embodiment. This flowchart is executed by the CPU 1210 of FIG. 12 using the RAM 1240, and implements each functional component of FIG. 24.
  • the same steps as those in FIG. 13 in the second embodiment are denoted by the same step numbers, and description thereof is omitted.
  • the transmission process and the reception process of FIG. 13 are omitted.
  • If there is a video input, the process proceeds from step S1311 to S2713, and the shelf discrimination condition (in this example, the number of feature points) is set.
  • In step S2715, the local feature DB for shelf identification is selected (see 2610 in FIG. 26). Then, after the local feature generation processing (S1315) and the collation processing (S1317), it is determined in step S2719 whether the collation target is the shelf or the articles displayed on the shelf. If the shelf was the collation target, the process proceeds to step S2721, and the condition for the articles displayed on the determined shelf (the number of feature points) is set. Next, in step S2723, the local feature DB for article discrimination is selected (see 2620 in FIG. 26). Then, the process returns to step S1315, and feature points that were not initially targeted for local feature generation are added to generate the local feature amounts.
  • If the articles are the collation target, the process proceeds to step S2725, and the arrangement of the articles on the shelf is stored together with the video.
  • This collation result is used for inventory, for example.
  • Note that the processing of FIG. 27 can be generalized by replacing the shelf with the entire image and the articles with individual objects.
  • The video processing apparatus according to the present embodiment differs from the above embodiments in that it has a configuration that applies the accuracy adjustment of the local feature amounts to recognize a recognition object that has changed in the video, and then recognizes that recognition object in detail.
  • Other configurations and operations are the same as those of the second embodiment or the third embodiment, and thus the description of the same configurations and operations is omitted.
  • According to the present embodiment, the changed recognition object in the query image in the video is detected first, and then the details of the recognition object are collated and recognized, so the changed recognition object can be recognized in detail in real time. For example, this can be applied to a surveillance camera.
  • FIG. 28 is a diagram for explaining video processing by the video processing apparatus 2800 according to the present embodiment.
  • In FIG. 28, video of a store interior is obtained by the imaging unit of the video processing device 2800, which is a portable terminal, and the video is monitored for changes. If a change is detected in the video, the recognition object that has changed is collated and recognized with high accuracy.
  • the recognition target of the present embodiment is not limited to in-store monitoring.
  • the left display screen 2810 in FIG. 28 has a video display area 2811 for displaying video in the store and an instruction button display area 2812.
  • the video display area 2821 of the central display screen 2820 in FIG. 28 is a screen that displays a result of collation and recognition by generating a local feature amount with an accuracy sufficient to detect a change from the video in the video display area 2811.
  • In the video display area 2821, for example, it is assumed that changes in the persons 2822 and 2823 are detected. This is recognized by collating the local feature amounts of the entire store.
  • Then, the local feature amounts of these persons are collated in the video display area 2831 on the right display screen 2830 in FIG. 28. As a result, the person 2822 is recognized and displayed as the clerk A 2832, and the person 2823 as the customer B 2833.
  • In this way, a change in the video is first collated and recognized from the entire captured video with coarse, low accuracy, and then the changed recognition target is narrowed down and collated and recognized with high accuracy.
  • FIG. 29 is a block diagram showing a functional configuration of a video processing apparatus 2800 according to this embodiment.
  • The functional configuration of the video processing device 2800 is obtained by replacing the local feature DB and the accuracy adjusting unit in FIG. 2 of the second embodiment; the other functional components are the same as those in FIG. 2, so the same reference numerals are assigned and the description is omitted.
  • The accuracy adjusting unit 2950 includes a change detection / recognition target selection unit 2961 that detects a change from the collation result of the entire video and selects a recognition target, and a feature point number / dimension number adjustment unit 2962 that adjusts the number of feature points and the number of dimensions in order to adjust the accuracy. The feature point number / dimension number adjustment unit 2962 initially limits the number of feature points and the number of dimensions to an accuracy just sufficient for detecting a change in the entire video captured by the imaging unit 210.
  • the change detection / recognition target selecting unit 2961 detects a change from the entire video and selects a local feature amount of the recognition target corresponding to the determination result from the local feature amount DB 2930.
  • the feature point number / dimension number adjustment unit 2962 is notified of the number of feature points and the number of dimensions necessary for recognition of the recognition object corresponding to the determination result.
  • The feature point number / dimension number adjustment unit 2962 then increases the number of feature points and/or the number of dimensions, and performs detailed collation recognition narrowed down to the target detected by the change detection / recognition target selection unit 2961 instead of the entire video.
  • To support this, the local feature DB 2930 stores, in an identifiable manner, local feature amounts for recognizing a change in the entire video and local feature amounts for recognizing the individual objects among the recognition objects.
  • In this embodiment, the number of feature points and the number of dimensions are adjusted as the accuracy adjustment by the accuracy adjustment unit 2950, but other accuracy adjustments may also be used.
  • The configuration of the local feature DB 2930 of this embodiment is the same as that of the local feature DB 2430 in FIG. 26, except that the local feature DB 2610 for recognizing a shelf is replaced by local feature amounts of the store video, and the local feature DB 2620 for recognizing articles (books/videos) is replaced by local feature amounts of recognition objects such as persons; overlapping description is therefore omitted.
  • FIG. 30A is a diagram showing a configuration of a change detection parameter 3010 according to this embodiment.
  • Based on the local feature amounts generated from the video, the change detection parameter 3010 detects that the video has changed when the difference 3011 between the previous and current recognition positions is E1 or more (up/down/left/right movement), when the difference 3012 in recognition size is E2 or more (forward/backward movement), or when the difference in recognition direction is E3 or more (rotation in the same position, etc.).
  • FIG. 30B is a diagram showing a configuration of change detection data 3020 according to the present embodiment.
  • The change detection data 3020 stores, in association with the recognition object ID 3021, the previous recognition position (centroid position) 3022, the previous recognition size 3023, the previous recognition direction (angle) 3024, the current recognition position (centroid position) 3025, the current recognition size 3026, and the current recognition direction (angle) 3027.
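  • The following Python sketch shows how the change detection of FIGS. 30A and 30B could be expressed: the previous and current recognition records 3020 are compared against thresholds E1-E3 of the change detection parameter 3010. The threshold values and the record layout below are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class RecognitionRecord:                 # one row of the change detection data 3020
    object_id: str                       # 3021
    position: tuple                      # centroid (x, y): 3022 / 3025
    size: float                          # 3023 / 3026
    direction: float                     # angle in degrees: 3024 / 3027

def has_changed(prev, curr, e1=10.0, e2=5.0, e3=15.0):
    dx = curr.position[0] - prev.position[0]
    dy = curr.position[1] - prev.position[1]
    moved = (dx * dx + dy * dy) ** 0.5 >= e1             # up/down/left/right movement
    resized = abs(curr.size - prev.size) >= e2           # forward/backward movement
    rotated = abs(curr.direction - prev.direction) >= e3  # rotation in the same position
    return moved or resized or rotated

prev = RecognitionRecord("person-2822", (100, 50), 40.0, 0.0)
curr = RecognitionRecord("person-2822", (130, 50), 40.0, 0.0)
print(has_changed(prev, curr))   # True: the centroid moved 30 >= E1
```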
  • FIG. 31 is a flowchart showing a processing procedure of the video processing apparatus 2800 according to the present embodiment. This flowchart is executed by the CPU 1210 of FIG. 12 using the RAM 1240, and implements each functional component of FIG. 29.
  • the same steps as those in FIG. 13 in the second embodiment are denoted by the same step numbers and description thereof is omitted.
  • the transmission process and the reception process of FIG. 13 are omitted.
  • If there is a video input, the process advances from step S1311 to S3113, and the number of feature points and the number of dimensions used as the change detection condition are set. Then, after the local feature generation processing (S1315) and the collation processing (S1317), it is determined in step S3119 whether the current phase is change detection or identification of the changed object. In the case of change detection, the process advances to step S3121 to determine whether there is a change. If there is no change, the process returns to step S1315 to repeat the change detection. If there is a change, the process advances to step S3123 to set larger numbers of feature points and dimensions as the condition for identifying the changed object. In step S3125, the changed region is selected. If the change target can be recognized, in step S3127 the local feature amounts including the target object are selected from the local feature DB 2930. The process then returns to step S1315 and is repeated.
  • The video processing system according to the present embodiment differs from the above embodiments in that local feature amounts are generated and transmitted by the mobile terminal, while collation and accuracy adjustment instructions are performed by the collation server.
  • Other configurations and operations are the same as those of the second to sixth embodiments, and thus the description of the same configurations and operations is omitted.
  • According to the present embodiment, the recognition target object in the query image in the video can be recognized in real time while dynamically adjusting the accuracy of the local feature amounts.
  • FIG. 32 is a diagram for explaining video processing by the video processing system 3200 according to the present embodiment.
  • The video processing system 3200 includes a video processing device 3210, which is a mobile terminal, and a video processing device 3220, which is a collation server connected to the video processing device 3210 via a network 3230. The video processing device 3220 has a local feature DB 3221 that stores local feature amounts generated in advance from recognition objects in association with those recognition objects, and an accuracy adjustment parameter DB 3222 for setting accuracy adjustment parameters based on the collation result.
  • the video processing device 3210 generates a local feature amount from the captured video and transmits it to the video processing device 3220. If the video processing device 3220 determines that the accuracy adjustment is necessary from the collation result, the video processing device 3220 transmits the accuracy adjustment parameter to the video processing device 3210.
  • The display screen 3211 on the right side of the video processing device 3210, which is a portable terminal, shows a state in which local feature amounts are generated with coarse, low accuracy from the captured video and transmitted to the video processing device 3220. The display screen 3212 in the left diagram shows a state in which local feature amounts are generated with dense, high accuracy from the captured video in accordance with an instruction from the video processing device 3220 and transmitted to the video processing device 3220.
  • Black circles 3211a and 3212a indicate feature points and local regions. Such black circles may or may not be displayed.
  • Here, the accuracy is indicated by the number of feature points, but accuracy adjustment by the number of local regions, the number of sub-region divisions, and the number of dimensions of the feature vectors is also included.
  • FIG. 33 is a sequence diagram showing a processing procedure of the video processing system 3200 according to the present embodiment.
  • In step S3300, if necessary, the application of this embodiment is downloaded from the collation server to the mobile terminal.
  • In step S3301, the application is started and initialized in the mobile terminal and the collation server.
  • In step S3303, the portable terminal captures video with the imaging unit 210.
  • In step S3305, the portable terminal generates local feature amounts.
  • In step S3307, the portable terminal encodes the generated local feature amounts and the position coordinates of the feature points, and in step S3309 transmits them to the collation server via the network.
  • In step S3311, the collation server recognizes the objects in the video by collating the local feature amounts of the recognition objects in the local feature DB 3221 with the received local feature amounts. The collation server then returns the collation result to the portable terminal in step S3313.
  • In step S3315, the mobile terminal notifies the user of the received recognition objects in the input video.
  • In step S3317, the collation server determines whether accuracy adjustment is necessary based on the collation result. If accuracy adjustment is necessary, the process advances to step S3319, where the accuracy adjustment parameter is acquired from the accuracy adjustment parameter DB 3222 and transmitted to the mobile terminal. If accuracy adjustment is not necessary, the collation server ends the processing.
  • The mobile terminal that has received the accuracy adjustment parameter returns from step S3321 to step S3305, and repeats the generation of local feature amounts with the adjusted accuracy. If no accuracy adjustment parameter is received, the processing of the portable terminal also ends.
  • Such a series of processing is realized in real time, and the user can know the recognition objects in the input video.
  • FIG. 34A is a block diagram showing a functional configuration of a video processing device 3210 for a portable terminal according to this embodiment.
  • the functional configuration of the video processing device 3210 is a configuration in which the configuration related to the collation processing is eliminated from the video processing device 200 of the second embodiment, and instead, a local feature transmission configuration and a collation result reception configuration are added. Therefore, the same components as those in FIG. 2 are denoted by the same reference numerals, and description thereof is omitted.
  • The video processing device 3210 includes an encoding unit 3430 that encodes the local feature amounts and feature point coordinates generated by the local feature amount generation unit 220 for transmission via the communication control unit 3470 (see FIG. 34B). The accuracy of the local feature amounts generated by the local feature amount generation unit 220 is adjusted by the accuracy adjustment unit 250 based on the accuracy adjustment parameter received by the accuracy adjustment parameter reception unit 3440 via the communication control unit 3470. In addition, according to the data received by the collation result receiving unit 3460 via the communication control unit 3470, a display screen superimposed on the input video is generated and displayed on the display unit 280; if the received data includes voice data, the voice is also output.
  • FIG. 34B is a block diagram illustrating a configuration of the encoding unit 3430 according to the present embodiment. Note that the encoding unit is not limited to this example, and other encoding processes can be applied.
  • the encoding unit 3430 has a coordinate value scanning unit 3432 that inputs the coordinates of the feature points from the feature point detection unit 411 of the local feature quantity generation unit 220 and scans the coordinate values.
  • the coordinate value scanning unit 3432 scans the image according to a specific scanning method, and converts the two-dimensional coordinate values (X coordinate value and Y coordinate value) of the feature points into one-dimensional index values.
  • This index value is a scanning distance from the origin according to scanning. There is no restriction on the scanning direction.
  • The encoding unit 3430 also has a sorting unit 3433 that sorts the index values of the feature points and outputs permutation information after sorting. The sorting unit 3433 sorts, for example, in ascending order; it may also sort in descending order.
  • Further provided are a difference calculation unit 3434, which calculates the difference value between two adjacent index values in the sorted index values and outputs a series of difference values, and a difference encoding unit 3435, which encodes the series of difference values in sequence order.
  • The series of difference values may be encoded, for example, with a fixed bit length. The bit length may be specified in advance, but in that case enough bits are required to express the maximum possible difference value, so the encoding size does not become small. Therefore, when encoding with a fixed bit length, the difference encoding unit 3435 can determine the bit length based on the input series of difference values. Specifically, the difference encoding unit 3435 calculates the maximum difference value from the input series of difference values, determines the number of bits (the number of expression bits) necessary to express that maximum value, and encodes the series of difference values with the obtained number of expression bits.
  • The local feature amount encoding unit 3431 encodes the local feature amounts of the corresponding feature points in the same permutation as the sorted index values of the feature points. For example, the local feature amount encoding unit 3431 can encode a local feature amount dimension-selected from the 150-dimensional local feature amount of one feature point at one byte per dimension, that is, in a number of bytes equal to the number of dimensions.
  • FIG. 35A is a flowchart showing a processing procedure of encoding according to the present embodiment.
  • In step S3511, the coordinate values of the feature points are scanned in a desired order.
  • In step S3513, the scanned coordinate values are sorted.
  • In step S3515, the difference values of the coordinate values are calculated in the sorted order.
  • In step S3517, the difference values are encoded (see FIG. 35B).
  • In step S3519, the local feature amounts are encoded in the coordinate value sorting order. Note that the difference value encoding and the local feature amount encoding may be performed in parallel.
  • FIG. 35B is a flowchart showing the processing procedure of difference value encoding S3517 according to this embodiment.
  • In step S3521, it is determined whether the difference value is within the range that can be encoded. If it is within that range, the process proceeds to step S3527 to encode the difference value, and then to step S3529. If it is not within that range, the process proceeds to step S3523 to encode the escape code, and in step S3525 the difference value is encoded by an encoding method different from that of step S3527, after which the process proceeds to step S3529. In step S3529, it is determined whether the processed difference value is the last element in the series of difference values. If it is the last, the process ends; if not, the process returns to step S3521 and is performed on the next difference value of the series.
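  • A minimal sketch of this coordinate encoding (FIGS. 34B, 35A, 35B): feature point coordinates are scanned to one-dimensional index values, sorted, differenced, and encoded in a fixed bit length, with an escape code for differences that do not fit. The raster scan, the 8-bit budget, and the two-part wide code are assumptions, not the patent's exact scheme.

```python
def encode_feature_coords(points, width, bits=8):
    """points: list of (x, y). Returns a list of fixed-width codes plus escapes."""
    ESCAPE = (1 << bits) - 1                             # reserved escape code (S3523)
    indices = sorted(y * width + x for x, y in points)   # scan + sort (S3511/S3513)
    out, prev = [], 0
    for idx in indices:
        diff = idx - prev                                # difference value (S3515)
        prev = idx
        if diff < ESCAPE:                                # fits in the fixed length (S3527)
            out.append(diff)
        else:                                            # escape + a different, wider code (S3525)
            out.append(ESCAPE)
            out.extend(divmod(diff, 1 << bits))          # hypothetical two-part wide code
        # the local feature amount of this point would be encoded here, in the
        # same permutation as the sorted index values (S3519)
    return out

print(encode_feature_coords([(3, 0), (10, 0), (4, 2)], width=640))
# -> [3, 7, 255, 4, 250]: the third difference (1274) needed the escape path
```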
  • FIG. 36 is a block diagram showing a functional configuration of the server video processing device 3220 according to the present embodiment.
  • the server video processing device 3220 includes a communication control unit 3610.
  • the decoding unit 3620 decodes the encoded local feature amount and feature point coordinates received from the mobile terminal via the communication control unit 3610. Then, the collation unit 3630 collates with the local feature quantity of the recognition target in the local feature quantity DB 3221. Based on the collation result, the accuracy adjustment determination unit 3650 determines whether or not accuracy adjustment is necessary. If it is determined that accuracy adjustment is necessary, the accuracy adjustment determination unit 3650 reads the accuracy adjustment parameter from the accuracy adjustment parameter DB 3222. If necessary, the accuracy adjustment parameter is returned from the transmission unit 3640 to the portable terminal via the communication control unit 3610.
  • the video processing system according to the present embodiment is different from the seventh embodiment in that it is a video processing system that performs coarse low-precision collation in the mobile terminal and performs dense high-precision collation in the collation server.
  • Other configurations and operations are the same as those of the seventh embodiment, and thus description of the same configurations and operations is omitted.
  • According to the present embodiment, the roles can be divided within the video processing system, and the recognition target object in the query image in the video can be recognized in real time.
  • FIG. 37 is a sequence diagram showing a processing procedure of the video processing system 3700 according to this embodiment.
  • In step S3700, if necessary, the application and data of this embodiment are downloaded from the collation server to the portable terminal.
  • In step S3701, the application is started and initialized in the mobile terminal and the collation server.
  • In step S3703, the portable terminal captures video with the imaging unit 210.
  • In step S3705, the mobile terminal generates low-accuracy initial local feature amounts.
  • In step S3707, the portable terminal collates the generated initial local feature amounts with the local feature amounts stored in the portable-terminal local feature DB 3710 to perform initial object recognition.
  • In step S3709, it is determined whether the recognition reliability is sufficient. If an object is recognized and the reliability is sufficient, the process advances to step S3719 to display the recognition object superimposed on the video.
  • Otherwise, the process proceeds to step S3711 to adjust the accuracy and generate high-accuracy local feature amounts, which are transmitted to the collation server in step S3713.
  • In step S3715, the collation server collates the high-accuracy local feature amounts received from the mobile terminal with the high-accuracy local feature amounts stored in the server local feature DB 3720 in association with the recognition objects, and recognizes the object.
  • In step S3717, the recognition object of the recognition result is notified to the portable terminal, and in step S3719 the portable terminal displays the received recognition object superimposed on the video.
  • Such a series of processing is realized in real time, and the user can know the recognition objects in the input video.
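  • A hedged sketch of this two-tier collation: the terminal first matches coarse 50-dimensional features against its small local DB (S3707), and only when the reliability is insufficient (S3709) does it fall back to 150-dimensional matching against the server DB (S3711-S3715). The matching and reliability functions below are simplified stand-ins, not the patent's definitions.

```python
import numpy as np

def match(query, db, dims):
    dists = {name: np.linalg.norm(vec[:dims] - query[:dims]) for name, vec in db.items()}
    name = min(dists, key=dists.get)
    reliability = 1.0 / (1.0 + dists[name])   # toy reliability score
    return name, reliability

def recognize(query_150, terminal_db, server_db, threshold=0.5):
    name, rel = match(query_150, terminal_db, dims=50)   # coarse, on the terminal
    if rel >= threshold:                                 # S3709: reliability OK
        return name
    name, _ = match(query_150, server_db, dims=150)      # dense, on the server
    return name

terminal_db = {"cup": np.random.rand(150), "book": np.random.rand(150)}
server_db = dict(terminal_db)    # same objects, stored at full accuracy
print(recognize(np.random.rand(150), terminal_db, server_db))
```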
  • FIG. 38A is a block diagram showing a configuration of the local feature DB 3710 for mobile terminal according to the present embodiment.
  • The local feature DB 3710 for mobile terminals stores, in this example, 50-dimensional local feature amounts 3813 to 3815 of the minimum number mmin of feature points, in association with the recognition object ID 3811 and the recognition object name 3812. Note that the 50 dimensions and the number of feature points mmin are examples, and are not limiting.
  • FIG. 38B is a block diagram showing a configuration of the server local feature DB 3720 according to the present embodiment.
  • The server local feature DB 3720 stores, in this example, 150-dimensional local feature amounts 3823 to 3825 of the maximum number mmax of feature points, in association with the recognition object ID 3821 and the recognition object name 3822. The 150 dimensions and the number of feature points mmax are likewise merely examples, and are not limiting.
  • Note that the present invention may be applied to a system composed of a plurality of devices, or to a single device. Furthermore, the present invention can also be applied to a case where a control program that realizes the functions of the embodiments is supplied directly or remotely to a system or apparatus. Therefore, a control program installed in a computer to realize the functions of the present invention, a medium storing the control program, and a WWW (World Wide Web) server from which the control program is downloaded are also included in the scope of the present invention.
  • (Appendix 1) A video processing apparatus comprising: first local feature storage means for storing, in association with each other, a recognition object and m first local feature amounts, each consisting of a feature vector from the 1st dimension to the i-th dimension, generated for each of m local regions including each of m feature points of an image of the recognition object; second local feature generation means for extracting n feature points from an image in a video and generating n second local feature amounts, each consisting of a feature vector from the 1st dimension to the j-th dimension, for each of n local regions including each of the n feature points; recognition means for selecting the smaller number of dimensions among the number of dimensions i of the feature vectors of the first local feature amounts and the number of dimensions j of the feature vectors of the second local feature amounts, and recognizing that the recognition object exists in the image in the video when it is determined that a predetermined ratio or more of the m first local feature amounts consisting of feature vectors up to the selected number of dimensions correspond to the n second local feature amounts consisting of feature vectors up to the selected number of dimensions; and accuracy adjustment means for controlling to adjust the accuracy of the n second local feature amounts generated by the second local feature generation means.
  • (Appendix 2) The video processing apparatus according to appendix 1, wherein the accuracy adjustment means controls the accuracy of the n second local feature amounts based on the reliability of the recognition result of the recognition means.
  • (Appendix 3) The video processing apparatus according to appendix 1, wherein the accuracy adjustment means controls to adjust the accuracy of the n second local feature amounts based on the data amount of the second local feature amounts generated by the second local feature generation means.
  • (Appendix 4) The video processing apparatus according to any one of appendices 1 to 3, further comprising control means for controlling so that the accuracy adjustment means adjusts the accuracy of the n second local feature amounts to a first accuracy and the recognition means recognizes a recognition object existing in the image in the video, and thereafter the accuracy adjustment means adjusts the accuracy of the n second local feature amounts to a second accuracy higher than the first accuracy and the recognition means recognizes the recognition object in more detail.
  • (Appendix 5) The video processing apparatus according to any one of appendices 1 to 3, further comprising control means for controlling so that the accuracy adjustment means adjusts the accuracy of the n second local feature amounts to a first accuracy and the recognition means recognizes a recognition object existing in the image in the video, and thereafter the accuracy adjustment means adjusts the accuracy of the n second local feature amounts to a second accuracy higher than the first accuracy and the recognition means recognizes a plurality of recognition objects constituting the recognition object.
  • (Appendix 6) The video processing apparatus according to any one of appendices 1 to 3, further comprising control means for controlling so that the accuracy adjustment means adjusts the accuracy of the n second local feature amounts to a first accuracy and the recognition means detects a change in a recognition object existing in the image in the video, and thereafter the accuracy adjustment means adjusts the accuracy of the n second local feature amounts to a second accuracy higher than the first accuracy and the recognition means recognizes in more detail the recognition object whose change has been detected.
  • (Appendix 7) The video processing apparatus according to any one of appendices 1 to 6, wherein the first local feature amounts and the second local feature amounts are generated by dividing a local region including a feature point extracted from an image into a plurality of sub-regions and generating a feature vector of a plurality of dimensions consisting of histograms of gradient directions in the plurality of sub-regions.
  • (Appendix 8) The video processing apparatus according to appendix 7, wherein the first local feature amounts and the second local feature amounts are generated by deleting, from the generated feature vector of the plurality of dimensions, dimensions having a larger correlation between adjacent sub-regions.
  • (Appendix 9) The video processing apparatus according to appendix 7 or 8, wherein the first local feature amounts and the second local feature amounts are generated by deleting feature points determined to be less important from the plurality of feature points extracted from an image.
  • (Appendix 10) The video processing apparatus according to any one of appendices 7 to 9, wherein the plurality of dimensions of the feature vector are selected so that they can be selected in order, starting from the first dimension, from the dimensions contributing more to the features of the feature point in accordance with the improvement in accuracy required of the local feature amounts, and so as to go around the local region once every predetermined number of dimensions.
  • (Appendix 11) The video processing apparatus according to any one of appendices 1 to 10, wherein the accuracy adjustment means adjusts at least one of the number of dimensions of the feature vectors in the n second local feature amounts generated by the second local feature generation means and the number n of feature points extracted by the second local feature generation means.
  • (Appendix 12) The video processing apparatus according to appendix 11, wherein the accuracy adjustment means adjusts at least one of the size of the local regions at the n feature points generated by the second local feature generation means, the shape of the local regions, the number of divisions for dividing the local regions into sub-regions, and the number of directions of the feature vectors.
  • (Appendix 13) The video processing apparatus according to any one of appendices 1 to 12, wherein the first local feature storage means stores sets of the m first local feature amounts and the position coordinates of the m feature points in the image of the recognition object, the second local feature generation means holds sets of the n second local feature amounts and the position coordinates of the n feature points in the image in the video, and the recognition means recognizes the recognition object in the image in the video when it determines that a predetermined ratio or more of the sets of the n second local feature amounts and their position coordinates and the sets of the m first local feature amounts and their position coordinates are in a linear transformation relationship.
  • (Appendix 14) A control method for a video processing apparatus comprising first local feature storage means for storing, in association with each other, a recognition object and m first local feature amounts, each consisting of a feature vector from the 1st dimension to the i-th dimension, generated for each of m local regions including each of m feature points of an image of the recognition object, the control method comprising: a second local feature generation step of extracting n feature points from an image in a video and generating n second local feature amounts, each consisting of a feature vector from the 1st dimension to the j-th dimension, for each of n local regions including each of the n feature points; a recognition step of selecting the smaller number of dimensions among the number of dimensions i and the number of dimensions j and recognizing that the recognition object exists in the image in the video when it is determined that a predetermined ratio or more of the m first local feature amounts consisting of feature vectors up to the selected number of dimensions correspond to the n second local feature amounts consisting of feature vectors up to the selected number of dimensions; and an accuracy adjustment step of controlling to adjust the accuracy of the n second local feature amounts generated in the second local feature generation step.
  • (Appendix 16) A control program for a video processing apparatus comprising first local feature storage means for storing, in association with each other, a recognition object and m first local feature amounts, each consisting of a feature vector from the 1st dimension to the i-th dimension, generated for each of m local regions including each of m feature points of an image of the recognition object, the control program causing a computer to execute: a second local feature generation step of extracting n feature points from an image in a video and generating n second local feature amounts, each consisting of a feature vector from the 1st dimension to the j-th dimension, for each of n local regions including each of the n feature points; a recognition step of selecting the smaller number of dimensions among the number of dimensions i and the number of dimensions j and recognizing that the recognition object exists in the image in the video when it is determined that a predetermined ratio or more of the m first local feature amounts consisting of feature vectors up to the selected number of dimensions correspond to the n second local feature amounts consisting of feature vectors up to the selected number of dimensions; and an accuracy adjustment step of controlling to adjust the accuracy of the n second local feature amounts generated in the second local feature generation step.
  • (Appendix 17) A video processing system having a video processing device for a mobile terminal and a video processing device for a server connected via a network, the video processing system comprising: first local feature storage means for storing, in association with each other, a recognition object and m first local feature amounts, each consisting of a feature vector from the 1st dimension to the i-th dimension, generated for each of m local regions including each of m feature points of an image of the recognition object; second local feature generation means for extracting n feature points from an image in a video and generating n second local feature amounts, each consisting of a feature vector from the 1st dimension to the j-th dimension, for each of n local regions including each of the n feature points; and recognition means for selecting the smaller number of dimensions among the number of dimensions i and the number of dimensions j and recognizing that the recognition object exists in the image in the video when it is determined that a predetermined ratio or more of the m first local feature amounts consisting of feature vectors up to the selected number of dimensions correspond to the n second local feature amounts consisting of feature vectors up to the selected number of dimensions.
  • (Appendix 18) The video processing system according to appendix 17, wherein the video processing device for the portable terminal comprises the second local feature generation means and first transmitting means for encoding the n second local feature amounts and transmitting them to the server video processing device via the network, and the server video processing device comprises the first local feature storage means, third receiving means for receiving and decoding the n encoded second local feature amounts from the video processing device for the mobile terminal, and third transmitting means for transmitting information indicating the recognition object recognized by the recognition means.
  • (Appendix 19) A video processing apparatus for a portable terminal in the video processing system according to appendix 17 or 18, comprising: second local feature generation means for generating the n second local feature amounts; first transmitting means for encoding the n second local feature amounts and transmitting them to the video processing apparatus for a server via the network; accuracy adjustment means for controlling to adjust the accuracy of the n second local feature amounts generated by the second local feature generation means; first receiving means for receiving an instruction for accuracy adjustment by the accuracy adjustment means; and second receiving means for receiving, from the server video processing device, information indicating the recognition object recognized by the server video processing device.
  • (Appendix 20) A control method for a video processing device for a portable terminal in the video processing system according to appendix 17 or 18, comprising a second local feature generation step of extracting n feature points from an image in a video and generating n second local feature amounts, each consisting of a feature vector from the 1st dimension to the j-th dimension, for each of n local regions including each of the n feature points.
  • (Appendix 21) A control program for a video processing device for a portable terminal in the video processing system according to appendix 17 or 18, causing a computer to execute a second local feature generation step of extracting n feature points from an image in a video and generating n second local feature amounts, each consisting of a feature vector from the 1st dimension to the j-th dimension, for each of n local regions including each of the n feature points.
  • (Appendix 22) A video processing apparatus for a server in the video processing system according to appendix 17 or 18, comprising: first local feature storage means for storing, in association with each other, a recognition object and m first local feature amounts, each consisting of a feature vector from the 1st dimension to the i-th dimension, generated for each of m local regions including each of m feature points of an image of the recognition object; third receiving means for receiving and decoding the encoded second local feature amounts from the video processing device for a mobile terminal; and recognition means for selecting the smaller number of dimensions among the number of dimensions i of the feature vectors of the first local feature amounts and the number of dimensions j of the feature vectors of the second local feature amounts, and recognizing that the recognition object exists when it is determined that a predetermined ratio or more of the m first local feature amounts consisting of feature vectors up to the selected number of dimensions correspond to the second local feature amounts consisting of feature vectors up to the selected number of dimensions.
  • (Appendix 23) A control method for a video processing apparatus for a server in the video processing system according to appendix 17 or 18, the video processing apparatus comprising first local feature storage means for storing, in association with each other, a recognition object and m first local feature amounts, each consisting of a feature vector from the 1st dimension to the i-th dimension, generated for each of m local regions including each of m feature points of an image of the recognition object.
  • (Appendix 24) A control program for causing a computer to execute control of a video processing apparatus for a server in the video processing system according to appendix 17 or 18, the video processing apparatus comprising first local feature storage means for storing, in association with each other, a recognition object and m first local feature amounts, each consisting of a feature vector from the 1st dimension to the i-th dimension, generated for each of m local regions including each of m feature points of an image of the recognition object.
  • (Appendix 25) A video processing method in a video processing system including a video processing apparatus for a mobile terminal and a video processing apparatus for a server connected via a network, the system including first local feature storage means for storing, in association with each other, a recognition object and m first local feature amounts, each consisting of a feature vector from the 1st dimension to the i-th dimension, generated for each of m local regions including each of m feature points of an image of the recognition object, the video processing method comprising: a second local feature generation step of extracting n feature points from an image in a video and generating n second local feature amounts, each consisting of a feature vector from the 1st dimension to the j-th dimension, for each of n local regions including each of the n feature points; and a recognition step of selecting the smaller number of dimensions among the number of dimensions i and the number of dimensions j and recognizing that the recognition object exists in the image in the video when it is determined that a predetermined ratio or more of the m first local feature amounts consisting of feature vectors up to the selected number of dimensions correspond to the n second local feature amounts consisting of feature vectors up to the selected number of dimensions.

Abstract

The present invention recognizes in real time a recognition subject in a query image in a video while dynamically adjusting the precision of local feature quantities. In the present invention, a recognition subject is recorded in association with m first local feature quantities, each comprising a one-dimensional to i-dimensional feature vector; n feature points are extracted from an image in a video, and n second local feature quantities are generated, each comprising a one-dimensional to j-dimensional feature vector; the lower dimensionality from among dimensionality i and dimensionality j is selected; and when it has been determined that at least a predetermined fraction of the m first local feature quantities comprising feature vectors up to the selected dimensionality correspond to the n second local feature quantities comprising feature vectors up to the selected dimensionality, it is recognized that the recognition subject is present in the image in the video. The present invention is also characterized in that the precision of the n second local feature quantities that are generated is adjusted.

Description

Video processing system, video processing method, video processing apparatus for portable terminal or server, and control method and control program therefor

The present invention relates to a technique for accurately identifying an object existing in a video in real time.

In the above technical field, Patent Document 1 describes a technique in which the recognition speed is improved by clustering feature amounts when a query image is recognized using a model dictionary generated in advance from a model image.

JP 2011-221688 A

However, the technique described in the above document is an invention aimed at improving the recognition speed; it does not make it possible to recognize the recognition target object in the query image in the video in real time while considering the trade-off between recognition accuracy and recognition speed.

An object of the present invention is to provide a technique for solving the above-described problems.
In order to achieve the above object, an apparatus according to the present invention comprises:
first local feature storage means for storing, in association with a recognition object, m first local features each consisting of feature vectors from the first dimension to the i-th dimension, generated for each of m local regions that each include one of the m feature points of an image of the recognition object;
second local feature generation means for extracting n feature points from an image in a video and generating, for each of n local regions that each include one of the n feature points, n second local features each consisting of feature vectors from the first dimension to the j-th dimension;
recognition means for selecting the smaller of the dimension number i of the feature vectors of the first local features and the dimension number j of the feature vectors of the second local features, and for recognizing that the recognition object is present in the image in the video when it determines that a predetermined proportion or more of the m first local features consisting of feature vectors up to the selected dimension number correspond to the n second local features consisting of feature vectors up to the selected dimension number; and
accuracy adjustment means for performing control so as to adjust the accuracy of the n second local features generated by the second local feature generation means.
In order to achieve the above object, a method according to the present invention is a control method for a video processing apparatus comprising first local feature storage means for storing, in association with a recognition object, m first local features each consisting of feature vectors from the first dimension to the i-th dimension, generated for each of m local regions that each include one of the m feature points of an image of the recognition object, the method comprising:
a second local feature generation step of extracting n feature points from an image in a video and generating, for each of n local regions that each include one of the n feature points, n second local features each consisting of feature vectors from the first dimension to the j-th dimension;
a recognition step of selecting the smaller of the dimension number i of the feature vectors of the first local features and the dimension number j of the feature vectors of the second local features, and recognizing that the recognition object is present in the image in the video when it is determined that a predetermined proportion or more of the m first local features consisting of feature vectors up to the selected dimension number correspond to the n second local features consisting of feature vectors up to the selected dimension number; and
an accuracy adjustment step of performing control so as to adjust the accuracy of the n second local features generated in the second local feature generation step.
In order to achieve the above object, a program according to the present invention is a control program for a video processing apparatus comprising first local feature storage means for storing, in association with a recognition object, m first local features each consisting of feature vectors from the first dimension to the i-th dimension, generated for each of m local regions that each include one of the m feature points of an image of the recognition object, the program causing a computer to execute:
a second local feature generation step of extracting n feature points from an image in a video and generating, for each of n local regions that each include one of the n feature points, n second local features each consisting of feature vectors from the first dimension to the j-th dimension;
a recognition step of selecting the smaller of the dimension number i of the feature vectors of the first local features and the dimension number j of the feature vectors of the second local features, and recognizing that the recognition object is present in the image in the video when it is determined that a predetermined proportion or more of the m first local features consisting of feature vectors up to the selected dimension number correspond to the n second local features consisting of feature vectors up to the selected dimension number; and
an accuracy adjustment step of performing control so as to adjust the accuracy of the n second local features generated in the second local feature generation step.
In order to achieve the above object, a system according to the present invention is a video processing system having a video processing apparatus for a portable terminal and a video processing apparatus for a server connected via a network, the system comprising:
first local feature storage means for storing, in association with a recognition object, m first local features each consisting of feature vectors from the first dimension to the i-th dimension, generated for each of m local regions that each include one of the m feature points of an image of the recognition object;
second local feature generation means for extracting n feature points from an image in a video and generating, for each of n local regions that each include one of the n feature points, n second local features each consisting of feature vectors from the first dimension to the j-th dimension;
recognition means for selecting the smaller of the dimension number i of the feature vectors of the first local features and the dimension number j of the feature vectors of the second local features, and for recognizing that the recognition object is present in the image in the video when it determines that a predetermined proportion or more of the m first local features consisting of feature vectors up to the selected dimension number correspond to the n second local features consisting of feature vectors up to the selected dimension number; and
accuracy adjustment means for performing control so as to adjust the accuracy of the n second local features generated by the second local feature generation means.
In order to achieve the above object, another method according to the present invention is a video processing method in a video processing system that has a video processing apparatus for a portable terminal and a video processing apparatus for a server connected via a network, and that comprises first local feature storage means for storing, in association with a recognition object, m first local features each consisting of feature vectors from the first dimension to the i-th dimension, generated for each of m local regions that each include one of the m feature points of an image of the recognition object, the method comprising:
a second local feature generation step of extracting n feature points from an image in a video and generating, for each of n local regions that each include one of the n feature points, n second local features each consisting of feature vectors from the first dimension to the j-th dimension;
a recognition step of selecting the smaller of the dimension number i of the feature vectors of the first local features and the dimension number j of the feature vectors of the second local features, and recognizing that the recognition object is present in the image in the video when it is determined that a predetermined proportion or more of the m first local features consisting of feature vectors up to the selected dimension number correspond to the n second local features consisting of feature vectors up to the selected dimension number; and
an accuracy adjustment step of performing control so as to adjust the accuracy of the n second local features generated in the second local feature generation step.
According to the present invention, a recognition object in a query image in a video can be recognized in real time while the accuracy of local features is dynamically adjusted.
Brief description of the drawings:
A block diagram showing the configuration of a video processing apparatus according to the first embodiment of the present invention.
A block diagram showing the configuration of a video processing apparatus according to the second embodiment of the present invention.
A sequence diagram showing the operation procedure of the video processing apparatus according to the second embodiment.
A block diagram showing the configuration of the local feature generation unit according to the second embodiment.
Diagrams (five figures) showing the processing of the local feature generation unit according to the second embodiment.
A diagram showing the processing of the collation unit according to the second embodiment.
A block diagram showing a first configuration of the accuracy adjustment unit according to the second embodiment.
A block diagram showing a second configuration of the accuracy adjustment unit according to the second embodiment.
A diagram explaining processing by the second configuration of the accuracy adjustment unit according to the second embodiment.
A block diagram showing a third configuration of the accuracy adjustment unit according to the second embodiment.
A block diagram showing a fourth configuration of the accuracy adjustment unit according to the second embodiment.
A diagram showing the structure of the local feature generation data according to the second embodiment.
A diagram showing the structure of the local feature DB according to the second embodiment.
A diagram showing the structure of the accuracy adjustment parameters according to the second embodiment.
A diagram showing the structure of the reliability determination table according to the second embodiment.
A block diagram showing the hardware configuration of the video processing apparatus according to the second embodiment.
A flowchart showing the processing procedure of the video processing apparatus according to the second embodiment.
A flowchart showing the procedure of the local feature generation processing according to the second embodiment.
A flowchart showing the procedure of the collation processing according to the second embodiment.
A block diagram showing the functional configuration of a video processing apparatus according to the third embodiment of the present invention.
A diagram showing the structure of the data amount evaluation table according to the third embodiment.
A flowchart showing the processing procedure of the video processing apparatus according to the third embodiment.
A diagram explaining video processing by a video processing apparatus according to the fourth embodiment of the present invention.
A block diagram showing the functional configuration of the video processing apparatus according to the fourth embodiment.
A diagram explaining the dimension number adjustment processing according to the fourth embodiment.
A diagram showing the structure of the local feature DB according to the fourth embodiment.
A flowchart showing the processing procedure of the video processing apparatus according to the fourth embodiment.
A diagram explaining video processing by a video processing apparatus according to the fifth embodiment of the present invention.
A block diagram showing the functional configuration of the video processing apparatus according to the fifth embodiment.
A diagram explaining the dimension number adjustment processing according to the fifth embodiment.
A diagram showing the structure of the local feature DB according to the fifth embodiment.
A flowchart showing the processing procedure of the video processing apparatus according to the fifth embodiment.
A diagram explaining video processing by a video processing apparatus according to the sixth embodiment of the present invention.
A block diagram showing the functional configuration of the video processing apparatus according to the sixth embodiment.
A diagram showing the structure of the change detection parameters according to the sixth embodiment.
A diagram showing the structure of the change detection data according to the sixth embodiment.
A flowchart showing the processing procedure of the video processing apparatus according to the sixth embodiment.
A diagram explaining video processing by a video processing system according to the seventh embodiment of the present invention.
A sequence diagram showing the processing procedure of the video processing system according to the seventh embodiment.
A block diagram showing the functional configuration of the video processing apparatus for a portable terminal according to the seventh embodiment.
A block diagram showing the configuration of the encoding unit according to the seventh embodiment.
A flowchart showing the encoding procedure according to the seventh embodiment.
A flowchart showing the procedure for encoding difference values according to the seventh embodiment.
A block diagram showing the functional configuration of the video processing apparatus for a server according to the seventh embodiment.
A sequence diagram showing the processing procedure of a video processing system according to the eighth embodiment of the present invention.
A block diagram showing the structure of the local feature DB for portable terminals according to the eighth embodiment.
A block diagram showing the structure of the local feature DB for servers according to the eighth embodiment.
Hereinafter, embodiments of the present invention will be described in detail by way of example with reference to the drawings. However, the constituent elements described in the following embodiments are merely examples and are not intended to limit the technical scope of the present invention to them alone.
[First Embodiment]
A video processing apparatus 100 as a first embodiment of the present invention will be described with reference to FIG. 1. The video processing apparatus 100 recognizes a recognition object in an image in a video in real time while maintaining recognition accuracy.
As shown in FIG. 1, the video processing apparatus 100 includes a first local feature storage unit 110, a second local feature generation unit 120, a recognition unit 130, and an accuracy adjustment unit 140. The first local feature storage unit 110 stores, in association with a recognition object 111, m first local features 112 each consisting of feature vectors from the first dimension to the i-th dimension, generated for each of m local regions that each include one of the m feature points of an image of the recognition object. The second local feature generation unit 120 extracts n feature points 121 from an image 101 in the video and generates, for each of n local regions 122 that each include one of the n feature points, n second local features 123 each consisting of feature vectors from the first dimension to the j-th dimension. The recognition unit 130 selects the smaller of the dimension number i of the feature vectors of the first local features and the dimension number j of the feature vectors of the second local features. The recognition unit 130 then determines whether a predetermined proportion or more of the m first local features 112 consisting of feature vectors up to the selected dimension number correspond to the n second local features 123 consisting of feature vectors up to the selected dimension number. When it determines that they correspond, the recognition unit 130 recognizes that the recognition object 111 is present in the image 101 in the video. The accuracy adjustment unit 140 performs control so as to adjust the accuracy of the n second local features 123 generated by the second local feature generation unit 120 (101a → 101b).
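Purely as an illustration of the recognition logic just described (this sketch is not part of the patent; the function names, the Euclidean nearest-neighbour test, and the threshold values are assumptions), the selection of the smaller dimension number and the proportion test might look as follows in Python:

    import numpy as np

    def recognize(first_feats, second_feats, ratio=0.5, dist_th=0.3):
        # first_feats:  (m, i) array - first local features of the recognition object
        # second_feats: (n, j) array - second local features from the video frame
        k = min(first_feats.shape[1], second_feats.shape[1])  # smaller of i and j
        a = first_feats[:, :k]        # feature vectors up to the selected dimension
        b = second_feats[:, :k]
        matched = 0
        for v in a:
            d = np.linalg.norm(b - v, axis=1)   # distance to every second feature
            if d.min() <= dist_th:              # a corresponding feature exists
                matched += 1
        # the object is deemed present when a predetermined proportion or more
        # of the m first local features have a corresponding second local feature
        return matched >= ratio * len(a)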
According to this embodiment, a recognition object in a query image in a video can be recognized in real time while the accuracy of local features is dynamically adjusted.
[Second Embodiment]
Next, a video processing apparatus according to a second embodiment of the present invention will be described. In this embodiment, a video processing apparatus serving as a portable terminal evaluates reliability, adjusts the accuracy of local features, and thereby recognizes objects in the video being captured in real time. Although this embodiment describes the processing of video captured by a portable terminal, the same processing applies equally to playback of video content and to viewing of broadcast programs.
According to this embodiment, a recognition object in a query image in a video can be recognized in real time, while maintaining reliability, as the accuracy of local features is dynamically adjusted.
<Functional configuration of the video processing apparatus>
FIG. 2 is a block diagram showing the functional configuration of the video processing apparatus 200 according to this embodiment.
The video processing apparatus 200 has an imaging unit 210 that acquires video. The captured video is displayed on the display unit 280 and is also input to the local feature generation unit 220. The local feature generation unit 220 generates local features from the captured video (see FIG. 4A for details). The local feature DB 230 stores, in association with each recognition object, local features generated in advance from that object by the same algorithm as the local feature generation unit 220. The contents of the local feature DB 230 may be received from outside, for example from a server.
The collation unit 240 checks whether the local features generated by the local feature generation unit 220 from the captured video contain data corresponding to the local features stored in the local feature DB 230. If corresponding data exist, it determines that the recognition object is present in the captured video. Note that "corresponding" local features means not merely that the same local features exist, but also that their order and arrangement could have been acquired from the same object (see FIG. 4G).
The accuracy adjustment unit 250 receives the collation result from the collation unit 240, has the reliability evaluation unit 260 evaluate the reliability of that result, and adjusts the accuracy used by the local feature generation unit 220 (see FIGS. 5A to 5C, 6, and 7).
The collation result generation unit 270 generates, from the collation result of the collation unit 240, data to be displayed on the display unit 280. Such data include the name of the recognition object, related information, recognition errors, and the like. The display unit 280 displays the collation result superimposed on the video captured by the imaging unit 210. The data generated by the collation result generation unit 270 may also be transmitted to the outside, for example to a server. The operation unit 290 includes the keys and touch panel of the video processing apparatus 200 and operates the functions of the video processing apparatus 200, such as the imaging unit 210.
Note that the video processing apparatus 200 of this embodiment is not limited to video being captured; it is also applicable to video being played back or broadcast. In that case, the imaging unit 210 may be replaced with a video playback unit or a video reception unit.
<Operation procedure of the video processing apparatus>
FIG. 3 is a sequence diagram showing the operation procedure of the video processing apparatus 200 according to this embodiment.
First, in step S301, the video captured by the imaging unit 210 is transferred to the local feature generation unit 220. In step S303, the local feature generation unit 220 sets initial accuracy adjustment parameters that determine the initial accuracy. The accuracy adjustment parameters include the number of feature points, the number of dimensions, the size and shape of the local region, the number of sub-region divisions, the number of feature vector directions, and so on (see FIG. 10).
In step S305, the local feature generation unit 220 generates local features using the initial accuracy parameters and transfers the generated local features to the collation unit 240.
In step S309, the collation unit 240 collates the local features generated by the local feature generation unit 220 against the local features stored in advance in the local feature DB 230 in association with recognition objects. If a predetermined proportion or more of the local features match, and the coordinates of the mutually matching feature points are in a linear relationship, it determines that the recognition object is present in the captured video. When the collation unit 240 determines that the recognition object is present, it notifies the accuracy adjustment unit 250 in step S311 of the collation result values used for the determination. The collation result values include the number of matched feature points, their proportion of the total, the degree of match of each feature point, the degree of match of important feature points, and so on (see FIG. 11).
The reliability evaluation unit 260 of the accuracy adjustment unit 250 evaluates the collation result values received from the collation unit 240 and outputs the reliability of the collation result (see FIG. 11). In step S313, it determines whether the reliability exceeds a predetermined threshold Th. If the reliability exceeds the threshold Th, the collation process ends and the recognition result is output. If it does not, in step S315 the accuracy adjustment unit 250 sets new accuracy adjustment parameters in the local feature generation unit 220.
In step S317, the local feature generation unit 220 acquires the accuracy adjustment parameters, returns to step S305, generates local features again, and the collation process is repeated.
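For orientation only, the following self-contained Python sketch models the S303 to S317 feedback loop; the parameter set, the doubling schedule, and the toy reliability function are assumptions, not taken from the patent:

    from dataclasses import dataclass

    @dataclass
    class AccuracyParams:
        # hypothetical subset of the accuracy adjustment parameters listed above
        num_feature_points: int = 50
        num_dimensions: int = 25

    def recognition_round(params: AccuracyParams) -> float:
        # stand-in for steps S305-S311: generate local features with the given
        # accuracy, collate them against the local feature DB, and return the
        # reliability of the collation result as a value in [0, 1]
        return min(1.0, params.num_feature_points / 400
                        + params.num_dimensions / 200)

    def recognize_with_adjustment(th: float = 0.8, max_rounds: int = 5) -> float:
        params = AccuracyParams()                    # S303: initial parameters
        reliability = 0.0
        for _ in range(max_rounds):
            reliability = recognition_round(params)  # S305-S311
            if reliability > th:                     # S313: threshold Th check
                break                                # output the recognition result
            # S315/S317: raise the accuracy and repeat the collation
            params.num_feature_points *= 2
            params.num_dimensions = min(150, params.num_dimensions * 2)
        return reliability

    print(recognize_with_adjustment())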
<Local feature generation unit>
FIG. 4A is a block diagram showing the configuration of the local feature generation unit 220 according to this embodiment.
The local feature generation unit 220 comprises a feature point detection unit 411, a local region acquisition unit 412, a sub-region division unit 413, a sub-region feature vector generation unit 414, and a dimension selection unit 415.
The feature point detection unit 411 detects a large number of characteristic points (feature points) in the image data and outputs the coordinate position, scale (size), and angle of each feature point.
The local region acquisition unit 412 acquires, from the coordinate value, scale, and angle of each detected feature point, the local region from which the feature is to be extracted.
The sub-region division unit 413 divides the local region into sub-regions. For example, it can divide the local region into 16 blocks (4 × 4) or into 25 blocks (5 × 5); the number of divisions is not limited. In this embodiment, the case of dividing the local region into 25 blocks (5 × 5) is described below as representative.
The sub-region feature vector generation unit 414 generates a feature vector for each sub-region of the local region. As the sub-region feature vector, a gradient direction histogram, for example, can be used.
The dimension selection unit 415 selects, based on the positional relationship of the sub-regions, the dimensions to be output as the local feature so that the correlation between the feature vectors of neighboring sub-regions is low (for example, it deletes or thins out dimensions). The dimension selection unit 415 can not only select dimensions but also determine a selection priority; for example, it can prioritize the dimensions so that the same gradient direction is not selected in adjacent sub-regions. The dimension selection unit 415 then outputs a feature vector composed of the selected dimensions as the local feature, and it can output the local feature with the dimensions rearranged according to the priority.
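As a rough, hedged sketch of the five-stage pipeline above (all parameter values, the gradient operators, and the naive truncation standing in for priority-based dimension selection are assumptions, not the patent's method), one might write:

    import numpy as np

    def local_features(gray, keypoints, R=20, G=5, D=6, keep=75):
        # illustrative sketch of the FIG. 4A pipeline; R, G, D, keep are assumed
        # parameters, and keypoints must lie at least R pixels from the border
        gy, gx = np.gradient(gray.astype(float))
        ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)   # gradient direction
        mag = np.hypot(gx, gy)                        # gradient magnitude
        feats = []
        for (y, x) in keypoints:                      # feature points (given)
            y0, x0 = int(y) - R, int(x) - R           # local region acquisition
            a = ang[y0:y0 + 2 * R, x0:x0 + 2 * R]
            m = mag[y0:y0 + 2 * R, x0:x0 + 2 * R]
            hist = np.zeros((G, G, D))
            step = (2 * R) // G
            for i in range(G):                        # sub-region division (G x G)
                for j in range(G):
                    sa = a[i*step:(i+1)*step, j*step:(j+1)*step]
                    sm = m[i*step:(i+1)*step, j*step:(j+1)*step]
                    q = np.floor(sa * D / (2 * np.pi)).astype(int) % D
                    for k in range(D):                # sub-region feature vector
                        hist[i, j, k] = sm[q == k].sum()
            # dimension selection: naive truncation shown here; the text above
            # selects dimensions by position-aware priority instead
            feats.append(hist.ravel()[:keep])
        return np.array(feats)

    img = np.random.rand(64, 64)
    print(local_features(img, [(32, 32)]).shape)      # -> (1, 75)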
<Processing of the local feature generation unit>
FIGS. 4B to 4F illustrate the processing of the local feature generation unit 220 according to this embodiment.
First, FIG. 4B shows the series of processes of feature point detection, local region acquisition, sub-region division, and feature vector generation in the local feature generation unit 220. For this series of processes, see US Pat. No. 6,711,293 and David G. Lowe, "Distinctive image features from scale-invariant keypoints", International Journal of Computer Vision, 60(2), 2004, pp. 91-110.
(Feature point detection unit)
Reference numeral 421 in FIG. 4B shows the state in which the feature point detection unit 411 of FIG. 4A has detected feature points in an image in the video. Generation of a local feature is explained below using one representative feature point 421a. The starting point of the arrow at the feature point 421a indicates the coordinate position of the feature point, the length of the arrow indicates its scale (size), and the direction of the arrow indicates its angle. For the scale (size) and direction, brightness, saturation, hue, and the like can be selected according to the target video. The example of FIG. 4B describes the case of six directions at 60-degree intervals, but this is not limiting.
(Local region acquisition unit)
The local region acquisition unit 412 of FIG. 4A generates, for example, a Gaussian window 422a centered on the starting point of the feature point 421a, and generates a local region 422 that roughly contains this Gaussian window. In the example of FIG. 4B the local region acquisition unit 412 generates a square local region 422, but the local region may be circular or of another shape. Such a local region is acquired for each feature point.
(Sub-region division unit)
Next is shown the state in which the sub-region division unit 413 has divided the scales and angles of the pixels contained in the local region 422 of the feature point 421a into sub-regions 423. FIG. 4B shows an example of division into 5 × 5 = 25 sub-regions of 4 × 4 = 16 pixels each; however, the sub-regions may also be 4 × 4 = 16 in number, or of other shapes and division counts.
(Sub-region feature vector generation unit)
The sub-region feature vector generation unit 414 quantizes the gradient of each pixel in a sub-region into angle units, for example eight directions, and generates a histogram that serves as the sub-region feature vector 424. The directions are normalized with respect to the angle output by the feature point detection unit 411. The sub-region feature vector generation unit 414 totals the frequencies of the quantized directions for each sub-region and generates a histogram; in this case it outputs, for each feature point, a feature vector composed of a histogram of 25 sub-region blocks × 6 directions = 150 dimensions. The gradient directions need not be quantized into eight directions only; they may be quantized into any number of directions, such as 4, 8, or 10. When the gradient directions are quantized into D directions, with G (0 to 2π radians) denoting a gradient direction before quantization, the quantized gradient direction value Qq (q = 0, ..., D−1) can be obtained, for example, by equation (1) or equation (2), though these are not limiting:

Qq = floor(G × D / 2π)         ...(1)
Qq = round(G × D / 2π) mod D   ...(2)

Here, floor() truncates the fractional part, round() rounds to the nearest integer, and mod takes the remainder. When generating the gradient histogram, the sub-region feature vector generation unit 414 may sum the gradient magnitudes instead of counting simple frequencies. When totalling the gradient histogram, it may add weight values not only to the sub-region to which a pixel belongs but also, according to the distance between sub-regions, to neighboring sub-regions (such as adjacent blocks), and it may add weight values to the gradient directions before and after the quantized gradient direction. The sub-region feature vector is not limited to a gradient direction histogram; any vector with multiple dimensions (elements), such as color information, may be used. In this embodiment, a gradient direction histogram is used as the sub-region feature vector.
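As a small worked illustration of equations (1) and (2) (the variable names are assumed), the two quantizers can be written and compared directly:

    import math

    def quantize_floor(G: float, D: int) -> int:
        # equation (1): Qq = floor(G * D / 2*pi)
        return math.floor(G * D / (2 * math.pi))

    def quantize_round(G: float, D: int) -> int:
        # equation (2): Qq = round(G * D / 2*pi) mod D
        return round(G * D / (2 * math.pi)) % D

    D = 6  # six quantized directions, as in the 150-dimension example above
    for deg in (0, 29, 30, 59, 60, 330, 359):
        G = math.radians(deg)
        print(deg, quantize_floor(G, D), quantize_round(G, D))
    # (1) assigns 0-59 degrees to bin 0; (2) centres bin 0 on 0 degrees,
    # so angles just below 360 degrees wrap back to bin 0 via the mod D.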
(Dimension selection unit)
Next, the processing of the dimension selection unit 415 of the local feature generation unit 220 is described with reference to FIGS. 4C to 4F.
The dimension selection unit 415 selects (thins out) the dimensions (elements) to be output as the local feature, based on the positional relationship of the sub-regions, so that the correlation between the feature vectors of neighboring sub-regions is low. More specifically, the dimension selection unit 415 selects dimensions so that, for example, at least one gradient direction differs between adjacent sub-regions. In this embodiment the dimension selection unit 415 mainly uses adjacent sub-regions as the neighboring sub-regions, but the neighboring sub-regions are not limited to adjacent ones; for example, sub-regions within a predetermined distance of the target sub-region may also be treated as neighboring.
FIG. 4C shows an example of selecting dimensions from the feature vector 431 of a 150-dimensional gradient histogram generated by dividing the local region into 5 × 5 sub-region blocks and quantizing the gradient directions into six directions 431a. In the example of FIG. 4C, dimensions are selected from a feature vector of 150 dimensions (5 × 5 = 25 sub-region blocks × 6 directions).
(Dimension selection within the local region)
FIG. 4C illustrates how the dimension selection unit 415 of the local feature generation unit 220 selects the number of dimensions of the feature vector.
As shown in FIG. 4C, when selecting the feature vector 432 of a 75-dimensional gradient histogram, half the size, from the feature vector 431 of the 150-dimensional gradient histogram, the dimension selection unit 415 can select the dimensions so that the same gradient direction is not selected in horizontally or vertically adjacent sub-region blocks.
In this example, where q (q = 0, 1, 2, 3, 4, 5) denotes the quantized gradient direction of the gradient direction histogram, blocks in which the elements q = 0, 2, 4 are selected and sub-region blocks in which the elements q = 1, 3, 5 are selected alternate. In the example of FIG. 4C, the gradient directions selected in adjacent sub-region blocks together cover all six directions.
When selecting the feature vector 433 of a 50-dimensional gradient histogram from the feature vector 432 of the 75-dimensional gradient histogram, the dimension selection unit 415 can select the dimensions so that, between sub-region blocks positioned diagonally at 45 degrees, only one direction is the same (and the remaining direction differs).
When selecting the feature vector 434 of a 25-dimensional gradient histogram from the feature vector 433 of the 50-dimensional gradient histogram, the dimension selection unit 415 can select the dimensions so that the selected gradient directions do not coincide between sub-region blocks positioned diagonally at 45 degrees. In the example shown in FIG. 4C, the dimension selection unit 415 selects one gradient direction from each sub-region for dimensions 1 to 25, two gradient directions for dimensions 26 to 50, and three gradient directions for dimensions 51 to 75.
In this way, it is desirable that the gradient directions do not overlap between adjacent sub-region blocks and that all gradient directions are selected evenly. At the same time, as in the example of FIG. 4C, it is desirable that the dimensions be selected evenly across the entire local region. The dimension selection method shown in FIG. 4C is one example, and the selection is not limited to this method.
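A hedged sketch of the checkerboard-style selection just described, using the 6 × p + q element numbering introduced with FIG. 4E below (the indexing convention used here is an assumption):

    import numpy as np

    def select_75_of_150(grid=5, D=6):
        # keep q = 0,2,4 on "even" blocks and q = 1,3,5 on "odd" blocks of a
        # 5x5 sub-region grid, so horizontally or vertically adjacent blocks
        # never share a selected gradient direction
        keep = []
        for p in range(grid * grid):        # raster-scan block number p
            row, col = divmod(p, grid)
            offset = (row + col) % 2        # checkerboard parity
            for q in range(offset, D, 2):   # q = 0,2,4 or q = 1,3,5
                keep.append(D * p + q)      # element number 6p + q
        return np.array(keep)

    idx = select_75_of_150()
    assert len(idx) == 75
    feature_150 = np.random.rand(150)       # stand-in 150-dimension histogram
    feature_75 = feature_150[idx]           # thinned 75-dimension local feature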
(Selection priority within the local region)
FIG. 4D shows an example of the order in which the local feature generation unit 220 selects feature vector dimensions from the sub-regions.
The dimension selection unit 415 can not only select dimensions but also determine a selection priority so that dimensions are selected in descending order of their contribution to the distinctiveness of the feature point. That is, the dimension selection unit 415 can prioritize the dimensions so that, for example, the same gradient direction is not selected in adjacent sub-region blocks. The dimension selection unit 415 then outputs a feature vector composed of the selected dimensions as the local feature, and it can output the local feature with the dimensions rearranged according to the priority.
That is, within dimensions 1-25, 26-50, and 51-75, the dimension selection unit 415 may add dimensions in the sub-region block order shown, for example, at 441 in FIG. 4D. When the priority shown at 441 in FIG. 4D is used, the dimension selection unit 415 can select gradient directions while giving higher priority to the sub-region blocks close to the center.
Reference numeral 451 in FIG. 4E shows an example of the element numbering of the 150-dimensional feature vector according to the selection order of FIG. 4D. In this example, with the 5 × 5 = 25 blocks numbered p (p = 0, 1, ..., 24) in raster-scan order and the quantized gradient direction denoted q (q = 0, 1, 2, 3, 4, 5), the element number of the feature vector is 6 × p + q.
Reference numeral 461 in FIG. 4F shows that the 150-dimensional order given by the selection ranking of FIG. 4E is layered in units of 25 dimensions. That is, 461 in FIG. 4F shows a configuration example of the local feature obtained by selecting the elements of FIG. 4E according to the priority shown at 441 in FIG. 4D. The dimension selection unit 415 can output the dimension elements in the order shown in FIG. 4F. Specifically, to output a 150-dimensional local feature, it can output all 150 elements in the order of FIG. 4F. To output, for example, a 25-dimensional local feature, it can output the elements 471 of the first row of FIG. 4F (the 76th, 45th, 83rd, ..., 120th elements) in the order shown (left to right). To output, for example, a 50-dimensional local feature, it can output, in addition to the first row, the elements 472 of the second row of FIG. 4F in the order shown (left to right).
In the example shown in FIG. 4F, the local feature thus has a hierarchical structure: for example, in the 25-dimensional local feature and the 150-dimensional local feature, the arrangement of the elements 471 to 476 of the leading 25 dimensions is identical. By selecting dimensions hierarchically (progressively) in this way, the dimension selection unit 415 can extract and output a local feature of any number of dimensions, that is, of any size, according to the application, communication capacity, terminal specifications, and so on. Moreover, because the dimension selection unit 415 selects dimensions hierarchically and outputs them rearranged by priority, images can be collated using local features with different numbers of dimensions. For example, when images are collated using a 75-dimensional local feature and a 50-dimensional local feature, the distance between the local features can be calculated using only the leading 50 dimensions.
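Purely to illustrate this progressive property (the function name and the use of Euclidean distance are assumptions), matching features of unequal length reduces to a distance over the shared prefix:

    import numpy as np

    def prefix_distance(f1: np.ndarray, f2: np.ndarray) -> float:
        # distance between two progressively ordered local features of possibly
        # different dimension counts: compare only the shared prefix, which is
        # valid because the leading dimensions are arranged identically at every
        # accuracy level (FIG. 4F)
        k = min(len(f1), len(f2))
        return float(np.linalg.norm(f1[:k] - f2[:k]))

    f75 = np.random.rand(75)          # feature output at 75-dimension accuracy
    f50 = f75[:50].copy()             # the same feature truncated to 50 dimensions
    print(prefix_distance(f75, f50))  # 0.0: the prefixes coincide by construction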
The priorities shown at 441 in FIG. 4D through FIG. 4F are examples, and the order of selecting dimensions is not limited to them. For example, the block order may be that shown at 442 or 443 in FIG. 4D instead of at 441. The priority may also be set so that dimensions are selected evenly from all the sub-regions, or, treating the vicinity of the center of the local region as important, so that the sub-regions near the center are selected more frequently. The information indicating the dimension selection order may, for example, be specified in a program, or stored in a table or the like (a selection order storage unit) that is referred to when the program is executed.
The dimension selection unit 415 may also perform the dimension selection while skipping every other sub-region block; that is, six dimensions are selected in one sub-region and zero dimensions in a neighboring sub-region. Even in such a case, it can be said that dimensions are selected for each sub-region so that the correlation between neighboring sub-regions is low.
The shapes of the local region and of the sub-regions are not limited to squares and may be arbitrary. For example, the local region acquisition unit 412 may acquire a circular local region. In that case, the sub-region division unit 413 can divide the circular local region into, for example, 9 or 17 sub-regions formed by concentric circles. Even in this case, the dimension selection unit 415 can select dimensions in each sub-region.
As shown in FIGS. 4B to 4F, the local feature generation unit 220 of this embodiment hierarchically selects the dimensions of the generated feature vectors while maintaining the information content of the local features. This processing makes it possible to recognize objects and display recognition results in real time while maintaining recognition accuracy. The configuration and processing of the local feature generation unit 220 are not limited to this example; other processing that enables real-time object recognition and display of recognition results while maintaining recognition accuracy can naturally be applied.
<Collation unit>
FIG. 4G illustrates the processing of the collation unit 240 according to this embodiment. It shows an example in which the tower serving as the recognition object is collated at three levels of accuracy; the number of accuracy levels is not limited to three.
The left side of FIG. 4G shows the video processing apparatus 200 as a portable terminal. On the display screen 480 of the video processing apparatus 200, the video currently being captured by the imaging unit 210 is displayed in a video display area 481, and a plurality of instruction buttons are displayed in an instruction button display area 482.
The right side of FIG. 4G schematically shows, from top to bottom, the data in the local feature DB 230 used for collation at three levels of accuracy. The black dots indicate feature points and their local regions.
In the upper right diagram (483) there are few feature points, so the collation processing time can be short, but the accuracy is low. In the middle right diagram (484) there are more feature points, so the collation accuracy is higher, but the processing time is longer. In the lower right diagram (485) there are still more feature points, so the collation accuracy is higher still, but the processing time is longer again.
 In FIG. 4G, to keep the drawing simple, accuracy adjustment is explained only in terms of the number of feature points; in practice, the size and shape of the local regions, the number of sub-region divisions, the number of dimensions, and so on are also adjusted.
 As shown in FIG. 4G, the collation unit 240 associates each feature point in the local features 483 to 485 stored in the local feature DB 230 with the feature point whose local feature matches it, as indicated by the thin lines. The collation unit 240 treats feature points as matching when a predetermined proportion or more of their local feature dimensions agree. If the positional relationship between the associated sets of feature points is a linear one, the collation unit 240 recognizes the recognition object. Recognition of this kind is robust to differences in size and orientation (differences in viewpoint), and even to inversion. Moreover, since sufficient recognition accuracy is obtained once a predetermined number of associated feature points exists, the recognition object can be recognized even when part of it is hidden from view.
 《Accuracy Adjustment Unit》
 Several example configurations of the accuracy adjustment unit 250 are described below with reference to FIGS. 5A to 5C, FIG. 6, and FIG. 7.
 (First Configuration)
 FIG. 5A is a block diagram showing a first configuration 250-1 of the accuracy adjustment unit 250 according to this embodiment. In the first configuration 250-1, the number of dimensions can be determined by the dimension number determination unit 511.
 The dimension number determination unit 511 determines the number of dimensions to be selected by the dimension selection unit 415. For example, it can determine the number of dimensions by receiving information indicating that number from the user. The information need not indicate the number of dimensions itself; it may instead indicate, for example, the collation accuracy or the collation speed. Specifically, when the dimension number determination unit 511 receives an input requesting higher local feature generation accuracy, communication accuracy, or collation accuracy, it determines the number of dimensions so that it increases; when it receives an input requesting higher local feature generation speed, communication speed, or collation speed, it determines the number of dimensions so that it decreases.
 The dimension number determination unit 511 may determine the same number of dimensions for all feature points detected in the image, or a different number for each feature point. For example, when the importance of each feature point is supplied by external information, it may assign more dimensions to feature points of high importance and fewer dimensions to feature points of low importance. In this way, the number of dimensions can be determined with the collation accuracy, the local feature generation speed, the communication speed, and the collation speed all taken into account.
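 A minimal sketch of one conceivable policy for such a determination follows; the importance range, the allowed dimension counts, and the speed cap are assumptions made for illustration, not the claimed configuration.

```python
def decide_dimension_count(base_dims: int, importance: float,
                           prefer_speed: bool) -> int:
    """Pick a per-feature-point dimension count.

    importance is assumed to lie in [0, 1]; the allowed counts follow the
    25-dimension hierarchy used elsewhere in this description.
    """
    allowed = [25, 50, 75, 100, 125, 150]
    if prefer_speed:
        allowed = allowed[:3]          # cap the dimensions to favor speed
    idx = min(int(importance * len(allowed)), len(allowed) - 1)
    return min(allowed[idx], base_dims)
```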
 In this embodiment, if the other accuracy-related conditions are equal, processing is conceivable in which a number of dimensions appropriate to the recognition object is determined, or in which the number of dimensions is varied around that appropriate value.
 (Second Configuration)
 FIG. 5B is a block diagram showing a second configuration 250-2 of the accuracy adjustment unit 250 according to this embodiment. In the second configuration 250-2, the feature vector expansion unit 512 can change the number of dimensions by aggregating the values of several dimensions.
 The feature vector expansion unit 512 can expand the feature vector by generating dimensions at a larger scale (expanded divided regions) from the feature vector output by the sub-region feature vector generation unit 414. The expansion uses only the information in that output feature vector; since there is no need to return to the original image and extract features again, the processing time for expanding a feature vector is negligible compared with the time needed to generate it from the original image. For example, the feature vector expansion unit 512 may generate a new gradient direction histogram by combining the gradient direction histograms of adjacent sub-regions.
 FIG. 5C is a diagram explaining the processing of the second configuration 250-2 of the accuracy adjustment unit 250 according to this embodiment. In FIG. 5C, each expanded block is the sum of the gradient histograms of a 2 × 2 = 4 block, so the number of dimensions can be changed while improving accuracy.
 As shown in FIG. 5C, the feature vector expansion unit 512 can, for example, expand a 5 × 5 × 6-dimensional (150-dimensional) gradient direction histogram 531 to generate a 4 × 4 × 6-dimensional (96-dimensional) gradient direction histogram 541. That is, the four blocks indicated by the bold line 531a are combined into the single block 541a, and the four blocks indicated by the broken line 531b are combined into the single block 541b.
 Similarly, the feature vector expansion unit 512 can generate a 3 × 3 × 6-dimensional (54-dimensional) gradient direction histogram 551 from the 4 × 4 × 6-dimensional (96-dimensional) gradient direction histogram 541 by summing the gradient direction histograms of adjacent blocks. That is, the four blocks indicated by the bold line 541c are combined into the single block 551c, and the four blocks indicated by the broken line 541d are combined into the single block 551d.
 When the dimension selection unit 415 reduces the 5 × 5 × 6-dimensional (150-dimensional) gradient direction histogram 531 to the 5 × 5 × 3-dimensional (75-dimensional) gradient direction histogram 532, the 4 × 4 × 6-dimensional (96-dimensional) gradient direction histogram 541 correspondingly becomes the 4 × 4 × 3-dimensional (48-dimensional) gradient direction histogram 542, and the 3 × 3 × 6-dimensional (54-dimensional) gradient direction histogram 551 becomes the 3 × 3 × 3-dimensional (27-dimensional) gradient direction histogram 552.
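 The block aggregation described above can be sketched as follows (Python with NumPy; the array shapes follow the 5 × 5 × 6 example, other details are illustrative). Summing a sliding 2 × 2 window of sub-region histograms turns the 5 × 5 grid into a 4 × 4 grid and, applied again, into a 3 × 3 grid, without revisiting the original image.

```python
import numpy as np

def expand_histogram(hist: np.ndarray, window: int = 2) -> np.ndarray:
    """Sum gradient-direction histograms over sliding blocks of sub-regions.

    hist has shape (rows, cols, orientations), e.g. (5, 5, 6) = 150 dims.
    A 2x2 window turns it into (4, 4, 6) = 96 dims; applying the window
    again gives (3, 3, 6) = 54 dims. Only the existing feature vector is
    read, so the original image is never needed.
    """
    r, c, o = hist.shape
    out = np.zeros((r - window + 1, c - window + 1, o))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = hist[i:i + window, j:j + window].sum(axis=(0, 1))
    return out

h531 = np.random.rand(5, 5, 6)   # 150-dimensional histogram
h541 = expand_histogram(h531)    # (4, 4, 6): 96 dimensions
h551 = expand_histogram(h541)    # (3, 3, 6): 54 dimensions
```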
 (Third Configuration)
 FIG. 6 is a block diagram showing a third configuration 250-3 of the accuracy adjustment unit 250 according to this embodiment. In the third configuration 250-3, the feature point selection unit 611 can change the data size of the local features while maintaining accuracy by changing the number of selected feature points.
 The feature point selection unit 611 can hold in advance designated-number information indicating the "designated number" of feature points to select. The designated-number information may indicate the designated number itself, or it may indicate the total size (for example, the number of bytes) of the local features in the image. In the latter case, the feature point selection unit 611 can calculate the designated number by, for example, dividing the total size by the size of the local feature at one feature point. It can also assign importance values to all feature points at random and select feature points in descending order of importance; once the designated number of feature points has been selected, it outputs information on the selected feature points as the selection result. Alternatively, based on the feature point information, it can select only those feature points that fall within a specific scale region among the scales of all feature points; if more than the designated number are selected, it can, for example, reduce them to the designated number based on importance and output information on the selected feature points as the selection result.
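 A minimal sketch of the selection just described follows; the dictionary key, the per-point byte size, and the requirement to supply either a byte budget or a designated number are assumptions made for illustration.

```python
def select_feature_points(points, total_size_bytes=None, designated=None,
                          bytes_per_point=96):
    """Keep at most a designated number of feature points, highest importance first.

    points: list of dicts assumed to carry an 'importance' key. Exactly one
    of total_size_bytes or designated must be given; when only a byte budget
    is supplied, the designated number is derived from the per-point local
    feature size, as described above.
    """
    if designated is None:
        designated = total_size_bytes // bytes_per_point
    ranked = sorted(points, key=lambda p: p["importance"], reverse=True)
    return ranked[:designated]
```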
 (Fourth Configuration)
 FIG. 7 is a block diagram showing a fourth configuration 250-4 of the accuracy adjustment unit 250 according to this embodiment. In the fourth configuration 250-4, the dimension number determination unit 511 and the feature point selection unit 611 cooperate to change the data size of the local features while maintaining accuracy.
 Various relationships between the dimension number determination unit 511 and the feature point selection unit 611 are conceivable in the fourth configuration 250-4. For example, the feature point selection unit 611 can select feature points based on the number of feature points determined by the dimension number determination unit 511. Conversely, the dimension number determination unit 511 can determine the number of selected dimensions so that the feature data size equals a designated feature size, based on that designated size and the number of feature points selected by the feature point selection unit 611. The feature point selection unit 611 may also select feature points based on the feature point information output from the feature point detection unit 411 and output importance information indicating the importance of each selected feature point to the dimension number determination unit 511, which can then determine, for each feature point, the number of dimensions to be selected by the dimension selection unit 415 based on that importance information.
 (Local Feature Generation Data)
 FIG. 8 is a diagram showing the configuration of the local feature generation data 800 according to this embodiment. These data are stored and held in the RAM 1240 of FIG. 12.
 In the local feature generation data 800, a plurality of detected feature points 802, their feature point coordinates 803, and the local region information 804 corresponding to each feature point are stored in association with the input image ID 801. In association with each detected feature point 802, its feature point coordinates 803, and its local region information 804 are stored a plurality of sub-region IDs 805, sub-region information 806, the feature vector 807 corresponding to each sub-region, and the selected dimensions 808 including their priority order.
 From these data, the local feature generated for each detected feature point 802 is stored.
 (Local Feature DB)
 FIG. 9 is a diagram showing the configuration of the local feature DB 230 according to this embodiment.
 The local feature DB 230 stores a first local feature 903, a second local feature 904, ..., and an m-th local feature 905 in association with the recognition object ID 901 and the recognition object name 902. Each local feature stores a feature vector of elements from dimension 1 to dimension 150, hierarchized in steps of 25 dimensions corresponding to the 5 × 5 sub-regions of FIG. 4F.
 Here, m is a positive integer and may differ for each recognition object. In this embodiment, the feature point coordinates used in the collation processing are also stored together with each local feature.
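 One conceivable in-memory representation of such a DB row is sketched below (Python); all field names are illustrative stand-ins, not those of the actual DB.

```python
from dataclasses import dataclass, field

@dataclass
class LocalFeatureRecord:
    """One row of the local feature DB; names are illustrative only."""
    object_id: int                 # recognition object ID (901)
    object_name: str               # recognition object name (902)
    # m feature points; each holds its coordinates (used during collation)
    # and a 150-dimensional vector layered in 25-dimension steps.
    features: list = field(default_factory=list)   # [(coords, vector150), ...]

db = [LocalFeatureRecord(1, "tower",
                         features=[((120, 45), [0.0] * 150)])]
```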
 (Accuracy Adjustment Parameters)
 FIG. 10 is a diagram showing the configuration of the accuracy adjustment parameters 1000 according to this embodiment.
 The accuracy adjustment parameters 1000 store, as the feature point parameters 1001, the number of feature points and a feature point selection threshold for deciding whether a point is adopted as a feature point, among others. As the local region parameters 1002 they store, for example, the area (size) corresponding to the Gaussian window and the shape (rectangle, circle, and so on); as the sub-region parameters 1003, the number of divisions and the shape of the local region; and as the feature vector parameters 1004, the number of directions (for example, 8 or 6), the number of dimensions, the dimension selection method, and so on.
 The accuracy adjustment parameters shown in FIG. 10 are an example, and the present invention is not limited to them.
 (Reliability Determination Data)
 FIG. 11 is a diagram showing the configuration of the reliability determination table 1100 according to this embodiment.
 The reliability determination table 1100 stores, in association with the recognition object ID 1101 and recognition object name 1102 of a collation result, the number of feature points 1103, the feature point match rate 1104 in the collation processing, the feature vector dimension count 1105, the average feature vector match rate 1106, the linear transformation fit rate 1107, and the like. Based on these data, the reliability 1108 of the recognition result is determined. The relationship between these data and the reliability may be set and stored in the reliability determination table 1100 in advance.
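 How the reliability 1108 is folded out of these statistics is a design choice; the weighted mean below is one hypothetical mapping, shown only to make the data flow concrete.

```python
def recognition_reliability(point_match_rate: float,
                            mean_vector_match_rate: float,
                            linear_fit_rate: float,
                            weights=(0.3, 0.3, 0.4)) -> float:
    """Fold the per-object statistics of the table into one score in [0, 1].

    The actual mapping is stored or configured in the table; this weighted
    mean is assumed purely for illustration.
    """
    w1, w2, w3 = weights
    return (w1 * point_match_rate
            + w2 * mean_vector_match_rate
            + w3 * linear_fit_rate)
```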
 《Hardware Configuration of the Video Processing Apparatus》
 FIG. 12 is a block diagram showing the hardware configuration of the video processing apparatus 200 according to this embodiment.
 In FIG. 12, the CPU 1210 is a processor for arithmetic control and realizes each functional component of the video processing apparatus 200, a mobile terminal, by executing programs. The ROM 1220 stores fixed data such as initial data, and programs. The communication control unit 1230 communicates with other apparatuses via a network in this embodiment. The CPU 1210 is not limited to a single CPU; there may be a plurality of CPUs, and a GPU (Graphics Processing Unit) for image processing may be included.
 The RAM 1240 is a random access memory used by the CPU 1210 as a work area for temporary storage. Areas for storing the data needed to realize this embodiment are reserved in the RAM 1240. The input video 1241 is an area that stores the input video captured by the imaging unit 210. The feature point data 1242 is an area that stores feature point data, including the feature point coordinates, scales, and angles detected from the input video 1241. The local feature generation table 800 is an area that stores the local feature generation table shown in FIG. 8. The accuracy adjustment parameters 1000 are an area that stores the accuracy adjustment parameters shown in FIG. 10. The reliability determination table 1100 is an area that stores the reliability determination table shown in FIG. 11. The collation result 1243 is an area that stores the collation result recognized by collating the local features generated from the input video against the local features stored in the local feature DB 230. The collation result display data 1244 is an area that stores the display data for notifying the user of the collation result 1243; when audio output is used, collation result audio data may also be included. The input video/collation result superimposition data 1245 is an area that stores the data, displayed on the display unit 280, in which the collation result 1243 is superimposed on the input video 1241. The input/output data 1246 is an area that stores data input and output via the input/output interface 1260, and the transmission/reception data 1247 is an area that stores data transmitted and received via the communication control unit 1230.
 The storage 1250 stores databases, various parameters, and the following data and programs needed to realize this embodiment. The local feature DB 230 is an area storing the local feature DB shown in FIG. 9. The collation result display format 1251 is an area storing the format used to generate the display of collation results.
 The storage 1250 also stores the following programs. The mobile terminal control program 1252 is an area storing the program that controls the whole video processing apparatus 200. Within the mobile terminal control program 1252, the local feature generation module 1253 generates local features from the input video in accordance with FIGS. 4B to 4F; the collation control module 1254 collates the local features generated from the input video against the local features stored in the local feature DB 230; the collation result notification module 1255 notifies the user of the collation result by display or by voice; and the accuracy adjustment module 1256 adjusts the accuracy of the local feature generation unit 220 based on the collation result.
 The input/output interface 1260 interfaces input/output data with input/output devices. Connected to the input/output interface 1260 are the display unit 280, the touch panel and keyboard serving as the operation unit 290, the speaker 1264, the microphone 1265, and the imaging unit 210. The input/output devices are not limited to these examples. In addition, the GPS (Global Positioning System) position generation unit 1266 acquires the current position based on signals from GPS satellites.
 FIG. 12 shows only the data and programs essential to this embodiment; data and programs not related to this embodiment are not shown.
 《Processing Procedure of the Video Processing Apparatus》
 FIG. 13 is a flowchart showing the processing procedure of the video processing apparatus 200 according to this embodiment. This flowchart is executed by the CPU 1210 of FIG. 12 using the RAM 1240 and realizes each functional component of FIG. 2.
 First, in step S1311, it is determined whether video has been input for object recognition. As mobile terminal functions, reception and transmission are each determined in their own branches, beginning with step S1331; if none of these applies, other processing is performed in step S1341.
 If there is video input, the process proceeds to step S1313, where the initial accuracy adjustment parameters are set. In step S1315, local feature generation processing is executed on the input video (see FIG. 14A). Next, in step S1317, collation processing is executed (see FIG. 14B). In step S1319, it is determined whether the reliability of the recognition result exceeds a threshold Th. If the reliability is at or below Th, the process proceeds to step S1321, where the accuracy adjustment parameters are updated; the process then returns to step S1315, higher-accuracy local features are obtained with the updated accuracy adjustment parameters, and the collation is repeated.
 If the reliability exceeds the threshold Th, the process proceeds to step S1325, where the result of the collation processing is superimposed on the input video and the video/collation-result superimposed display processing is executed. It is then determined whether to end the object recognition processing; the end is triggered by, for example, the reset button in the instruction button display area 482 of FIG. 4G. If the processing is not to end, the process returns to step S1313 and the object recognition of the input video is repeated.
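 The feedback loop of FIG. 13 (steps S1313 to S1321) can be sketched as follows; every processing stage is injected as a callable because the concrete implementations are described elsewhere, and the threshold and round limit are illustrative assumptions.

```python
def recognize_with_feedback(frame, db, generate, collate, reliability,
                            raise_accuracy, params,
                            threshold=0.8, max_rounds=3):
    """Outer loop of FIG. 13: raise accuracy until the result is trusted.

    generate, collate, reliability, and raise_accuracy are stand-ins for the
    processing of steps S1315, S1317, S1319, and S1321 respectively.
    """
    result = None
    for _ in range(max_rounds):
        features = generate(frame, params)       # S1315: local features
        result = collate(features, db, params)   # S1317: collation
        if reliability(result) > threshold:      # S1319: reliability > Th?
            return result                        # display follows at S1325
        params = raise_accuracy(params)          # S1321: update parameters
    return result
```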
 In the case of reception, when data for the local feature DB is to be downloaded, the local feature DB data is received in step S1333 and stored in the local feature DB in step S1335; for any other data reception as a mobile terminal, reception processing is performed in step S1337. In the case of transmission, when data for the local feature DB is to be uploaded, the local features generated from the input video are transmitted as local feature DB data in step S1343; for any other data transmission as a mobile terminal, transmission processing is performed in step S1345. Since data transmission and reception as a mobile terminal are not features of this embodiment, their detailed description is omitted.
 (Local Feature Generation Processing)
 FIG. 14A is a flowchart showing the procedure of the local feature generation processing S1315 according to this embodiment.
 First, in step S1411, the position coordinates, scale, and angle of the feature points are detected from the input video. In step S1413, a local region is acquired for one of the feature points detected in step S1411. Next, in step S1415, the local region is divided into sub-regions. In step S1417, a feature vector is generated for each sub-region to form the feature vector of the local region. The processing of steps S1411 to S1417 is illustrated in FIG. 4B.
 Next, in step S1419, dimension selection is performed on the feature vector of the local region generated in step S1417. Dimension selection is illustrated in FIGS. 4D to 4F.
 In step S1421, it is determined whether local feature generation and dimension selection have been completed for all the feature points detected in step S1411. If not, the process returns to step S1413 and the processing is repeated for the next feature point.
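 A skeleton of the procedure of FIG. 14A follows (Python); the five processing stages are injected as callables since their internals are described with FIGS. 4B to 4F, so none of the helper names here are authoritative.

```python
def generate_local_features(image, params, detect, acquire_region,
                            divide, histogram, select_dims):
    """Skeleton of FIG. 14A; each stage is an injected callable."""
    descriptors = []
    for kp in detect(image, params):                     # S1411: feature points
        region = acquire_region(image, kp, params)       # S1413: local region
        subregions = divide(region, params)              # S1415: sub-regions
        vector = histogram(subregions)                   # S1417: feature vector
        descriptors.append(select_dims(vector, params))  # S1419: dimensions
    return descriptors                                   # all points done: S1421
```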
 (Collation Processing)
 FIG. 14B is a flowchart showing the procedure S1317 of the collation processing according to this embodiment.
 First, in step S1431, the parameters p = 1 and q = 0 are set as initialization. Next, in step S1433, the dimension count j of the local features generated in step S1315 is acquired.
 In the loop of steps S1435 to S1445, the collation of each local feature is repeated until p > m (m being the number of feature points of the recognition object). First, in step S1435, the first j dimensions of the p-th local feature of the recognition object stored in the local feature DB 230 are acquired; that is, dimensions 1 through j are read. Next, in step S1437, the p-th local feature acquired in step S1435 is collated in turn against the local features of all the feature points generated from the input video to determine whether they are similar. In step S1439, it is judged from the collation result whether the similarity exceeds a threshold α; if it does, in step S1441 the pair consisting of the local feature and the positional relationship of the matched feature points in the input video and the recognition object is stored, and q, the counter of matched feature points, is incremented by one. In step S1443, the process advances to the next feature point of the recognition object (p ← p + 1); if not all feature points of the recognition object have been collated (p ≤ m), the process returns to step S1435 and the collation of matching local features is repeated. The threshold α can be changed according to the recognition accuracy required for each recognition object; for a recognition object with low correlation to other recognition objects, accurate recognition is possible even if the recognition accuracy is lowered.
 When collation against all feature points of the recognition object is finished, the process proceeds from step S1445 to step S1447, and in steps S1447 to S1453 it is determined whether the recognition object exists in the input video. First, in step S1447, it is determined whether the proportion of feature points q that matched the local features of the input video, out of the m feature points of the recognition object, exceeds a threshold β. If it does, the process proceeds to step S1449, where it is further determined, for the recognition object candidate, whether the positional relationship between the feature points of the input video and those of the recognition object admits a linear transformation. That is, it is judged whether the positional relationship stored in step S1441 between the feature points of the input video and those of the recognition object is one that remains possible under changes such as rotation, inversion, or a change of viewpoint position, or one that is impossible. Since such determination methods are geometrically well known, their detailed description is omitted. If the determination in step S1451 is that a linear transformation is possible, the process proceeds to step S1453 and it is determined that the collated recognition object exists in the input video. The threshold β can be changed according to the recognition accuracy required for each recognition object; for a recognition object with low correlation to other recognition objects, or one whose features can be judged even from a part of it, accurate recognition is possible with few matched feature points. In other words, the object can be recognized even if part of it is hidden from view, as long as a characteristic part remains visible.
 In step S1455, it is determined whether any uncollated recognition objects remain in the local feature DB 230. If recognition objects remain, the next recognition object is set in step S1457, the parameters are reinitialized to p = 1 and q = 0, and the process returns to step S1435 to repeat the collation.
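 A sketch of the collation of FIG. 14B with the two thresholds named above follows; the descriptor similarity and the linear-transformation test are injected as callables, and the data layout is an assumption made for illustration.

```python
def collate(query_features, db_objects, similarity, is_linear_map,
            alpha=0.9, beta=0.5):
    """Sketch of FIG. 14B.

    db_objects: [(name, [(coords, vector), ...]), ...];
    query_features: [(coords, vector), ...].
    similarity and is_linear_map stand in for the descriptor comparison
    (threshold alpha, S1439) and the geometric test (S1449-S1451).
    """
    recognized = []
    for name, stored_points in db_objects:        # object loop: S1455/S1457
        pairs = []                                # matched point pairs (S1441)
        for s_coords, s_vec in stored_points:     # p-loop: S1435 to S1445
            for q_coords, q_vec in query_features:
                if similarity(s_vec, q_vec) > alpha:
                    pairs.append((s_coords, q_coords))
                    break
        if (len(pairs) / len(stored_points) > beta   # q/m > beta: S1447
                and is_linear_map(pairs)):           # linear transform: S1449
            recognized.append(name)                  # object present: S1453
    return recognized
```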
 As is clear from this description of the collation processing, storing recognition objects of every field in the local feature DB 230 and collating all of them on a mobile terminal imposes a very heavy load. It is therefore conceivable, for example, that before object recognition from the input video the user selects the field of the object from a menu, and that field is retrieved from the local feature DB 230 and collated. The load can also be reduced by downloading into the local feature DB 230 only the local features of the fields the user actually uses (for example, buildings in the example of FIG. 4G).
 [Third Embodiment]
 Next, a video processing apparatus according to the third embodiment of the present invention will be described. The video processing apparatus according to this embodiment differs from the second embodiment in that it adjusts the data size of the generated local features. The other configurations and operations are the same as in the second embodiment, so the description of the same configurations and operations is omitted.
 According to this embodiment, the recognition object in the query image in the video can be recognized in real time at higher speed while the data size of the local features is adjusted dynamically.
 《Functional Configuration of the Video Processing Apparatus》
 FIG. 15 is a block diagram showing the functional configuration of the video processing apparatus 1500 according to this embodiment. The only difference from FIG. 2 of the second embodiment is the accuracy adjustment unit 1550; the other components are the same as in FIG. 2, so they are given the same reference numbers and their description is omitted.
 In the video processing apparatus 1500, the accuracy adjustment unit 1550 adjusts the accuracy of the local features produced by the local feature generation unit 220. The accuracy adjustment unit 1550 has a data amount evaluation unit 1560 and adjusts the accuracy based on the data size of the local features generated from the information supplied by the local feature generation unit 220.
 (Data Amount Evaluation Table)
 FIG. 16 is a diagram showing the configuration of the data amount evaluation table 1600 according to this embodiment.
 The data amount evaluation table 1600 stores, in association with the captured video ID 1601, the number of feature points 1602, the local region size 1603, the number of sub-region divisions 1604, and the feature vector dimension count 1605. From these values, the predicted data amount 1606 is calculated. The accuracy adjustment parameters 1607 are set based on this predicted data amount 1606, and the actually generated data amount 1608 is stored.
 Using this data amount evaluation table 1600, the accuracy adjustment unit 1550 performs accuracy adjustment based on the data amount evaluation of the data amount evaluation unit 1560.
 《Processing Procedure of the Video Processing Apparatus》
 FIG. 17 is a flowchart showing the processing procedure of the video processing apparatus 1500 according to this embodiment. This flowchart is executed by the CPU 1210 of FIG. 12 using the RAM 1240 and realizes each functional component of FIG. 15. The procedure of the video processing apparatus 1500 replaces the reliability-based update of the accuracy adjustment parameters shown in FIG. 13 of the second embodiment with an update based on the data amount; the other steps are the same, so they are given the same step numbers and their description is omitted.
 In step S1719, it is determined whether the amount of data generated by the local feature generation processing lies between the thresholds Dl and Dh. If it does not, the process proceeds to step S1721, where the accuracy adjustment parameters are updated, and then returns to step S1315 to repeat the processing. If it lies between Dl and Dh, the process proceeds to step S1317 and the collation processing is performed.
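 Step S1719 can be sketched as a simple budget check; the byte size per dimension and the two threshold values stand in for Dl and Dh and are illustrative only.

```python
def within_budget(n_points: int, dims: int, bytes_per_dim: int = 1,
                  d_lo: int = 2_000, d_hi: int = 16_000) -> bool:
    """Step S1719: keep the predicted descriptor size between Dl and Dh.

    If the prediction falls outside [Dl, Dh], the accuracy adjustment
    parameters are updated (S1721), for example by reducing the feature
    points or dimensions, and generation is retried before collation.
    """
    predicted = n_points * dims * bytes_per_dim
    return d_lo <= predicted <= d_hi
```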
 [Fourth Embodiment]
 Next, a video processing apparatus according to the fourth embodiment of the present invention will be described. Compared with the second and third embodiments, the video processing apparatus according to this embodiment applies the accuracy adjustment of the local features to select recognition objects from the video and then collate and recognize them in greater detail. The other configurations and operations are the same as in the second or third embodiment, so the description of the same configurations and operations is omitted.
 According to this embodiment, recognition objects in the query image in the video can be selected and recognized in greater detail in real time while the accuracy of the local features is adjusted dynamically.
 《Video Processing According to This Embodiment》
 FIG. 18 is a diagram explaining video processing by the video processing apparatus 1800 according to this embodiment.
 FIG. 18 illustrates a case where the imaging unit of the video processing apparatus 1800, a mobile terminal, captures video of an intersection, and the cars stopped at or driving through the intersection are recognized and reported to the user.
 The left display screen 1810 of FIG. 18 has a video display area 1811 showing the current state of the intersection and an instruction button display area 1812.
 The video display area 1821 of the central display screen 1820 of FIG. 18 shows the result of generating, from the video in the video display area 1811, local features of just enough accuracy to identify the names of the recognition objects, and collating them. In the video display area 1821, three cars are recognized and reported: a ○○ car 1822 driving through the intersection, and a stopped ○○ car 1823 and ×× car 1824. In this collation processing, bicycles, people, and so on in the video display area 1811 would also be recognized if their local features matched, but the higher-accuracy recognition described below is limited to cars.
 The video display area 1831 of the right display screen 1830 of FIG. 18 shows the result of collation performed on the car regions with the accuracy of the local features raised. Even the car models, such as the ○○ A car-BCD 1832, the ○○ B car 1833, and the ×× H car 1834, which were not recognized in the central video display area 1821, are now recognized.
 In this way, from the plurality of recognition objects first collated coarsely at low accuracy over the whole captured video, the recognition objects to be recognized in detail are then selected and collated densely at high accuracy. This eliminates unnecessary generation and collation of local features as far as possible while enabling real-time, high-accuracy object recognition.
 《Functional Configuration of the Video Processing Apparatus》
 FIG. 19 is a block diagram showing the functional configuration of the video processing apparatus 1800 according to this embodiment. The functional configuration of the video processing apparatus 1800 is obtained by modifying the configuration of the local feature DB and replacing the accuracy adjustment unit of FIG. 2 of the second embodiment; the other functional components are the same as in FIG. 2, so they are given the same reference numbers and their description is omitted.
 The accuracy adjustment unit 1950 has a region selection unit 1961, which selects the regions to be collated at high accuracy, and a dimension number adjustment unit 1962, which adjusts the number of dimensions of the feature vectors in order to adjust the accuracy.
 In the initial local feature generation by the local feature generation unit 220, the dimension number adjustment unit 1962 limits the number of dimensions to an accuracy just sufficient to recognize recognition objects in the video captured by the imaging unit 210. From the collation result of the collation unit 240, the region selection unit 1961 selects the regions of the recognition objects for which more detailed recognition is desired.
 The dimension number adjustment unit 1962 then increases the number of dimensions, and detailed collation is performed only on the regions selected by the region selection unit 1961 rather than on the whole video.
 Corresponding to this processing, the local feature DB 1930 stores the local features so that the number of dimensions of each local feature corresponds to the recognition accuracy of the recognition object.
 Although FIG. 19 shows adjustment of the feature vector dimension count as the accuracy adjustment of the accuracy adjustment unit 1950, other accuracy adjustments, such as adjusting the number of feature points, may be used instead.
 (Dimension Number Adjustment Processing)
 FIG. 20 is a diagram explaining the dimension number adjustment processing according to this embodiment.
 FIG. 20 shows a case where the number of dimensions of the feature vectors is adjusted in three stages. In the first stage 2010, the local region of each feature point is represented by a 25-dimensional feature vector (see FIG. 4F); in the second stage 2020, by a 50-dimensional feature vector; and in the third stage 2030, by a 150-dimensional feature vector.
 As explained with reference to FIGS. 4D to 4F, the local features of this embodiment have a hierarchical structure. Therefore, even when the number of dimensions is changed, the local features need not be regenerated from the input video, so the accuracy can be adjusted in real time.
 (Local Feature DB)
 FIG. 21 is a diagram showing the configuration of the local feature DB 1930 according to this embodiment. FIG. 21 shows only the part used for the car collation of FIG. 18, but storing other local features in the same way makes the processing of this embodiment possible.
 The local feature DB 1930 stores, in association with each maker 2101, the plurality of car names 2102 it manufactures; in association with each car name 2102, a plurality of models 2103; and in association with each model 2103, local features of 1 to 150 dimensions.
 With this configuration of the local feature DB 1930, the maker 2101 can be recognized by collating dimensions 1 to 25 (2104), the car name 2102 by collating dimensions 1 to 50 (2104 and 2105), and even the model 2103 by collating dimensions 1 to 150 (2104 to 2106).
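 A sketch of this coarse-to-fine lookup follows (Python); the entry attributes and the Euclidean nearest-neighbour search are assumptions chosen for illustration, not the claimed collation method.

```python
import math

def nearest(query, entries, key):
    """Entry whose key(entry) has the smallest Euclidean distance to query."""
    return min(entries, key=lambda e: math.dist(query, key(e)))

def identify_car(query_vec, db):
    """Coarse-to-fine lookup over the layered vectors of FIG. 21.

    Each DB entry is assumed to expose maker, name, model, and vector150.
    """
    maker = nearest(query_vec[:25], db, lambda e: e.vector150[:25]).maker
    in_maker = [e for e in db if e.maker == maker]             # dims 1-25
    name = nearest(query_vec[:50], in_maker,
                   lambda e: e.vector150[:50]).name            # dims 1-50
    in_name = [e for e in in_maker if e.name == name]
    model = nearest(query_vec, in_name,
                    lambda e: e.vector150).model               # dims 1-150
    return maker, name, model
```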
 The assignment of recognition targets to dimension counts in FIG. 21 is conceptual and does not limit the concrete examples.
 《Processing Procedure of the Video Processing Apparatus》
 FIG. 22 is a flowchart showing the processing procedure of the video processing apparatus 1800 according to this embodiment. This flowchart is executed by the CPU 1210 of FIG. 12 using the RAM 1240 and realizes each functional component of FIG. 19. In the procedure of FIG. 22, steps identical to those of FIG. 13 of the second embodiment are given the same step numbers and their description is omitted; the transmission and reception processing of FIG. 13 is also omitted from FIG. 22.
 If there is video input, the process proceeds from step S1311 to step S2203, where the initial number of dimensions is set. After the local feature generation processing (S1315) and the collation processing (S1317), it is determined in step S2209 whether the currently set number of dimensions is the maximum. If it is not, the process proceeds to step S2211, where the regions of the recognition objects for which detailed recognition is desired are selected from the collation result; then, in step S2213, the number of dimensions is increased from the initial value and the process returns to step S1315, where the local features with the increased number of dimensions are read out.
 On the other hand, if the number of dimensions is already the maximum, the process proceeds to step S2215 and the final collation result is displayed superimposed on the video.
 [Fifth Embodiment]
 Next, a video processing apparatus according to the fifth embodiment of the present invention will be described. Compared with the second and third embodiments, the video processing apparatus according to this embodiment applies the accuracy adjustment of the local features to identify the recognition object as a whole from the video and then collate and recognize that object in detail. The other configurations and operations are the same as in the second or third embodiment, so the description of the same configurations and operations is omitted.
 According to this embodiment, by identifying the recognition object in the query image in the video and then collating the details of its contents while dynamically adjusting the accuracy of the local features, the composition of the recognition object can be recognized in real time. For example, stocktaking and product inspection can be realized without unnecessary collation.
 《Video Processing According to This Embodiment》
 FIG. 23 is a diagram explaining video processing by the video processing apparatus 2300 according to this embodiment.
 FIG. 23 illustrates a case where the imaging unit of the video processing apparatus 2300, a mobile terminal, captures video of a shelf, the shelf as a whole is identified, and then the objects on the shelf are recognized and reported to the user. The recognition target of this embodiment is not limited to shelves, however; it can also be applied, for example, to recognizing a circuit board and then the components on that board.
 The left display screen 2310 of FIG. 23 has a video display area 2311 showing the video of the shelf 2313 and an instruction button display area 2312.
 The video display area 2321 of the central display screen 2320 of FIG. 23 shows the result of generating, from the video in the video display area 2311, local features of just enough accuracy to identify the shelf 2313, and collating them. Identifying the shelf here means not merely classifying the shelf itself, but determining, for example, whether it is a food shelf or a bookshelf; if a food shelf, whether it holds beverages or bread; and if a bookshelf, whether it holds paperbacks or dictionaries. The black dots 2322 are the feature points and local regions of the whole shelf and may or may not be displayed on the video. Suppose that in the video display area 2321 the shelf is recognized as a bookshelf of dictionaries; this is recognized by collating the local features of the shelf as a whole.
 Since the shelf has been recognized as a bookshelf of dictionaries, the local features of dictionaries are then collated in the video display area 2331 of the right display screen 2330 of FIG. 23. The black dots 2332 are the feature points and local regions of the individual dictionaries and may or may not be displayed on the video.
 In this way, the whole captured video is first collated coarsely at low accuracy, and then the recognition objects are narrowed down and collated densely at high accuracy. This eliminates unnecessary local feature generation and collation processing as far as possible while enabling real-time, high-accuracy object recognition.
 << Functional configuration of the video processing apparatus >>
 FIG. 24 is a block diagram showing the functional configuration of the video processing apparatus 2300 according to the present embodiment. It differs from the second embodiment in a modified local feature DB and a replacement of the accuracy adjustment unit of FIG. 2; the other functional components are the same as in FIG. 2, carry the same reference numbers, and are not described again.
 The accuracy adjustment unit 2450 has a collation determination unit 2461 that judges the overall collation result, and a feature point count adjustment unit 2462 that adjusts the number of feature points in order to adjust the accuracy.
 For the initial local feature generation by the local feature generation unit 220, the feature point count adjustment unit 2462 limits the number of feature points to an accuracy just sufficient to recognize the target from the whole of the video captured by the imaging unit 210. From the result of the collation unit 240, the collation determination unit 2461 determines the recognition target in the whole video and selects the local features of the recognition target corresponding to that determination from the local feature DB 2430. It also notifies the feature point count adjustment unit 2462 of the number of feature points needed to recognize the objects associated with that determination.
 The feature point count adjustment unit 2462 then increases the number of feature points, and detailed collation is performed not on the whole video but on the targets narrowed down from the result determined by the collation determination unit 2461.
 Corresponding to the above processing, the local feature DB 2430 stores, in a distinguishable manner, the local features for recognizing a target from the whole video and the local features for recognizing the individual objects within that target.
 Although FIG. 24 shows feature point count adjustment as the accuracy adjustment performed by the accuracy adjustment unit 2450, other accuracy adjustments, such as adjusting the number of dimensions of the feature vectors, may be used instead.
 (Feature point count adjustment processing)
 FIG. 25 is a diagram explaining the feature point count adjustment processing according to the present embodiment.
 FIG. 25 shows a case in which the number of feature points is adjusted in three stages. The number of dimensions of every feature vector is taken to be 25 here, but this is not limiting. The first stage 2510 shows local features with the number of feature points limited to a maximum of 50. The second stage 2520 shows local features limited to a maximum of 200 feature points. The third stage 2530 shows local features limited to a maximum of 500 feature points.
 With 50 or 200 feature points, for example, it is possible to recognize what kind of shelf the shelf in FIG. 23 is. To identify the contents of the shelf, however (in FIG. 23, which books are on the bookshelf), some 500 feature points are considered necessary. This description is conceptual and does not limit the specific values.
 (Local feature DB)
 FIG. 26 is a diagram showing the configuration of the local feature DB 2430 according to the present embodiment. Although FIG. 26 shows a bookshelf and the books/videos displayed on it, the same configuration applies equally to other combinations.
 The local feature DB 2610 in FIG. 26 stores local features generated from the shelf as a whole. In association with the whole shelf 2611, it stores, for example, the local features and feature point coordinates of 50 feature points.
 The local feature DB 2620 in FIG. 26, on the other hand, stores local features generated from the books and videos displayed on the shelf. In association with each book/video 2621, it stores, for example, the local features and feature point coordinates of 500 feature points.
 The above feature point counts are only examples; the number of feature points may be determined by what is needed to recognize each object, or by the strength of its correlation with the others.
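 A possible in-memory layout mirroring FIG. 26 is sketched below; the key names and the two-level split are illustrative assumptions, not the patented schema. Each entry pairs the local descriptors with the coordinates of the feature points they were generated from.

```python
# Hypothetical layout of the two-tier local feature DB of FIG. 26: a coarse
# DB keyed by the whole-shelf ID (few feature points) and a fine DB keyed by
# individual items (many feature points).

shelf_feature_db = {
    "bookshelf_whole": {
        "num_feature_points": 50,
        "descriptors": [],   # one 25-dimensional vector per feature point
        "coordinates": [],   # one (x, y) pair per feature point
    },
}

item_feature_db = {
    "dictionary_vol1": {
        "num_feature_points": 500,
        "descriptors": [],
        "coordinates": [],
    },
}
```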
 << Processing procedure of the video processing apparatus >>
 FIG. 27 is a flowchart showing the processing procedure of the video processing apparatus 2300 according to the present embodiment. The flowchart is executed by the CPU 1210 of FIG. 12 using the RAM 1240, and implements the functional components of FIG. 24. In the procedure of FIG. 27, steps identical to those of FIG. 13 of the second embodiment carry the same step numbers and are not described again. The transmission and reception processing of FIG. 13 is also omitted from FIG. 27.
 For a video input, the process advances from step S1311 to S2713 and sets the shelf determination condition (in this example, the number of feature points). Next, in step S2715, the local feature DB for shelf identification is selected (see 2610 in FIG. 26). After the local feature generation processing (S1315) and the collation processing (S1317), it is determined in step S2719 whether the collation target is the shelf or the articles displayed on the shelf. If the shelf was the collation target, the process advances to step S2721 and sets the condition (number of feature points) for the articles displayed on the identified shelf. Next, in step S2723, the local feature DB for article determination is selected (see 2620 in FIG. 26).
 The process then returns to step S1315, and local features are generated with the addition of feature points that were not initially used for local feature generation.
 If, on the other hand, the collation target was the articles, the process advances to step S2725, and the arrangement of the articles on the shelf is stored together with the video. This collation result is used, for example, for stocktaking.
 Note that the procedure of FIG. 27 can be generalized by replacing the shelf with the whole video and the articles with individual objects.
 [Sixth Embodiment]
 Next, a video processing apparatus according to the sixth embodiment of the present invention will be described. Compared with the second and third embodiments, the video processing apparatus according to this embodiment differs in that it applies accuracy adjustment of the local features to recognize, in the video, a recognition target that has changed, and then collates and recognizes that target in detail. The other configurations and operations are the same as in the second or third embodiment, so their description is omitted.
 According to this embodiment, a recognition target that changes in the query image of the video is detected while the accuracy of the local features is adjusted dynamically, and its details are then collated and recognized, so that a changing recognition target can be recognized in detail in real time. This is applicable, for example, to surveillance cameras.
 << Video processing according to this embodiment >>
 FIG. 28 is a diagram for explaining video processing by the video processing apparatus 2800 according to the present embodiment.
 In FIG. 28, the imaging unit of the video processing apparatus 2800, a portable terminal, captures video inside a store and monitors it for changes. When a change is detected in the video, the recognition target that changed is collated and recognized with high accuracy. The recognition target of this embodiment is not, however, limited to in-store monitoring.
 The display screen 2810 on the left of FIG. 28 has a video display area 2811 that displays the in-store video, and an instruction button display area 2812.
 The video display area 2821 of the central display screen 2820 in FIG. 28 shows the result of generating, from the video in the video display area 2811, local features just accurate enough to detect a change, and collating them. Suppose that changes in persons 2822 and 2823 are detected in the video display area 2821. This is recognized by collating the local features of the store as a whole.
 Since changes in the persons 2822 and 2823 have been detected, the local features of persons are collated in the video display area 2831 of the display screen 2830 on the right of FIG. 28. As a result of collation against the local features in the local feature DB, person 2822 is recognized and displayed as store clerk A 2832, while person 2823 is recognized and displayed as customer B 2833.
 In this way, changes in the video are first collated and detected coarsely at low accuracy over the entire captured video, and the changed recognition targets are then narrowed down and collated densely at high accuracy. This eliminates as much unnecessary local feature generation and collation as possible, while enabling accurate recognition of the target objects in real time.
 << Functional configuration of the video processing apparatus >>
 FIG. 29 is a block diagram showing the functional configuration of the video processing apparatus 2800 according to the present embodiment. It differs from the second embodiment in a modified local feature DB and a replacement of the accuracy adjustment unit of FIG. 2; the other functional components are the same as in FIG. 2, carry the same reference numbers, and are not described again.
 The accuracy adjustment unit 2950 has a change detection/recognition target selection unit 2961 that detects changes from the collation result of the whole video and selects the recognition targets, and a feature point count/dimension count adjustment unit 2962 that adjusts the number of feature points and the number of dimensions in order to adjust the accuracy.
 For the initial local feature generation by the local feature generation unit 220, the feature point count/dimension count adjustment unit 2962 limits the number of feature points and the number of dimensions to an accuracy just sufficient to detect changes from the whole of the video captured by the imaging unit 210. From the result of the collation unit 240, the change detection/recognition target selection unit 2961 detects changes in the whole video and selects the local features of the recognition targets corresponding to that determination from the local feature DB 2930. It also notifies the feature point count/dimension count adjustment unit 2962 of the numbers of feature points and dimensions needed to recognize the objects associated with that determination.
 The feature point count/dimension count adjustment unit 2962 then increases the number of feature points and/or the number of dimensions, and detailed collation is performed not on the whole video but on the targets narrowed down from the result detected by the change detection/recognition target selection unit 2961.
 Corresponding to the above processing, the local feature DB 2930 stores, in a distinguishable manner, the local features for detecting changes from the whole video and the local features for recognizing the individual objects within the recognition targets.
 Although FIG. 29 shows feature point count and dimension count adjustment as the accuracy adjustment performed by the accuracy adjustment unit 2950, other accuracy adjustments may be used instead.
 (Local feature DB)
 The configuration of the local feature DB 2930 of this embodiment is obtained from the local feature DB 2430 of FIG. 26 by replacing the local feature DB 2610 for recognizing shelves with local features of the store video, and the local feature DB 2620 for recognizing articles (books/videos) with local features of the recognition targets (persons and the like); a duplicated description is therefore avoided here.
 (Change detection parameters)
 FIG. 30A is a diagram showing the configuration of the change detection parameters 3010 according to the present embodiment.
 Based on the local features generated from the video, the change detection parameters 3010 detect that the video has changed when the difference 3011 in recognized position between successive frames is E1 or more (movement up, down, left or right), when the difference 3012 in recognized size is E2 or more (movement forward or backward), or when the difference 3013 in recognized direction is E3 or more (for example, rotation in place).
 (Change detection data)
 FIG. 30B is a diagram showing the configuration of the change detection data 3020 according to the present embodiment.
 The change detection data 3020 stores, in association with a recognition target ID 3021, the previous recognized position (centroid position) 3022, the previous recognized size 3023, the previous recognized direction (angle) 3024, the current recognized position (centroid position) 3025, the current recognized size 3026, and the current recognized direction (angle) 3027.
 Each pair of values is compared, and when the conditions shown in FIG. 30A are satisfied, it is detected that the video has changed and a flag is stored in the change presence/absence field 3028.
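 A minimal sketch of this change test is given below. The threshold values for E1, E2, E3 and the units (pixels, a size ratio, degrees) are illustrative assumptions; only the comparison structure follows FIGS. 30A and 30B.

```python
import math

# Sketch of the change test of FIGS. 30A/30B: an object is flagged as
# "changed" when its recognized position, size, or direction differs from
# the previous frame by at least E1, E2, or E3 respectively.
E1, E2, E3 = 10.0, 0.2, 15.0   # position (px), size (ratio), direction (deg)

def detect_change(prev, curr):
    dx = curr["position"][0] - prev["position"][0]
    dy = curr["position"][1] - prev["position"][1]
    moved = math.hypot(dx, dy) >= E1                          # up/down/left/right
    resized = abs(curr["size"] - prev["size"]) >= E2          # forward/backward
    rotated = abs(curr["direction"] - prev["direction"]) >= E3  # rotation in place
    return moved or resized or rotated

prev = {"position": (120, 80), "size": 1.00, "direction": 0.0}
curr = {"position": (150, 80), "size": 1.05, "direction": 2.0}
print(detect_change(prev, curr))  # True: the centroid moved 30 px, which is >= E1
```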
 << Processing procedure of the video processing apparatus >>
 FIG. 31 is a flowchart showing the processing procedure of the video processing apparatus 2800 according to the present embodiment. The flowchart is executed by the CPU 1210 of FIG. 12 using the RAM 1240, and implements the functional components of FIG. 29. In the procedure of FIG. 31, steps identical to those of FIG. 13 of the second embodiment carry the same step numbers and are not described again. The transmission and reception processing of FIG. 13 is also omitted from FIG. 31.
 For a video input, the process advances from step S1311 to S3113 and sets the feature point count/dimension count that serve as the change detection conditions. After the local feature generation processing (S1315) and the collation processing (S1317), it is determined in step S3119 whether the current pass is change detection or identification of a changed object. In the case of change detection, the process advances to step S3121 and determines whether there was a change. If there was no change, the process returns to step S1315 and repeats change detection. If there was a change, the process advances to step S3123 and sets a larger feature point count/dimension count as the conditions for identifying the changed object. In step S3125, the changed region is selected. If the changed object can be recognized, the local features containing that object are selected from the local feature DB 2930 in step S3127. The process then returns to step S1315 and repeats.
 [Seventh Embodiment]
 Next, a video processing system according to the seventh embodiment of the present invention will be described. Compared with the second to sixth embodiments, the video processing system according to this embodiment differs in that the portable terminal generates and transmits the local features, while the collation server performs the collation and issues the accuracy adjustment instructions. The other configurations and operations are the same as in the second to sixth embodiments, so their description is omitted.
 According to this embodiment, even in a video processing system that transmits local features over a network, the recognition target in the query image of the video can be recognized in real time while the accuracy of the local features is adjusted dynamically.
 << Video processing according to this embodiment >>
 FIG. 32 is a diagram for explaining video processing by the video processing system 3200 according to the present embodiment.
 The video processing system 3200 has a video processing apparatus 3210, which is a portable terminal, and a video processing apparatus 3220, which is a collation server connected to the video processing apparatus 3210 via a network 3230. The video processing apparatus 3220 has a local feature DB 3221 that stores local features generated in advance from the recognition targets in association with those targets, and an accuracy adjustment parameter DB 3222 for setting accuracy adjustment parameters based on the collation results.
 The video processing apparatus 3210 generates local features from the captured video and transmits them to the video processing apparatus 3220. If the video processing apparatus 3220 judges from the collation result that accuracy adjustment is necessary, it transmits accuracy adjustment parameters to the video processing apparatus 3210.
 The display screen 3211 in the right-hand view of the video processing apparatus 3210, a portable terminal, shows the state in which coarse, low-accuracy local features have been generated from the captured video and transmitted to the video processing apparatus 3220. The display screen 3212 in the left-hand view shows the state in which dense, high-accuracy local features have been generated from the captured video according to an instruction from the video processing apparatus 3220 and transmitted to it. The black circles 3211a and 3212a indicate feature points and local regions, and may or may not be displayed.
 Although accuracy is expressed above by the number of feature points, accuracy adjustment by the number of local regions, the number of sub-region divisions, or the number of dimensions of the feature vectors is also included.
 << Processing procedure of the video processing system >>
 FIG. 33 is a sequence diagram showing the processing procedure of the video processing system 3200 according to the present embodiment. In step S3300, the application of this embodiment is downloaded from the collation server to the portable terminal if necessary.
 First, in step S3301, the application is started and initialized on the portable terminal and the collation server. In step S3303, the portable terminal captures video with the imaging unit 210. Next, in step S3305, the portable terminal generates local features. In step S3307, the portable terminal encodes the generated local features and the position coordinates of the feature points, and in step S3309 transmits them to the collation server via the network.
 In step S3311, the collation server recognizes the objects in the video by collating the received local features against the local features of the recognition targets in the local feature DB 3221. In step S3313, the collation server returns the collation result to the portable terminal.
 In step S3315, the portable terminal reports the received recognition targets on the input video.
 In step S3317, the collation server determines, based on the collation result, whether accuracy adjustment is necessary. If accuracy adjustment is necessary, the process advances to step S3319, where accuracy adjustment parameters are acquired from the accuracy adjustment parameter DB 3222 and transmitted to the portable terminal. If no accuracy adjustment is necessary, the collation server ends the processing.
 The portable terminal that received the accuracy adjustment parameters returns from step S3321 to step S3305 and repeats the acquisition of accuracy-adjusted local features. If no accuracy adjustment parameters are received, the processing of the portable terminal also ends.
 In this embodiment, this series of processing is realized in real time, and the user can see the recognition targets in the input video.
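 For illustration, the client side of the loop in FIG. 33 could be sketched as follows. All helper functions here are hypothetical stand-ins for the units described above, and the convention that the server returns `None` when no further adjustment is needed is an assumption of this sketch.

```python
# Sketch of the client-side loop of FIG. 33: generate features at the current
# precision, send them to the collation server, show the result, and repeat
# at the server-supplied precision until no new parameters come back.

def client_loop(frame, params, generate_features, encode, query_server, show):
    while True:
        features = generate_features(frame, **params)
        result, new_params = query_server(encode(features))
        show(frame, result)                 # overlay the recognition result
        if new_params is None:              # server is satisfied: stop
            return result
        params = new_params                 # retry at the adjusted precision
```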
 << Functional configuration of the video processing apparatus for the portable terminal >>
 FIG. 34A is a block diagram showing the functional configuration of the video processing apparatus 3210 for the portable terminal according to the present embodiment. The functional configuration of the video processing apparatus 3210 is obtained from the video processing apparatus 200 of the second embodiment by removing the components related to collation processing and adding, instead, components for transmitting the local features and receiving the collation results; components identical to those of FIG. 2 therefore carry the same reference numbers and are not described again.
 The video processing apparatus 3210 has an encoding unit 3430 (see FIG. 34B) that encodes the local features and feature point coordinates generated by the local feature generation unit 220 in order to transmit them via the communication control unit 3470. The accuracy of the local features generated by the local feature generation unit 220 is adjusted by the accuracy adjustment unit 250 based on the accuracy adjustment parameters received by the accuracy adjustment parameter reception unit 3440 via the communication control unit 3470.
 Meanwhile, according to the data received by the collation result reception unit 3460 via the communication control unit 3470, a display screen superimposed on the input video is generated and shown on the display unit 280. If the data received by the collation result reception unit 3460 includes audio data, the audio is output.
 (Encoding unit)
 FIG. 34B is a block diagram showing the configuration of the encoding unit 3430 according to the present embodiment. The encoding unit is not limited to this example; other encoding processes are also applicable.
 The encoding unit 3430 has a coordinate value scanning unit 3432 that receives the coordinates of the feature points from the feature point detection unit 411 of the local feature generation unit 220 and scans the coordinate values. The coordinate value scanning unit 3432 scans the image according to a particular scanning method and converts the two-dimensional coordinate values (X and Y coordinate values) of the feature points into one-dimensional index values. Each index value is the scanning distance from the origin along the scan. There is no restriction on the scanning direction.
 It also has a sorting unit 3433 that sorts the index values of the feature points and outputs the permutation information after sorting. The sorting unit 3433 sorts, for example, in ascending order; it may also sort in descending order.
 It further has a difference calculation unit 3434 that calculates the difference between each pair of adjacent index values in the sorted sequence and outputs the series of difference values.
 It then has a difference encoding unit 3435 that encodes the series of difference values in sequence order. The series of difference values may be encoded, for example, with a fixed bit length. When encoding with a fixed bit length, the bit length may be specified in advance; however, that would require the number of bits needed to express the largest conceivable difference value, so the encoded size would not become small. Therefore, when encoding with a fixed bit length, the difference encoding unit 3435 can determine the bit length based on the input series of difference values. Specifically, for example, the difference encoding unit 3435 can find the maximum difference value in the input series, determine the number of bits needed to express that maximum (the representation bit count), and encode the series of difference values with that representation bit count.
 Meanwhile, it has a local feature encoding unit 3431 that encodes the local features of the corresponding feature points in the same permutation as the sorted index values. Encoding in the same permutation as the sorted index values makes it possible to associate the coordinate values encoded by the difference encoding unit 3435 one-to-one with their corresponding local features. In this embodiment, the local feature encoding unit 3431 can encode the local features dimension-selected from the 150-dimensional local features of each feature point using, for example, one byte per dimension, i.e. as many bytes as there are selected dimensions.
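 The scan, sort, difference, and fixed-bit-length steps above can be sketched as follows. This is a minimal illustration under assumptions the text leaves open (a raster scan order, ascending sort, packing into a single integer, and keeping the first index absolute so positions remain recoverable), not the patented encoder.

```python
# Sketch of the coordinate-encoding pipeline: raster-scan each (x, y) feature
# point into a 1-D index, sort, take differences between neighbors, and pack
# every value with the minimal fixed bit length.

def encode_coordinates(points, width):
    # Scan: convert 2-D coordinates to 1-D indices, then sort ascending.
    indices = sorted(y * width + x for x, y in points)
    # Keep the first index as-is, then store adjacent differences.
    diffs = [indices[0]] + [b - a for a, b in zip(indices, indices[1:])]
    # Representation bit count: just enough bits for the largest value.
    bits = max(d.bit_length() for d in diffs) or 1
    # Pack every value with that fixed bit length into one integer payload.
    payload = 0
    for d in diffs:
        payload = (payload << bits) | d
    return bits, len(diffs), payload

bits, count, payload = encode_coordinates([(5, 0), (2, 3), (9, 1)], width=640)
print(bits, count, hex(payload))
```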
 (Encoding processing procedure)
 FIG. 35A is a flowchart showing the encoding processing procedure according to the present embodiment.
 First, in step S3511, the coordinate values of the feature points are scanned in the desired order. Next, in step S3513, the scanned coordinate values are sorted. In step S3515, the differences between the coordinate values are calculated in sorted order. In step S3517, the difference values are encoded (see FIG. 35B). Then, in step S3519, the local features are encoded in the sort order of the coordinate values. The encoding of the difference values and the encoding of the local features may also be performed in parallel.
 (Encoding of difference values)
 FIG. 35B is a flowchart showing the processing procedure of the difference value encoding S3517 according to the present embodiment.
 First, in step S3521, it is determined whether the difference value lies within the codable range. If it lies within the codable range, the process advances to step S3527 and the difference value is encoded, after which the process moves to step S3529. If it does not lie within the codable range (it is out of range), the process advances to step S3523 and an escape code is encoded. Then, in step S3525, the difference value is encoded with an encoding method different from that of step S3527, after which the process moves to step S3529. In step S3529, it is determined whether the processed difference value is the last element of the series of difference values. If it is the last, the processing ends; if not, the process returns to step S3521 and the next difference value in the series is processed.
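 A small sketch of this escape-code scheme is given below. The one-byte short code, the reserved value 0xFF, and the four-byte fallback are illustrative assumptions; only the in-range/escape branching follows FIG. 35B.

```python
# Sketch of the escape-code scheme of FIG. 35B: differences inside the
# codable range get a short fixed-length code; anything outside is replaced
# by a reserved escape code followed by a wider fallback encoding.

ESCAPE = 0xFF            # reserved short code meaning "wider value follows"

def encode_diffs(diffs):
    out = bytearray()
    for d in diffs:
        if 0 <= d < ESCAPE:                 # codable range: one byte
            out.append(d)
        else:                               # out of range: escape + 4 bytes
            out.append(ESCAPE)
            out += d.to_bytes(4, "big")
    return bytes(out)

print(encode_diffs([5, 644, 12]).hex())  # 05 ff 00000284 0c
```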
 << Functional configuration of the video processing apparatus for the server >>
 FIG. 36 is a block diagram showing the functional configuration of the video processing apparatus 3220 for the server according to the present embodiment.
 The video processing apparatus 3220 for the server has a communication control unit 3610. A decoding unit 3620 decodes the encoded local features and feature point coordinates received from the portable terminal via the communication control unit 3610. A collation unit 3630 then collates them against the local features of the recognition targets in the local feature DB 3221. Based on the collation result, an accuracy adjustment determination unit 3650 determines whether accuracy adjustment is necessary. If it determines that accuracy adjustment is necessary, the accuracy adjustment determination unit 3650 reads accuracy adjustment parameters from the accuracy adjustment parameter DB 3222. The collation result and, if necessary, the accuracy adjustment parameters are returned from a transmission unit 3640 to the portable terminal via the communication control unit 3610.
 [Eighth Embodiment]
 Next, a video processing system according to the eighth embodiment of the present invention will be described. The video processing system according to this embodiment differs from the seventh embodiment in that coarse, low-accuracy collation is performed on the portable terminal and dense, high-accuracy collation is performed on the collation server. The other configurations and operations are the same as in the seventh embodiment, so their description is omitted.
 According to this embodiment, the roles are divided within the video processing system while the accuracy of the local features is adjusted dynamically, and the recognition target in the query image of the video can be recognized in real time.
 << Processing procedure of the video processing system >>
 FIG. 37 is a sequence diagram showing the processing procedure of the video processing system 3700 according to the present embodiment. In step S3700, the application and data of this embodiment are downloaded from the collation server to the portable terminal if necessary.
 First, in step S3701, the application is started and initialized on the portable terminal and the collation server. In step S3703, the portable terminal captures video with the imaging unit 210. Next, in step S3705, the portable terminal generates low-accuracy initial local features. In step S3707, the portable terminal collates the generated initial local features against the local features stored in the portable terminal local feature DB 3710 to perform initial object recognition. In step S3709, it is determined whether the recognition reliability is acceptable. If recognition succeeded and the reliability is also acceptable, the process advances to step S3719, and the recognition target is displayed superimposed on the video.
 If, on the other hand, recognition failed or the reliability is low, the process advances to step S3711, where the accuracy is adjusted and high-accuracy local features are generated. In step S3713, they are transmitted to the collation server.
 On the collation server, in step S3715, the high-accuracy local features received from the portable terminal are collated against the high-accuracy local features stored in the server local feature DB 3720 in association with the recognition targets, and the object is recognized. In step S3717, the recognized target is reported to the portable terminal.
 In step S3719, the portable terminal displays the received recognition target superimposed on the video.
 In this embodiment, this series of processing is realized in real time, and the user can see the recognition targets in the input video.
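 The split between terminal and server could be sketched as below. The 50-dimension and 150-dimension values follow the example DBs of FIGS. 38A and 38B; the confidence threshold and all helper functions are illustrative assumptions of this sketch.

```python
# Sketch of the eighth-embodiment split: a cheap on-device match with
# low-precision features, falling back to the server with high-precision
# features when the local result is missing or its reliability is too low.

CONFIDENCE_THRESHOLD = 0.8   # placeholder reliability cutoff

def recognize(frame, local_db, generate_features, match, send_to_server):
    coarse = generate_features(frame, dimensions=50)
    result, confidence = match(coarse, local_db)
    if result is not None and confidence >= CONFIDENCE_THRESHOLD:
        return result                       # resolved entirely on the device
    fine = generate_features(frame, dimensions=150)
    return send_to_server(fine)             # server collates against its own DB
```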
 (Local feature DB for the portable terminal)
 FIG. 38A is a block diagram showing the configuration of the portable terminal local feature DB 3710 according to the present embodiment.
 The portable terminal local feature DB 3710 stores, in association with a recognition target ID 3811 and a recognition target name 3812, local features 3813 to 3815 which, in this example, are 50-dimensional with the minimum feature point count mmin. The 50 dimensions and the feature point count mmin are only examples and are not limiting.
 (Local feature DB for the server)
 FIG. 38B is a block diagram showing the configuration of the server local feature DB 3720 according to the present embodiment.
 The server local feature DB 3720 stores, in association with a recognition target ID 3821 and a recognition target name 3822, local features 3823 to 3825 which, in this example, are 150-dimensional with the maximum feature point count mmax. The 150 dimensions and the feature point count mmax are only examples and are not limiting.
 [Other Embodiments]
 Although the present invention has been described above with reference to embodiments, the present invention is not limited to those embodiments. Various changes that a person skilled in the art can understand may be made to the configuration and details of the present invention within its scope. Systems or apparatuses combining the separate features included in the respective embodiments in any way are also included within the scope of the present invention.
 The present invention may be applied to a system composed of a plurality of devices, or to a single apparatus. Furthermore, the present invention is also applicable when a control program realizing the functions of the embodiments is supplied to a system or apparatus directly or remotely. Therefore, a control program installed on a computer to realize the functions of the present invention, a medium storing that control program, and a WWW (World Wide Web) server from which that control program is downloaded are also included within the scope of the present invention.
 This application claims priority based on Japanese Patent Application No. 2011-273939, filed on December 15, 2011, the entire disclosure of which is incorporated herein.
 A part or all of the present embodiments can also be described as in the following supplementary notes, but are not limited to the following.
 (Appendix 1)
 A video processing apparatus comprising:
 first local feature quantity storage means for storing, in association with a recognition target object, m first local feature quantities, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions containing each of m feature points of an image of the recognition target object;
 second local feature quantity generation means for extracting n feature points from an image in a video and generating n second local feature quantities, each consisting of a feature vector of 1 to j dimensions, for each of n local regions containing each of the n feature points;
 recognition means for selecting the smaller of the dimension count i of the feature vectors of the first local feature quantities and the dimension count j of the feature vectors of the second local feature quantities, and recognizing that the recognition target object exists in the image in the video when determining that at least a predetermined proportion of the m first local feature quantities, consisting of feature vectors up to the selected dimension count, corresponds to the n second local feature quantities, consisting of feature vectors up to the selected dimension count; and
 accuracy adjustment means for performing control so as to adjust the accuracy of the n second local feature quantities generated by the second local feature quantity generation means.
 (Appendix 2)
 The video processing apparatus according to appendix 1, wherein the accuracy adjustment means performs control so as to adjust the accuracy of the n second local feature quantities based on the reliability of the recognition result of the recognition means.
 (Appendix 3)
 The video processing apparatus according to appendix 1, wherein the accuracy adjustment means performs control so as to adjust the accuracy of the n second local feature quantities based on the data amount of the second local feature quantities generated by the second local feature quantity generation means.
 (Appendix 4)
 The video processing apparatus according to any one of appendices 1 to 3, further comprising control means for performing control so that, after the accuracy adjustment means has adjusted the accuracy of the n second local feature quantities to a first accuracy and the recognition means has recognized a recognition target object existing in the image in the video, the accuracy adjustment means adjusts the accuracy of the n second local feature quantities to a second accuracy higher than the first accuracy and the recognition means recognizes the recognition target object in more detail.
 (Appendix 5)
 The video processing apparatus according to any one of appendices 1 to 3, further comprising control means for performing control so that, after the accuracy adjustment means has adjusted the accuracy of the n second local feature quantities to a first accuracy and the recognition means has recognized a recognition target object existing in the image in the video, the accuracy adjustment means adjusts the accuracy of the n second local feature quantities to a second accuracy higher than the first accuracy and the recognition means recognizes a plurality of recognition target objects constituting the recognition target object.
 (Appendix 6)
 The video processing apparatus according to any one of appendices 1 to 3, further comprising control means for performing control so that, after the accuracy adjustment means has adjusted the accuracy of the n second local feature quantities to a first accuracy and the recognition means has detected a change in a recognition target object existing in the image in the video, the accuracy adjustment means adjusts the accuracy of the n second local feature quantities to a second accuracy higher than the first accuracy and the recognition means recognizes in more detail the recognition target object whose change was detected.
 (Appendix 7)
 The video processing apparatus according to any one of appendices 1 to 6, wherein the first local feature quantities and the second local feature quantities are generated by dividing a local region containing a feature point extracted from an image into a plurality of sub-regions and generating a feature vector of a plurality of dimensions consisting of histograms of gradient directions within the plurality of sub-regions.
 (Appendix 8)
 The video processing apparatus according to appendix 7, wherein the first local feature quantities and the second local feature quantities are generated by removing, from the generated feature vector of a plurality of dimensions, dimensions having larger correlation between adjacent sub-regions.
 (Appendix 9)
 The video processing apparatus according to appendix 7 or 8, wherein the first local feature quantities and the second local feature quantities are generated by removing, from the plurality of feature points extracted from an image, feature points judged to be of lower importance.
 (Appendix 10)
 The video processing apparatus according to any one of appendices 7 to 9, wherein the plurality of dimensions of the feature vector are selected so as to go around the local region once for every predetermined number of dimensions, so that they can be selected in order from the dimensions contributing to the features of the feature point, and in order from the first dimension according to the improvement in accuracy required of the local feature quantities.
 (Appendix 11)
 The video processing apparatus according to any one of appendices 1 to 10, wherein the accuracy adjustment means adjusts at least one of the dimension count of the feature vectors in the n second local feature quantities generated by the second local feature quantity generation means and the number n of feature points extracted by the second local feature quantity generation means.
 (Appendix 12)
 The video processing apparatus according to appendix 11, wherein the accuracy adjustment means adjusts at least one of the size of the local regions for the n feature points generated by the second local feature quantity generation means, the shape of the local regions, the number of divisions by which the local regions are divided into sub-regions, and the number of directions of the feature vectors.
 (Appendix 13)
 The video processing apparatus according to any one of appendices 1 to 12, wherein the first local feature quantity storage means stores sets of the m first local feature quantities and the position coordinates of the m feature points in the image of the recognition target object, the second local feature quantity generation means holds sets of the n second local feature quantities and the position coordinates of the n feature points in the image in the video, and the recognition means recognizes that the recognition target object exists in the image in the video when determining that the set of pairs of the n second local feature quantities and their position coordinates and a set of at least a predetermined proportion of the pairs of the m first local feature quantities and their position coordinates are in a relationship of linear transformation.
 (Appendix 14)
 The video processing apparatus according to any one of appendices 1 to 13, further comprising display means for displaying information indicating the recognition target object recognized by the recognition means superimposed on the image in the video in which the recognition target object exists.
 (Appendix 15)
 A control method for a video processing apparatus provided with first local feature quantity storage means for storing, in association with a recognition target object, m first local feature quantities, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions containing each of m feature points of an image of the recognition target object, the method comprising:
 a second local feature quantity generation step of extracting n feature points from an image in a video and generating n second local feature quantities, each consisting of a feature vector of 1 to j dimensions, for each of n local regions containing each of the n feature points;
 a recognition step of selecting the smaller of the dimension count i of the feature vectors of the first local feature quantities and the dimension count j of the feature vectors of the second local feature quantities, and recognizing that the recognition target object exists in the image in the video when determining that at least a predetermined proportion of the m first local feature quantities, consisting of feature vectors up to the selected dimension count, corresponds to the n second local feature quantities, consisting of feature vectors up to the selected dimension count; and
 an accuracy adjustment step of performing control so as to adjust the accuracy of the n second local feature quantities generated in the second local feature quantity generation step.
 (Appendix 16)
 A control program for a video processing apparatus provided with first local feature quantity storage means for storing, in association with a recognition target object, m first local feature quantities, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions containing each of m feature points of an image of the recognition target object, the program causing a computer to execute:
 a second local feature quantity generation step of extracting n feature points from an image in a video and generating n second local feature quantities, each consisting of a feature vector of 1 to j dimensions, for each of n local regions containing each of the n feature points;
 a recognition step of selecting the smaller of the dimension count i of the feature vectors of the first local feature quantities and the dimension count j of the feature vectors of the second local feature quantities, and recognizing that the recognition target object exists in the image in the video when determining that at least a predetermined proportion of the m first local feature quantities, consisting of feature vectors up to the selected dimension count, corresponds to the n second local feature quantities, consisting of feature vectors up to the selected dimension count; and
 an accuracy adjustment step of performing control so as to adjust the accuracy of the n second local feature quantities generated in the second local feature quantity generation step.
 (Appendix 17)
 A video processing system having a video processing apparatus for a portable terminal and a video processing apparatus for a server connected via a network, the system comprising:
 first local feature quantity storage means for storing, in association with a recognition target object, m first local feature quantities, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions containing each of m feature points of an image of the recognition target object;
 second local feature quantity generation means for extracting n feature points from an image in a video and generating n second local feature quantities, each consisting of a feature vector of 1 to j dimensions, for each of n local regions containing each of the n feature points;
 recognition means for selecting the smaller of the dimension count i of the feature vectors of the first local feature quantities and the dimension count j of the feature vectors of the second local feature quantities, and recognizing that the recognition target object exists in the image in the video when determining that at least a predetermined proportion of the m first local feature quantities, consisting of feature vectors up to the selected dimension count, corresponds to the n second local feature quantities, consisting of feature vectors up to the selected dimension count; and
 accuracy adjustment means for performing control so as to adjust the accuracy of the n second local feature quantities generated by the second local feature quantity generation means.
 (Appendix 18)
 The video processing system according to appendix 17, wherein
 the video processing apparatus for the portable terminal comprises:
 the second local feature quantity generation means;
 first transmission means for encoding the n second local feature quantities and transmitting them via the network to the video processing apparatus for the server;
 the accuracy adjustment means;
 first reception means for receiving an instruction for accuracy adjustment by the accuracy adjustment means; and
 second reception means for receiving, from the video processing apparatus for the server, information indicating the recognition target object recognized by the video processing apparatus for the server,
 and the video processing apparatus for the server comprises:
 the first local feature quantity storage means;
 third reception means for receiving the encoded n second local feature quantities from the video processing apparatus for the portable terminal and decoding them;
 the recognition means;
 second transmission means for transmitting an instruction for accuracy adjustment by the accuracy adjustment means based on the recognition result of the recognition means; and
 third transmission means for transmitting information indicating the recognition target object recognized by the recognition means via the network to the video processing apparatus for the portable terminal.
 (Appendix 19)
 A video processing apparatus for a portable terminal in the video processing system according to appendix 17 or 18, comprising:
 second local feature quantity generation means for extracting n feature points from an image in a video and generating n second local feature quantities, each consisting of a feature vector of 1 to j dimensions, for each of n local regions containing each of the n feature points;
 first transmission means for encoding the n second local feature quantities and transmitting them via a network to a video processing apparatus for a server;
 accuracy adjustment means for performing control so as to adjust the accuracy of the n second local feature quantities generated by the second local feature quantity generation means;
 first reception means for receiving an instruction for accuracy adjustment by the accuracy adjustment means; and
 second reception means for receiving, from the video processing apparatus for the server, information indicating the recognition target object recognized by the video processing apparatus for the server.
 (Appendix 20)
 A control method for a video processing apparatus for a portable terminal in the video processing system according to appendix 17 or 18, comprising:
 a second local feature quantity generation step of extracting n feature points from an image in a video and generating n second local feature quantities, each consisting of a feature vector of 1 to j dimensions, for each of n local regions containing each of the n feature points;
 a first transmission step of encoding the n second local feature quantities and transmitting them via a network to a video processing apparatus for a server;
 an accuracy adjustment step of performing control so as to adjust the accuracy of the n second local feature quantities generated in the second local feature quantity generation step;
 a first reception step of receiving an instruction for accuracy adjustment in the accuracy adjustment step; and
 a second reception step of receiving, from the video processing apparatus for the server, information indicating the recognition target object recognized by the video processing apparatus for the server.
 (Appendix 21)
 A control program for a video processing apparatus for a portable terminal in the video processing system according to appendix 17 or 18, the program causing a computer to execute:
 a second local feature quantity generation step of extracting n feature points from an image in a video and generating n second local feature quantities, each consisting of a feature vector of 1 to j dimensions, for each of n local regions containing each of the n feature points;
 a first transmission step of encoding the n second local feature quantities and transmitting them via a network to a video processing apparatus for a server;
 an accuracy adjustment step of performing control so as to adjust the accuracy of the n second local feature quantities generated in the second local feature quantity generation step;
 a first reception step of receiving an instruction for accuracy adjustment in the accuracy adjustment step; and
 a second reception step of receiving, from the video processing apparatus for the server, information indicating the recognition target object recognized by the video processing apparatus for the server.
 (Appendix 22)
 A video processing apparatus for a server in the video processing system according to appendix 17 or 18, comprising:
 first local feature quantity storage means for storing, in association with a recognition target object, m first local feature quantities, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions containing each of m feature points of an image of the recognition target object;
 third reception means for receiving the encoded n second local feature quantities from a video processing apparatus for a portable terminal and decoding them;
 recognition means for selecting the smaller of the dimension count i of the feature vectors of the first local feature quantities and the dimension count j of the feature vectors of the second local feature quantities, and recognizing that the recognition target object exists in the image in the video when determining that at least a predetermined proportion of the m first local feature quantities, consisting of feature vectors up to the selected dimension count, corresponds to the n second local feature quantities, consisting of feature vectors up to the selected dimension count; and
 third transmission means for transmitting information indicating the recognition target object recognized by the recognition means via a network to the video processing apparatus for the portable terminal.
 (Appendix 23)
 A control method for a video processing apparatus for a server in the video processing system according to appendix 17 or 18, the apparatus being provided with first local feature quantity storage means for storing, in association with a recognition target object, m first local feature quantities, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions containing each of m feature points of an image of the recognition target object, the method comprising:
 a third reception step of receiving the encoded n second local feature quantities from a video processing apparatus for a portable terminal and decoding them;
 a recognition step of selecting the smaller of the dimension count i of the feature vectors of the first local feature quantities and the dimension count j of the feature vectors of the second local feature quantities, and recognizing that the recognition target object exists in the image in the video when determining that at least a predetermined proportion of the m first local feature quantities, consisting of feature vectors up to the selected dimension count, corresponds to the n second local feature quantities, consisting of feature vectors up to the selected dimension count; and
 a third transmission step of transmitting information indicating the recognition target object recognized in the recognition step via a network to the video processing apparatus for the portable terminal.
 (Appendix 24)
 A control program for a video processing apparatus for a server in the video processing system according to appendix 17 or 18, the apparatus being provided with first local feature quantity storage means for storing, in association with a recognition target object, m first local feature quantities, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions containing each of m feature points of an image of the recognition target object, the program causing a computer to execute:
 a third reception step of receiving the encoded n second local feature quantities from a video processing apparatus for a portable terminal and decoding them;
 a recognition step of selecting the smaller of the dimension count i of the feature vectors of the first local feature quantities and the dimension count j of the feature vectors of the second local feature quantities, and recognizing that the recognition target object exists in the image in the video when determining that at least a predetermined proportion of the m first local feature quantities, consisting of feature vectors up to the selected dimension count, corresponds to the n second local feature quantities, consisting of feature vectors up to the selected dimension count; and
 a third transmission step of transmitting information indicating the recognition target object recognized in the recognition step via a network to the video processing apparatus for the portable terminal.
 (Appendix 25)
 A video processing method in a video processing system having a video processing apparatus for a portable terminal and a video processing apparatus for a server connected via a network and provided with first local feature quantity storage means for storing, in association with a recognition target object, m first local feature quantities, each consisting of a feature vector of 1 to i dimensions, generated for each of m local regions containing each of m feature points of an image of the recognition target object, the method comprising:
 a second local feature quantity generation step of extracting n feature points from an image in a video and generating n second local feature quantities, each consisting of a feature vector of 1 to j dimensions, for each of n local regions containing each of the n feature points;
 a recognition step of selecting the smaller of the dimension count i of the feature vectors of the first local feature quantities and the dimension count j of the feature vectors of the second local feature quantities, and recognizing that the recognition target object exists in the image in the video when determining that at least a predetermined proportion of the m first local feature quantities, consisting of feature vectors up to the selected dimension count, corresponds to the n second local feature quantities, consisting of feature vectors up to the selected dimension count; and
 an accuracy adjustment step of performing control so as to adjust the accuracy of the n second local feature quantities generated in the second local feature quantity generation step.
A part or all of the present embodiment can be described as in the following supplementary notes, but is not limited thereto.
(Appendix 1)
Each of m first vectors each consisting of a 1-dimensional to i-dimensional feature vector generated for each of a recognition object and m local regions including each of m feature points of the image of the recognition object. First local feature quantity storage means for storing the local feature quantity in association with each other;
N feature points are extracted from the image in the video, and n second local features each consisting of a feature vector from the first dimension to the jth dimension for each of the n local regions including each of the n feature points. Second local feature generating means for generating a quantity;
Of the dimension number i of the feature vector of the first local feature quantity and the dimension number j of the feature vector of the second local feature quantity, a smaller dimension number is selected, and the feature vectors up to the selected dimension number are selected. When it is determined that the n second local feature quantities correspond to a predetermined ratio or more of the m first local feature quantities composed of feature vectors up to the selected number of dimensions, the image in the video corresponds to the image. A recognition means for recognizing that a recognition object exists;
Precision adjusting means for controlling to adjust the precision of the n second local feature values generated by the second local feature value generating means;
A video processing apparatus comprising:
(Appendix 2)
The video processing apparatus according to appendix 1, wherein the accuracy adjustment unit controls the accuracy of the n second local feature values based on the reliability of the recognition result of the recognition unit. .
(Appendix 3)
The accuracy adjusting means controls to adjust the precision of the n second local feature values based on a data amount of the second local feature value generated by the second local feature value generating unit. The video processing apparatus according to Supplementary Note 1.
(Appendix 4)
After the accuracy adjusting means adjusts the precision of the n second local feature amounts to the first precision, and the recognition means recognizes a recognition target existing in the image in the video,
The accuracy adjusting unit adjusts the accuracy of the n second local feature amounts to a second accuracy higher than the first accuracy, and controls the recognition unit to recognize the recognition target object in more detail. The video processing apparatus according to any one of supplementary notes 1 to 3, further comprising a control unit.
(Appendix 5)
After the accuracy adjusting means adjusts the precision of the n second local feature amounts to the first precision, and the recognition means recognizes a recognition target existing in the image in the video,
The accuracy adjusting unit adjusts the accuracy of the n second local feature amounts to a second accuracy higher than the first accuracy, and recognizes a plurality of recognition objects constituting the recognition object by the recognition unit. The video processing apparatus according to any one of appendices 1 to 3, further comprising control means for performing control.
(Appendix 6)
After the accuracy adjusting means adjusts the accuracy of the n second local feature amounts to the first accuracy, and the recognition means detects a change in the recognition target existing in the image in the video,
The accuracy adjusting unit adjusts the accuracy of the n second local feature quantities to a second accuracy higher than the first accuracy, and the recognition unit recognizes the recognition object whose change has been detected in more detail. The video processing apparatus according to any one of appendices 1 to 3, further comprising control means for performing control.
(Appendix 7)
The first local feature value and the second local feature value are obtained by dividing a local region including a feature point extracted from an image into a plurality of sub-regions, and having a plurality of dimensions including histograms of gradient directions in the plurality of sub-regions. The video processing device according to any one of appendices 1 to 6, wherein the video processing device is generated by generating a feature vector.
(Appendix 8)
The first local feature and the second local feature are generated by deleting a dimension having a larger correlation between adjacent sub-regions from the generated multi-dimensional feature vector. 8. The video processing device according to 7.
(Appendix 9)
The first local feature value and the second local feature value are generated by deleting feature points determined to be less important from the plurality of feature points extracted from an image. The video processing apparatus according to appendix 7 or 8.
(Appendix 10)
The video processing apparatus according to any one of appendices 7 to 9, wherein the dimensions of the feature vector are ordered so that they can be selected starting from the dimensions contributing most to the features of the feature point, and from the first dimension onward as the accuracy required of the local feature increases, the selection cycling once around the local region for every predetermined number of dimensions.
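This ordering can be pictured as a permutation that takes one dimension from every sub-region per round, so that any prefix of the reordered vector already covers the whole local region. The helper dimension_order below is hypothetical, and the round size of one bin per sub-region is an assumption.

```python
import numpy as np

def dimension_order(grid=4, bins=8):
    # round k takes bin k from every sub-region before moving to bin k+1
    order = [cell * bins + k
             for k in range(bins)
             for cell in range(grid * grid)]
    return np.asarray(order)

# usage: reordered = vec[dimension_order()]
# reordered[:32] is already a whole-region, low-accuracy feature;
# extending to reordered[:64] or the full vector raises the accuracy
```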
(Appendix 11)
The video processing apparatus according to any one of appendices 1 to 10, wherein the accuracy adjusting means adjusts at least one of the dimension count of the feature vectors in the n second local features generated by the second local feature generating means and the number n of feature points extracted by the second local feature generating means.
(Appendix 12)
The video processing apparatus according to appendix 11, wherein the accuracy adjusting means adjusts at least one of the size of the local regions at the n feature points generated by the second local feature generating means, the shape of the local regions, the number of divisions into which each local region is divided into sub-regions, and the number of directions of the feature vector.
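The adjustable quantities named in appendices 11 and 12 can be collected into a single parameter set, as in the sketch below; every field name and default value is illustrative, not taken from the specification.

```python
from dataclasses import dataclass

@dataclass
class AccuracyParams:
    num_dimensions: int = 64       # dimensions kept per feature vector
    num_feature_points: int = 200  # n, feature points extracted per frame
    region_size: int = 16          # local-region size in pixels
    region_shape: str = "square"   # shape of the local region
    divisions: int = 4             # sub-region divisions per axis
    directions: int = 8            # gradient directions per histogram

coarse = AccuracyParams(num_dimensions=32, num_feature_points=100)   # first accuracy
fine   = AccuracyParams(num_dimensions=128, num_feature_points=400)  # second accuracy
```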
(Appendix 13)
The video processing apparatus according to any one of appendices 1 to 12, wherein the first local feature storage means stores sets of the m first local features and the position coordinates of the m feature points in the image of the recognition object,
the second local feature generating means holds sets of the n second local features and the position coordinates of the n feature points in the image in the video, and
the recognition means recognizes that the recognition object exists in the image in the video when it determines that the set of pairs of the n second local features and their position coordinates and at least a predetermined ratio of the set of pairs of the m first local features and their position coordinates are related by a linear transformation.
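The linear-transformation test might be realized, for example, by fitting one affine map to the matched coordinate pairs and counting inliers; the least-squares fit and the thresholds tol and min_ratio in this sketch are assumptions.

```python
import numpy as np

def geometric_check(query_xy, model_xy, min_ratio=0.5, tol=3.0):
    # query_xy, model_xy: (k, 2) arrays of descriptor-matched coordinates
    k = len(query_xy)
    if k < 3:
        return False
    A = np.hstack([query_xy, np.ones((k, 1))])       # homogeneous [x y 1]
    # least-squares affine (linear) map from query to model coordinates
    M, *_ = np.linalg.lstsq(A, model_xy, rcond=None)
    residual = np.linalg.norm(A @ M - model_xy, axis=1)
    # accept when a predetermined ratio of pairs fits one transformation
    return (residual < tol).mean() >= min_ratio
```

A robust estimator such as RANSAC could replace the single least-squares fit when outlier matches are common.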
(Appendix 14)
The video processing apparatus according to any one of appendices 1 to 13, further comprising display means for displaying information indicating the recognition object recognized by the recognition means, superimposed on the image in the video in which the recognition object exists.
(Appendix 15)
A method of controlling a video processing apparatus comprising first local feature storage means for storing, in association with a recognition object, m first local features generated for m local regions each containing one of m feature points in an image of the recognition object, each first local feature consisting of a feature vector of dimensions 1 through i, the method comprising:
a second local feature generating step of extracting n feature points from an image in a video and generating n second local features for n local regions each containing one of the n feature points, each second local feature consisting of a feature vector of dimensions 1 through j;
a recognition step of selecting the smaller of the dimension count i of the feature vectors of the first local features and the dimension count j of the feature vectors of the second local features, and recognizing that the recognition object exists in the image in the video upon determining that at least a predetermined ratio of the m first local features, truncated to feature vectors of the selected dimension count, correspond to the n second local features truncated to feature vectors of the selected dimension count; and
an accuracy adjusting step of controlling adjustment of the accuracy of the n second local features generated in the second local feature generating step.
(Appendix 16)
A control program for a video processing apparatus comprising first local feature storage means for storing, in association with a recognition object, m first local features generated for m local regions each containing one of m feature points in an image of the recognition object, each first local feature consisting of a feature vector of dimensions 1 through i, the program causing a computer to execute:
a second local feature generating step of extracting n feature points from an image in a video and generating n second local features for n local regions each containing one of the n feature points, each second local feature consisting of a feature vector of dimensions 1 through j;
a recognition step of selecting the smaller of the dimension count i of the feature vectors of the first local features and the dimension count j of the feature vectors of the second local features, and recognizing that the recognition object exists in the image in the video upon determining that at least a predetermined ratio of the m first local features, truncated to feature vectors of the selected dimension count, correspond to the n second local features truncated to feature vectors of the selected dimension count; and
an accuracy adjusting step of controlling adjustment of the accuracy of the n second local features generated in the second local feature generating step.
(Appendix 17)
A video processing system having a video processing apparatus for a mobile terminal and a video processing apparatus for a server connected via a network, the system comprising:
first local feature storage means for storing, in association with a recognition object, m first local features generated for m local regions each containing one of m feature points in an image of the recognition object, each first local feature consisting of a feature vector of dimensions 1 through i;
second local feature generating means for extracting n feature points from an image in a video and generating n second local features for n local regions each containing one of the n feature points, each second local feature consisting of a feature vector of dimensions 1 through j;
recognition means for selecting the smaller of the dimension count i of the feature vectors of the first local features and the dimension count j of the feature vectors of the second local features, and recognizing that the recognition object exists in the image in the video upon determining that at least a predetermined ratio of the m first local features, truncated to feature vectors of the selected dimension count, correspond to the n second local features truncated to feature vectors of the selected dimension count; and
accuracy adjusting means for controlling adjustment of the accuracy of the n second local features generated by the second local feature generating means.
(Appendix 18)
The video processing system according to appendix 17, wherein the video processing apparatus for the mobile terminal comprises:
the second local feature generating means;
first transmitting means for encoding the n second local features and transmitting them to the video processing apparatus for the server via the network;
the accuracy adjusting means;
first receiving means for receiving an accuracy adjustment instruction for the accuracy adjusting means; and
second receiving means for receiving, from the video processing apparatus for the server, information indicating a recognition object recognized by the video processing apparatus for the server;
and wherein the video processing apparatus for the server comprises:
the first local feature storage means;
third receiving means for receiving the encoded n second local features from the video processing apparatus for the mobile terminal and decoding them;
the recognition means;
second transmitting means for transmitting the accuracy adjustment instruction for the accuracy adjusting means based on the recognition result of the recognition means; and
third transmitting means for transmitting information indicating the recognition object recognized by the recognition means to the video processing apparatus for the mobile terminal via the network.
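The encode/transmit/decode path between the first transmitting means and the third receiving means could look like the following sketch; the wire format (a 4-byte header, uint16 coordinates, 8-bit quantized dimensions) is purely illustrative and is not taken from the specification.

```python
import numpy as np

def encode_features(feats, coords):
    # feats: (n, d) float array scaled to [0, 1]; coords: (n, 2) pixel positions
    n, d = feats.shape
    header = np.asarray([n, d], dtype=np.uint16).tobytes()
    body = coords.astype(np.uint16).tobytes()
    body += np.clip(feats * 255.0, 0, 255).astype(np.uint8).tobytes()
    return header + body

def decode_features(blob):
    # mirror of encode_features, run on the server side
    n, d = np.frombuffer(blob[:4], dtype=np.uint16)
    split = 4 + int(n) * 4                       # 2 uint16 coordinates per point
    coords = np.frombuffer(blob[4:split], dtype=np.uint16).reshape(int(n), 2)
    feats = np.frombuffer(blob[split:], dtype=np.uint8).reshape(int(n), int(d)) / 255.0
    return feats, coords
```

In a deployment of this kind, the accuracy adjustment instruction returned by the second transmitting means would determine how many dimensions and feature points the terminal packs into this payload.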
(Appendix 19)
A video processing apparatus for a mobile terminal in the video processing system according to appendix 17 or 18, comprising:
second local feature generating means for extracting n feature points from an image in a video and generating n second local features for n local regions each containing one of the n feature points, each second local feature consisting of a feature vector of dimensions 1 through j;
first transmitting means for encoding the n second local features and transmitting them to a video processing apparatus for a server via a network;
accuracy adjusting means for controlling adjustment of the accuracy of the n second local features generated by the second local feature generating means;
first receiving means for receiving an accuracy adjustment instruction for the accuracy adjusting means; and
second receiving means for receiving, from the video processing apparatus for the server, information indicating a recognition object recognized by the video processing apparatus for the server.
(Appendix 20)
A method of controlling a video processing apparatus for a mobile terminal in the video processing system according to appendix 17 or 18, the method comprising:
a second local feature generating step of extracting n feature points from an image in a video and generating n second local features for n local regions each containing one of the n feature points, each second local feature consisting of a feature vector of dimensions 1 through j;
a first transmitting step of encoding the n second local features and transmitting them to a video processing apparatus for a server via a network;
an accuracy adjusting step of controlling adjustment of the accuracy of the n second local features generated in the second local feature generating step;
a first receiving step of receiving an accuracy adjustment instruction for the accuracy adjusting step; and
a second receiving step of receiving, from the video processing apparatus for the server, information indicating a recognition object recognized by the video processing apparatus for the server.
(Appendix 21)
A control program for a video processing apparatus for a mobile terminal in the video processing system according to appendix 17 or 18, the program causing a computer to execute:
a second local feature generating step of extracting n feature points from an image in a video and generating n second local features for n local regions each containing one of the n feature points, each second local feature consisting of a feature vector of dimensions 1 through j;
a first transmitting step of encoding the n second local features and transmitting them to a video processing apparatus for a server via a network;
an accuracy adjusting step of controlling adjustment of the accuracy of the n second local features generated in the second local feature generating step;
a first receiving step of receiving an accuracy adjustment instruction for the accuracy adjusting step; and
a second receiving step of receiving, from the video processing apparatus for the server, information indicating a recognition object recognized by the video processing apparatus for the server.
(Appendix 22)
A video processing apparatus for a server in the video processing system according to appendix 17 or 18, comprising:
first local feature storage means for storing, in association with a recognition object, m first local features generated for m local regions each containing one of m feature points in an image of the recognition object, each first local feature consisting of a feature vector of dimensions 1 through i;
third receiving means for receiving the encoded n second local features from a video processing apparatus for a mobile terminal and decoding them;
recognition means for selecting the smaller of the dimension count i of the feature vectors of the first local features and the dimension count j of the feature vectors of the second local features, and recognizing that the recognition object exists in the image in the video upon determining that at least a predetermined ratio of the m first local features, truncated to feature vectors of the selected dimension count, correspond to the n second local features truncated to feature vectors of the selected dimension count; and
third transmitting means for transmitting information indicating the recognition object recognized by the recognition means to the video processing apparatus for the mobile terminal via a network.
(Appendix 23)
A method of controlling a video processing apparatus for a server in the video processing system according to appendix 17 or 18, the apparatus comprising first local feature storage means for storing, in association with a recognition object, m first local features generated for m local regions each containing one of m feature points in an image of the recognition object, each first local feature consisting of a feature vector of dimensions 1 through i, the method comprising:
a third receiving step of receiving the encoded n second local features from a video processing apparatus for a mobile terminal and decoding them;
a recognition step of selecting the smaller of the dimension count i of the feature vectors of the first local features and the dimension count j of the feature vectors of the second local features, and recognizing that the recognition object exists in the image in the video upon determining that at least a predetermined ratio of the m first local features, truncated to feature vectors of the selected dimension count, correspond to the n second local features truncated to feature vectors of the selected dimension count; and
a third transmitting step of transmitting information indicating the recognition object recognized in the recognition step to the video processing apparatus for the mobile terminal via a network.
(Appendix 24)
A control program for a video processing apparatus for a server in the video processing system according to appendix 17 or 18, the apparatus comprising first local feature storage means for storing, in association with a recognition object, m first local features generated for m local regions each containing one of m feature points in an image of the recognition object, each first local feature consisting of a feature vector of dimensions 1 through i, the program causing a computer to execute:
a third receiving step of receiving the encoded n second local features from a video processing apparatus for a mobile terminal and decoding them;
a recognition step of selecting the smaller of the dimension count i of the feature vectors of the first local features and the dimension count j of the feature vectors of the second local features, and recognizing that the recognition object exists in the image in the video upon determining that at least a predetermined ratio of the m first local features, truncated to feature vectors of the selected dimension count, correspond to the n second local features truncated to feature vectors of the selected dimension count; and
a third transmitting step of transmitting information indicating the recognition object recognized in the recognition step to the video processing apparatus for the mobile terminal via a network.
(Appendix 25)
A video processing method in a video processing system that has a video processing apparatus for a mobile terminal and a video processing apparatus for a server connected via a network, and that comprises first local feature storage means for storing, in association with a recognition object, m first local features generated for m local regions each containing one of m feature points in an image of the recognition object, each first local feature consisting of a feature vector of dimensions 1 through i, the method comprising:
a second local feature generating step of extracting n feature points from an image in a video and generating n second local features for n local regions each containing one of the n feature points, each second local feature consisting of a feature vector of dimensions 1 through j;
a recognition step of selecting the smaller of the dimension count i of the feature vectors of the first local features and the dimension count j of the feature vectors of the second local features, and recognizing that the recognition object exists in the image in the video upon determining that at least a predetermined ratio of the m first local features, truncated to feature vectors of the selected dimension count, correspond to the n second local features truncated to feature vectors of the selected dimension count; and
an accuracy adjusting step of controlling adjustment of the accuracy of the n second local features generated in the second local feature generating step.

Claims (25)

  1.  A video processing apparatus comprising:
     first local feature storage means for storing, in association with a recognition object, m first local features generated for m local regions each containing one of m feature points in an image of the recognition object, each first local feature consisting of a feature vector of dimensions 1 through i;
     second local feature generating means for extracting n feature points from an image in a video and generating n second local features for n local regions each containing one of the n feature points, each second local feature consisting of a feature vector of dimensions 1 through j;
     recognition means for selecting the smaller of the dimension count i of the feature vectors of the first local features and the dimension count j of the feature vectors of the second local features, and recognizing that the recognition object exists in the image in the video upon determining that at least a predetermined ratio of the m first local features, truncated to feature vectors of the selected dimension count, correspond to the n second local features truncated to feature vectors of the selected dimension count; and
     accuracy adjusting means for controlling adjustment of the accuracy of the n second local features generated by the second local feature generating means.
  2.  The video processing apparatus according to claim 1, wherein the accuracy adjusting means controls adjustment of the accuracy of the n second local features based on the reliability of the recognition result of the recognition means.
  3.  The video processing apparatus according to claim 1, wherein the accuracy adjusting means controls adjustment of the accuracy of the n second local features based on the data amount of the second local features generated by the second local feature generating means.
  4.  The video processing apparatus according to any one of claims 1 to 3, further comprising control means for causing, after the accuracy adjusting means has adjusted the accuracy of the n second local features to a first accuracy and the recognition means has recognized a recognition object present in the image in the video, the accuracy adjusting means to adjust the accuracy of the n second local features to a second accuracy higher than the first accuracy so that the recognition means recognizes the recognition object in more detail.
  5.  The video processing apparatus according to any one of claims 1 to 3, further comprising control means for causing, after the accuracy adjusting means has adjusted the accuracy of the n second local features to a first accuracy and the recognition means has recognized a recognition object present in the image in the video, the accuracy adjusting means to adjust the accuracy of the n second local features to a second accuracy higher than the first accuracy so that the recognition means recognizes a plurality of recognition objects constituting the recognition object.
  6.  The video processing apparatus according to any one of claims 1 to 3, further comprising control means for causing, after the accuracy adjusting means has adjusted the accuracy of the n second local features to a first accuracy and the recognition means has detected a change in a recognition object present in the image in the video, the accuracy adjusting means to adjust the accuracy of the n second local features to a second accuracy higher than the first accuracy so that the recognition means recognizes the recognition object whose change was detected in more detail.
  7.  The video processing apparatus according to any one of claims 1 to 6, wherein the first local features and the second local features are generated by dividing a local region containing a feature point extracted from an image into a plurality of sub-regions and generating a multi-dimensional feature vector composed of histograms of gradient directions within the sub-regions.
  8.  The video processing apparatus according to claim 7, wherein the first local features and the second local features are generated by deleting, from the generated multi-dimensional feature vector, dimensions having higher correlation between adjacent sub-regions.
  9.  The video processing apparatus according to claim 7 or 8, wherein the first local features and the second local features are generated by deleting, from the plurality of feature points extracted from an image, feature points judged to be of lower importance.
  10.  The video processing apparatus according to any one of claims 7 to 9, wherein the dimensions of the feature vector are ordered so that they can be selected starting from the dimensions contributing most to the features of the feature point, and from the first dimension onward as the accuracy required of the local feature increases, the selection cycling once around the local region for every predetermined number of dimensions.
  11.  The video processing apparatus according to any one of claims 1 to 10, wherein the accuracy adjusting means adjusts at least one of the dimension count of the feature vectors in the n second local features generated by the second local feature generating means and the number n of feature points extracted by the second local feature generating means.
  12.  The video processing apparatus according to claim 11, wherein the accuracy adjusting means adjusts at least one of the size of the local regions at the n feature points generated by the second local feature generating means, the shape of the local regions, the number of divisions into which each local region is divided into sub-regions, and the number of directions of the feature vector.
  13.  The video processing apparatus according to any one of claims 1 to 12, wherein the first local feature storage means stores sets of the m first local features and the position coordinates of the m feature points in the image of the recognition object,
     the second local feature generating means holds sets of the n second local features and the position coordinates of the n feature points in the image in the video, and
     the recognition means recognizes that the recognition object exists in the image in the video when it determines that the set of pairs of the n second local features and their position coordinates and at least a predetermined ratio of the set of pairs of the m first local features and their position coordinates are related by a linear transformation.
  14.  The video processing apparatus according to any one of claims 1 to 13, further comprising display means for displaying information indicating the recognition object recognized by the recognition means, superimposed on the image in the video in which the recognition object exists.
  15.  A method of controlling a video processing apparatus comprising first local feature storage means for storing, in association with a recognition object, m first local features generated for m local regions each containing one of m feature points in an image of the recognition object, each first local feature consisting of a feature vector of dimensions 1 through i, the method comprising:
     a second local feature generating step of extracting n feature points from an image in a video and generating n second local features for n local regions each containing one of the n feature points, each second local feature consisting of a feature vector of dimensions 1 through j;
     a recognition step of selecting the smaller of the dimension count i of the feature vectors of the first local features and the dimension count j of the feature vectors of the second local features, and recognizing that the recognition object exists in the image in the video upon determining that at least a predetermined ratio of the m first local features, truncated to feature vectors of the selected dimension count, correspond to the n second local features truncated to feature vectors of the selected dimension count; and
     an accuracy adjusting step of controlling adjustment of the accuracy of the n second local features generated in the second local feature generating step.
  16.  A control program for a video processing apparatus comprising first local feature storage means for storing, in association with a recognition object, m first local features generated for m local regions each containing one of m feature points in an image of the recognition object, each first local feature consisting of a feature vector of dimensions 1 through i, the program causing a computer to execute:
     a second local feature generating step of extracting n feature points from an image in a video and generating n second local features for n local regions each containing one of the n feature points, each second local feature consisting of a feature vector of dimensions 1 through j;
     a recognition step of selecting the smaller of the dimension count i of the feature vectors of the first local features and the dimension count j of the feature vectors of the second local features, and recognizing that the recognition object exists in the image in the video upon determining that at least a predetermined ratio of the m first local features, truncated to feature vectors of the selected dimension count, correspond to the n second local features truncated to feature vectors of the selected dimension count; and
     an accuracy adjusting step of controlling adjustment of the accuracy of the n second local features generated in the second local feature generating step.
  17.  A video processing system having a video processing apparatus for a mobile terminal and a video processing apparatus for a server connected via a network, the system comprising:
     first local feature storage means for storing, in association with a recognition object, m first local features generated for m local regions each containing one of m feature points in an image of the recognition object, each first local feature consisting of a feature vector of dimensions 1 through i;
     second local feature generating means for extracting n feature points from an image in a video and generating n second local features for n local regions each containing one of the n feature points, each second local feature consisting of a feature vector of dimensions 1 through j;
     recognition means for selecting the smaller of the dimension count i of the feature vectors of the first local features and the dimension count j of the feature vectors of the second local features, and recognizing that the recognition object exists in the image in the video upon determining that at least a predetermined ratio of the m first local features, truncated to feature vectors of the selected dimension count, correspond to the n second local features truncated to feature vectors of the selected dimension count; and
     accuracy adjusting means for controlling adjustment of the accuracy of the n second local features generated by the second local feature generating means.
  18.  The video processing system according to claim 17, wherein the video processing apparatus for the mobile terminal comprises:
     the second local feature generating means;
     first transmitting means for encoding the n second local features and transmitting them to the video processing apparatus for the server via the network;
     the accuracy adjusting means;
     first receiving means for receiving an accuracy adjustment instruction for the accuracy adjusting means; and
     second receiving means for receiving, from the video processing apparatus for the server, information indicating a recognition object recognized by the video processing apparatus for the server;
     and wherein the video processing apparatus for the server comprises:
     the first local feature storage means;
     third receiving means for receiving the encoded n second local features from the video processing apparatus for the mobile terminal and decoding them;
     the recognition means;
     second transmitting means for transmitting the accuracy adjustment instruction for the accuracy adjusting means based on the recognition result of the recognition means; and
     third transmitting means for transmitting information indicating the recognition object recognized by the recognition means to the video processing apparatus for the mobile terminal via the network.
  19.  A video processing apparatus for a mobile terminal in the video processing system according to claim 17 or 18, comprising:
     second local feature generating means for extracting n feature points from an image in a video and generating n second local features for n local regions each containing one of the n feature points, each second local feature consisting of a feature vector of dimensions 1 through j;
     first transmitting means for encoding the n second local features and transmitting them to a video processing apparatus for a server via a network;
     accuracy adjusting means for controlling adjustment of the accuracy of the n second local features generated by the second local feature generating means;
     first receiving means for receiving an accuracy adjustment instruction for the accuracy adjusting means; and
     second receiving means for receiving, from the video processing apparatus for the server, information indicating a recognition object recognized by the video processing apparatus for the server.
  20.  A method of controlling a video processing apparatus for a mobile terminal in the video processing system according to claim 17 or 18, the method comprising:
     a second local feature generating step of extracting n feature points from an image in a video and generating n second local features for n local regions each containing one of the n feature points, each second local feature consisting of a feature vector of dimensions 1 through j;
     a first transmitting step of encoding the n second local features and transmitting them to a video processing apparatus for a server via a network;
     an accuracy adjusting step of controlling adjustment of the accuracy of the n second local features generated in the second local feature generating step;
     a first receiving step of receiving an accuracy adjustment instruction for the accuracy adjusting step; and
     a second receiving step of receiving, from the video processing apparatus for the server, information indicating a recognition object recognized by the video processing apparatus for the server.
  21.  A control program for a video processing apparatus for a mobile terminal in the video processing system according to claim 17 or 18, the program causing a computer to execute:
     a second local feature generating step of extracting n feature points from an image in a video and generating n second local features for n local regions each containing one of the n feature points, each second local feature consisting of a feature vector of dimensions 1 through j;
     a first transmitting step of encoding the n second local features and transmitting them to a video processing apparatus for a server via a network;
     an accuracy adjusting step of controlling adjustment of the accuracy of the n second local features generated in the second local feature generating step;
     a first receiving step of receiving an accuracy adjustment instruction for the accuracy adjusting step; and
     a second receiving step of receiving, from the video processing apparatus for the server, information indicating a recognition object recognized by the video processing apparatus for the server.
  22.  A video processing apparatus for a server in the video processing system according to claim 17 or 18, comprising:
     first local feature storage means for storing, in association with a recognition object, m first local features generated for m local regions each containing one of m feature points in an image of the recognition object, each first local feature consisting of a feature vector of dimensions 1 through i;
     third receiving means for receiving the encoded n second local features from a video processing apparatus for a mobile terminal and decoding them;
     recognition means for selecting the smaller of the dimension count i of the feature vectors of the first local features and the dimension count j of the feature vectors of the second local features, and recognizing that the recognition object exists in the image in the video upon determining that at least a predetermined ratio of the m first local features, truncated to feature vectors of the selected dimension count, correspond to the n second local features truncated to feature vectors of the selected dimension count; and
     third transmitting means for transmitting information indicating the recognition object recognized by the recognition means to the video processing apparatus for the mobile terminal via a network.
  23.  A method of controlling a video processing apparatus for a server in the video processing system according to claim 17 or 18, the apparatus comprising first local feature storage means for storing, in association with a recognition object, m first local features generated for m local regions each containing one of m feature points in an image of the recognition object, each first local feature consisting of a feature vector of dimensions 1 through i, the method comprising:
     a third receiving step of receiving the encoded n second local features from a video processing apparatus for a mobile terminal and decoding them;
     a recognition step of selecting the smaller of the dimension count i of the feature vectors of the first local features and the dimension count j of the feature vectors of the second local features, and recognizing that the recognition object exists in the image in the video upon determining that at least a predetermined ratio of the m first local features, truncated to feature vectors of the selected dimension count, correspond to the n second local features truncated to feature vectors of the selected dimension count; and
     a third transmitting step of transmitting information indicating the recognition object recognized in the recognition step to the video processing apparatus for the mobile terminal via a network.
  24.  A control program for a video processing apparatus for a server in the video processing system according to claim 17 or 18, the apparatus comprising first local feature storage means for storing, in association with a recognition object, m first local features generated for m local regions each containing one of m feature points in an image of the recognition object, each first local feature consisting of a feature vector of dimensions 1 through i, the program causing a computer to execute:
     a third receiving step of receiving the encoded n second local features from a video processing apparatus for a mobile terminal and decoding them;
     a recognition step of selecting the smaller of the dimension count i of the feature vectors of the first local features and the dimension count j of the feature vectors of the second local features, and recognizing that the recognition object exists in the image in the video upon determining that at least a predetermined ratio of the m first local features, truncated to feature vectors of the selected dimension count, correspond to the n second local features truncated to feature vectors of the selected dimension count; and
     a third transmitting step of transmitting information indicating the recognition object recognized in the recognition step to the video processing apparatus for the mobile terminal via a network.
  25.  A video processing method in a video processing system that has a video processing apparatus for a mobile terminal and a video processing apparatus for a server connected via a network, and that comprises first local feature storage means for storing, in association with a recognition object, m first local features generated for m local regions each containing one of m feature points in an image of the recognition object, each first local feature consisting of a feature vector of dimensions 1 through i, the method comprising:
     a second local feature generating step of extracting n feature points from an image in a video and generating n second local features for n local regions each containing one of the n feature points, each second local feature consisting of a feature vector of dimensions 1 through j;
     a recognition step of selecting the smaller of the dimension count i of the feature vectors of the first local features and the dimension count j of the feature vectors of the second local features, and recognizing that the recognition object exists in the image in the video upon determining that at least a predetermined ratio of the m first local features, truncated to feature vectors of the selected dimension count, correspond to the n second local features truncated to feature vectors of the selected dimension count; and
     an accuracy adjusting step of controlling adjustment of the accuracy of the n second local features generated in the second local feature generating step.
PCT/JP2012/081541 2011-12-15 2012-12-05 Video processing system, video processing method, video processing device for portable terminal or for server and method for controlling and program for controlling same WO2013089004A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2011273939A JP2015062089A (en) 2011-12-15 2011-12-15 Video processing system, video processing method, video processing device for portable terminal or server, and control method and control program of the same
JP2011-273939 2011-12-15

Publications (1)

Publication Number Publication Date
WO2013089004A1 true WO2013089004A1 (en) 2013-06-20

Family

ID=48612458

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2012/081541 WO2013089004A1 (en) 2011-12-15 2012-12-05 Video processing system, video processing method, video processing device for portable terminal or for server and method for controlling and program for controlling same

Country Status (2)

Country Link
JP (1) JP2015062089A (en)
WO (1) WO2013089004A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101813797B1 (en) 2016-07-15 2017-12-29 경희대학교 산학협력단 Method and apparatus for detecting IMAGE FEATURE POINT based on corner edge pattern
KR102651126B1 (en) * 2016-11-28 2024-03-26 삼성전자주식회사 Graphic processing apparatus and method for processing texture in graphics pipeline
JP6575628B1 (en) * 2018-03-30 2019-09-18 日本電気株式会社 Information processing apparatus, information processing system, control method, and program
WO2023148964A1 (en) * 2022-02-07 2023-08-10 日本電気株式会社 Comparison device, comparison method, and program

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010520684A (en) * 2007-03-05 2010-06-10 フォトネーション ビジョン リミテッド Face search and detection in digital image capture device
JP2011008507A (en) * 2009-06-25 2011-01-13 Kddi Corp Image retrieval method and system
JP2011198130A (en) * 2010-03-19 2011-10-06 Fujitsu Ltd Image processing apparatus, and image processing program
JP2011242861A (en) * 2010-05-14 2011-12-01 Ntt Docomo Inc Object recognition device, object recognition system and object recognition method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HIRONOBU FUJIYOSHI: "Gradient-Based Feature Extraction -SIFT and HOG", IEICE TECHNICAL REPORT, vol. 107, no. 206, 27 August 2007 (2007-08-27), pages 211 - 224 *

Also Published As

Publication number Publication date
JP2015062089A (en) 2015-04-02

Similar Documents

Publication Publication Date Title
US10438084B2 (en) Article management system, information processing apparatus, and control method and control program of information processing apparatus
EP2782067B1 (en) Local feature amount extraction device, local feature amount extraction method, and program
JP5685390B2 (en) Object recognition device, object recognition system, and object recognition method
JP6226187B2 (en) Information processing system, information processing method, information processing apparatus and control method and control program thereof, communication terminal and control method and control program thereof
JP6048741B2 (en) Article management system, article management method, information processing apparatus, control method thereof, and control program
KR101611778B1 (en) Local feature descriptor extracting apparatus, method for extracting local feature descriptor, and computer-readable recording medium recording a program
JP6168303B2 (en) Information processing system, information processing method, information processing apparatus and control method and control program thereof, communication terminal and control method and control program thereof
JP6168355B2 (en) Information processing system, information processing method, communication terminal, control method thereof, and control program
JP2014170431A (en) Information processing system, information processing apparatus, control method thereof, and control program
WO2013089004A1 (en) Video processing system, video processing method, video processing device for portable terminal or for server and method for controlling and program for controlling same
WO2012046426A1 (en) Object detection device, object detection method, and object detection program
WO2022009301A1 (en) Image processing device, image processing method, and program
JP6153086B2 (en) Video processing system, video processing method, video processing apparatus for portable terminal or server, and control method and control program therefor
WO2013115092A1 (en) Video processing system, video processing method, video processing device, and control method and control program therefor
WO2013115203A1 (en) Information processing system, information processing method, information processing device, and control method and control program therefor, and communication terminal, and control method and control program therefor
CN112465517A (en) Anti-counterfeiting verification method and device and computer readable storage medium
US9826156B1 (en) Determining camera auto-focus settings
AU2020272930A1 (en) Biometrics authentication device and biometrics authentication method for authenticating a person with reduced computational complexity
JP6131859B2 (en) Information processing system, information processing method, information processing apparatus and control method and control program thereof, communication terminal and control method and control program thereof
US20130142430A1 (en) Information processing apparatus and information processing method
Pietkiewicz Application of fusion of two classifiers based on principal component analysis method and time series comparison to recognize maritime objects upon FLIR images
JP6379478B2 (en) Image processing apparatus, electronic camera, and image processing program
Skoczylas et al. Multirotor micro air vehicle autonomous landing system based on image markers recognition
JP6041156B2 (en) Information processing system, information processing method, and information processing program
WO2023084778A1 (en) Image processing device, image processing method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12856756

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12856756

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP