WO2011092793A1 - Data Processing Device - Google Patents
Data Processing Device
- Publication number
- WO2011092793A1 PCT/JP2010/007518
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- feature
- model
- unidentified
- classification
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/5838—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using colour
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V10/464—Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
Definitions
- The present invention relates to an image processing technique for automatically classifying a plurality of images into predetermined categories.
- Image indexing techniques have been proposed for automatically tagging images.
- Conventionally, a specific object is detected using, for example, a technique for estimating an event from time or place information, or a face detection technique; alternatively, similar images are detected based on the similarity of color or texture information and used for retrieval.
- In addition, image indexing techniques for recognizing or classifying general objects have been proposed.
- In such techniques, an object is modeled on the basis of basic feature amounts in an image, such as luminance values, or on groups of local feature amounts, and is recognized by matching against the feature amounts detected from the image.
- This recognition technology is widely used in many computer vision applications.
- In one such technique, a feature vector representing an input image is generated, and the input image is classified based on a combination of the output results of a plurality of different classifiers.
- A device for automatically classifying images in this way has been proposed (see, for example, Patent Document 1). This technique makes it possible to calculate the characteristics of an object from various viewpoints at higher speed.
- However, the above-described image indexing techniques are premised on defining and classifying valid models for general objects, not on classification specialized for user data. For example, in the configuration disclosed in Patent Document 1, the feature vector calculated from an image is classified based on the weighted combined output of a plurality of classifiers. This works effectively for objects within a definable range, but it does not have enough processing power to cover all common objects, and it cannot always detect undefined objects or objects that are important to the user.
- In other words, the conventional techniques do not classify objects in a way specialized for user data, so the classification result is not always satisfactory to the user.
- Accordingly, an object of the present invention is to provide a data processing device, an image processing method, a program, and an integrated circuit that allow a user to obtain a satisfactory classification result even when an object specific to the user's data exists.
- To achieve this object, the present invention is a data processing device comprising: storage means for storing a plurality of model data used for object classification, each model data consisting of a combination of detection frequencies of a plurality of feature amounts; classification means for determining, based on the plurality of model data and the detection frequency of each of two or more feature amounts detected in classification target data, whether or not an object included in the data can be classified; specifying means for specifying, when a plurality of unidentified data remain whose objects could not be classified after the processing by the classification means, two or more feature amounts for each of which at least a certain number of the unidentified data share the same detection frequency; and model creation means for creating new model data from the two or more specified feature amounts.
- With this configuration, the data processing device uses the unidentified data to specify two or more feature amounts for each of which at least a certain number of the unidentified data share the same detection frequency. Such specification is possible because many of the unidentified data include the same object. Therefore, by creating new model data from the two or more specified feature amounts, unidentified data including the same object can be classified.
- Here, the specifying means may, for each of the plurality of feature amounts, obtain from each of the unidentified data the detection frequency at which a feature amount similar to that feature amount is detected, generate from the obtained detection frequencies a distribution of the detection frequency for each of the plurality of feature amounts, and specify from each of the distributions two or more feature amounts for which a predetermined number or more of the unidentified data share the same detection frequency.
- With this configuration, the data processing device can easily specify, based on the distribution of the detection frequency, two or more feature amounts for which a certain number or more of the unidentified data share the same detection frequency.
- Here, the specifying means may generate a plurality of data groups by grouping the plurality of unidentified data into sections according to a predetermined rule, and perform the acquisition of detection frequencies and the generation of distributions for each data group, and the model creation means may create new model data for each data group.
- With this configuration, the data processing device specifies, for each section according to the predetermined rule, the feature amounts for which a certain number or more of the unidentified data share the same detection frequency, so that classification per section is possible.
- Here, each of the unidentified data may be associated with time information indicating the date and time when it was created, the sections according to the predetermined rule may be periods divided by fixed time zones, and the specifying means may generate the plurality of data groups by grouping the plurality of unidentified data by these periods.
- With this configuration, the data processing device specifies two or more feature amounts for each period divided by fixed time zones.
- Data including the same object are often created in the same time zone; therefore, by dividing the data into such sections, the objects included in each section can be classified easily.
- Here, when one model data generated in one data group has a temporal correlation with another model data generated in another data group, the model creation means may associate the one model data and the other model data with each other as model data having temporal variability.
- With this configuration, the data processing device associates these model data as having temporal variability.
- As a result, data classified by either model data can be treated as possibly including the same object.
- Here, the model creation means may determine that there is a correlation when there is a proportional relationship between the degree of change in similarity of the first feature amount characterizing the one model data and the degree of change in similarity of the second feature amount characterizing the other model data.
- With this configuration, the data processing device can determine that a correlation exists when the degrees of change in similarity of model data generated in different data groups are in a proportional relationship.
- Here, when one model data generated in one data group is the same as another model data generated in another data group, or when the same model data appears periodically in the remaining data groups, the model creation means may store only the one model data in the storage means.
- With this configuration, the data processing device stores only the one model data.
- This prevents duplication of the stored model data.
- Here, the specifying means may acquire a calculation frequency for each feature amount using all the unidentified data, specify one or more feature amounts whose calculation frequency is equal to or higher than a predetermined frequency, acquire for each unidentified data the detection frequency of each of the specified one or more feature amounts, and generate the distribution from the detection frequencies acquired for each data.
- With this configuration, the data processing device specifies the one or more feature amounts whose calculation frequency is equal to or higher than the predetermined frequency and acquires detection frequencies only for those feature amounts, so the processing load is reduced compared with acquiring detection frequencies for all feature amounts.
- Here, the data processing device may further include display means for displaying the plurality of unidentified data, and instruction accepting means for accepting from the user a designation of two or more of the displayed unidentified data.
- The specifying means may then create the distribution for each of the plurality of feature amounts from the detection frequencies acquired from each of the two or more unidentified data accepted by the instruction accepting means, or from the detection frequencies acquired from the remaining data excluding those two or more data.
- With this configuration, the data processing device acquires the detection frequencies from the two or more data designated by the user among the unidentified data and creates the distribution from them, so model data reflecting the user's intention can be created.
- Here, the instruction accepting means may accept the designation when new model data has not yet been created.
- With this configuration, the data processing device accepts from the user the designation of the data used to create model data, so more reliable model data can be created.
- Here, the specifying means may group the two or more unidentified data so that, based on the creation date and time of each unidentified data, each belongs to one of a plurality of periods, and may create the distribution for each group.
- With this configuration, the model creation means can create model data for each period.
- Here, the data processing device may further include display means for displaying a plurality of data considered to contain the object identified by the new model data created by the model creation means, and instruction accepting means for accepting from the user a designation of two or more of the displayed data. The specifying means may create, for each of the plurality of feature amounts, a distribution different from the earlier distribution, from the detection frequencies acquired from each of the two or more designated data or from the detection frequencies acquired from the remaining data excluding those two or more data.
- The model creation means may then create model data different from the new model data based on the different distributions.
- With this configuration, the data processing device creates model data again from the two or more data designated by the user among the data considered to contain the object identified by the new model data, so more accurate model data can be created.
- Here, the data may be images.
- The specifying means may generate, for each image in which no object has been identified, a local feature group using the similarity of one or more feature amounts detected in the image, and may acquire the detection frequency from each local feature group.
- With this configuration, the data processing device creates new model data for images in which no object has been identified. Therefore, after the new model data is created, images specific to it can be classified.
- FIG. 1 is a block diagram showing the configuration of a data processing device 100.
- FIG. 2 shows an example of SIFT feature amounts extracted from an image.
- FIG. 3 shows an example of the detection frequency of each VisualWord extracted from an image.
- FIG. 4 shows an example of the similarity distribution created from the numbers of detections of each detection-frequency value for every VisualWord, over all unclassified AV data.
- FIG. 5 shows an example of the data structure of the first reference parameter table.
- A further drawing is a block diagram illustrating the configuration of a local model creation unit 20.
- A further drawing shows an example of the image group existing for each area.
- A further drawing is a flowchart showing the process of extracting area information.
- The present embodiment relates to a mechanism by which a data processing device 100, which automatically organizes a local AV (Audio Video) data group in a home or the like, generates local classification models and automatically tags the AV data group with high accuracy.
- AV data is a general term for photographic image data, moving image data, music data, and the like.
- FIG. 1 is a block diagram showing the configuration of the data processing apparatus 100.
- The data processing device 100 comprises a local DB (database) 1, a preprocessing unit 2, a feature amount extraction unit 3, a classification unit 4, a basic dictionary DB 5, a search index DB 6, an unclassified feature DB 7, a same feature extraction unit 8, a local model creation unit 9, a local dictionary DB 10, and a reference parameter DB 11.
- Each DB is specifically a storage device such as a large-capacity media disk, for example an HDD (Hard Disk Drive) or a DVD (Digital Versatile Disc), or a semiconductor memory.
- The local DB 1 stores AV (Audio Video) data in the home, such as photographic image data, moving image data, and music data, as file data.
- The preprocessing unit 2 performs processing before the feature amounts of AV data are extracted. Specifically, to facilitate feature amount extraction, the preprocessing unit 2 normalizes the AV data, detects background or object regions by image region division, and detects scene sections by calculating changes in audio power.
- The feature amount extraction unit 3 extracts the feature amounts of AV data. More specifically, when the AV data is image data, the feature amount extraction unit 3 generates feature amount descriptions that express region features from low-order features such as edges, colors, or textures around characteristic points.
- When the AV data is audio data, the feature amount extraction unit 3 extracts feature amounts such as audio power, zero crossings, spectrum-related features, cepstrum-related features, and chroma vectors. Spectrum-related and cepstrum-related features include the spectral roll-off, MFCC (Mel Frequency Cepstrum Coefficients), and the like.
- There are also MPEG-7 audio features, which include Audio Power, Audio Spectrum Envelope, Audio Spectrum Centroid, Harmonic Spectral Deviation, Harmonic Spectral Spread, and the like. Details are described in "MPEG-7 Audio and Beyond" by Hyoung-Gook Kim et al. (John Wiley & Sons Ltd, 2005).
- the function of the feature amount extraction unit 3 when the AV data is image data will be described below.
- The feature amount extraction unit 3 holds in advance a dictionary storing a plurality of VisualWords, which are the reference feature amounts used for feature extraction.
- First, the feature amount extraction unit 3 extracts one or more feature points in the image and calculates a SIFT feature amount at each extracted feature point.
- The feature amount extraction unit 3 then generates one or more BoF (Bag of Features) vectors using all the calculated SIFT feature amounts and the plurality of VisualWords stored in the dictionary.
- The feature amount extraction unit 3 outputs the generated one or more BoF vectors to the classification unit 4.
- A VisualWord is calculated as a representative central model among various SIFT feature amounts, and represents a part or the whole of a general object shape such as a person, a house, or an umbrella. Since VisualWords, feature point extraction, SIFT feature calculation, and BoF generation are well-known techniques, their descriptions are omitted here.
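As a concrete illustration of the BoF generation described above, the following sketch quantizes each SIFT descriptor to its nearest VisualWord and accumulates the detection-frequency histogram that forms the BoF vector. The toy 2-D descriptors and centroids are assumptions for readability; real SIFT descriptors are 128-dimensional.

```python
from math import dist  # Euclidean distance (Python 3.8+)

def bag_of_features(sift_descriptors, visual_words):
    """Quantize each SIFT descriptor to its nearest VisualWord and
    return the detection-frequency histogram (the BoF vector)."""
    bof = [0] * len(visual_words)
    for desc in sift_descriptors:
        # index of the closest VisualWord centroid
        nearest = min(range(len(visual_words)),
                      key=lambda i: dist(desc, visual_words[i]))
        bof[nearest] += 1
    return bof

# Toy 2-D "descriptors" and two VisualWord centroids.
words = [(0.0, 0.0), (10.0, 10.0)]
descs = [(0.1, 0.2), (9.8, 10.1), (0.3, -0.1)]
print(bag_of_features(descs, words))  # [2, 1]
```

Each image thus becomes a fixed-length vector of VisualWord counts, regardless of how many feature points it contains.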
- The classification unit 4 performs matching between the feature amounts extracted from the AV data and the existing model data, and determines which model the input data matches.
- A classifier based on machine learning is used for this determination.
- Typical classifiers include the GMM (Gaussian Mixture Model) and the SVM (Support Vector Machine).
- Classification reference data prepared in advance for each category to be classified (for example, the model information stored in the basic dictionary DB 5 and the local dictionary DB 10 described later) is set in the classifier, the classification item of the sample input information (here, one or more BoF vectors of the AV data) is discriminated, and a likelihood is calculated as the reliability of the discrimination.
- In general, the greater the likelihood, the higher the reliability.
- When the input AV data matches one of the models in the classifier, the classification unit 4 associates (tags) the classification information of the matched model with the input AV data and accumulates it in the search index DB 6.
- When the input AV data does not match any model, the classification unit 4 associates (tags) unclassified information with the AV data and accumulates it in the search index DB 6.
- The unclassified information is an identifier that identifies the AV data; for example, when the AV data is an image, it is the image number associated with the image.
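The matching-and-tagging flow above can be pictured with a minimal sketch. The patent names GMM/SVM classifiers without detailing them, so this stand-in scores each model by closeness of its representative BoF centroid (an assumption, not the patent's classifier) and leaves data unclassified when no likelihood clears a threshold.

```python
from math import dist

def classify(bof, models, threshold=0.5):
    """Match an input BoF vector against each stored model and return
    (category, likelihood), or (None, likelihood) when no model is
    confident enough -- such data is recorded as unclassified.
    Each model here is a representative BoF centroid (an assumption)."""
    best_cat, best_like = None, 0.0
    for category, centroid in models.items():
        likelihood = 1.0 / (1.0 + dist(bof, centroid))  # higher = closer
        if likelihood > best_like:
            best_cat, best_like = category, likelihood
    if best_like < threshold:
        return None, best_like   # tagged as unclassified
    return best_cat, best_like   # tag stored in the search index

models = {"person": (5.0, 1.0), "house": (1.0, 6.0)}
print(classify((4.8, 1.1), models))    # matches "person"
print(classify((50.0, 50.0), models))  # far from everything -> None
```

The unclassified results are exactly the inputs the same feature extraction unit 8 later mines for user-specific objects.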
- Basic dictionary DB 5: In the basic dictionary DB 5, the categories to be classified by the classification unit 4 are defined in advance, and the model information of each category necessary for its classification is accumulated according to the feature amounts used.
- Search index DB 6: When the classification unit 4 matches the input AV data to a model, the search index DB 6 stores the classification information of the matched model in association with the input data.
- Unclassified feature DB 7: The unclassified feature DB 7 stores the unclassified information of AV data that could not be classified.
- Same feature extraction unit 8: Based on the unclassified information accumulated in the unclassified feature DB 7, the same feature extraction unit 8 calculates the similarity and appearance frequency of feature amounts across the plurality of unclassified AV data, and, when a bias exists in these values, extracts the same features, i.e., features estimated to be obtained from the same object.
- The same feature extraction unit 8 is started, for example, when the classification unit 4 performs classification processing.
- First, the same feature extraction unit 8 determines whether the number of unclassified information entries stored in the unclassified feature DB 7 has reached the predetermined number necessary for starting the same feature extraction process. This determination uses, for example, the first reference parameter table T100 stored in the reference parameter DB 11 described later.
- Next, from each piece of unclassified information stored in the unclassified feature DB 7, the same feature extraction unit 8 extracts, out of all the feature amounts extracted from the AV data indicated by that unclassified information, the reference features (VisualWords) whose calculation frequency is equal to or higher than a certain value.
- The calculation frequency F(x) of a reference feature of type x is calculated according to Equation 1 below from, for example, the total number of AV data V_all, the number of AV data V_x,cal in which one or more feature amounts of type x are calculated, and the average number V_x,one of feature amounts of type x calculated from each AV data in which x is present.
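Equation 1 itself did not survive in this extraction. A form consistent with the variable definitions above would combine the fraction of AV data in which x appears with the average number of occurrences per data; note that this reconstruction is an assumption, not the patent's original formula:

```latex
F(x) = \frac{V_{x,\mathrm{cal}}}{V_{\mathrm{all}}} \times V_{x,\mathrm{one}}
```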
- The same feature extraction unit 8 then uses the calculated value of F(x) and the second reference parameter table T110 stored in the reference parameter DB 11 described later to extract only one or more reference features with a high calculation frequency.
- Next, for each AV data, the same feature extraction unit 8 calculates the similarity of its feature amounts to each reference feature. For example, when the reference features are VisualWords, the distance to each VisualWord model is calculated as the similarity. This will be described specifically with reference to the drawings.
- FIG. 2 shows SIFT feature amounts extracted from a photograph showing a person, a house, and an umbrella.
- For SIFT feature amounts, characteristic points in an image are detected (the characteristic points shown in the figure), and their region information (the scale in the figure) is calculated as a SIFT descriptor.
- The rotation indicates the captured orientation of the feature region (scale) of the feature point.
- Since the feature points, scale, and rotation are the same as those defined in the prior art, a detailed description is omitted.
- For the similarity, a Euclidean distance, Mahalanobis distance, Minkowski distance, or the like is calculated based on the multivariate data of the same feature group, and the closeness of the distance is used as the similarity.
- The squared Euclidean distance is the most basic such distance: when the observed values of m variables are obtained for n individuals, the dissimilarity d_ij between individual i and individual j can be calculated by Equation 2.
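Equation 2 is not reproduced in this extraction; the standard squared Euclidean distance over the m observed variables, matching the definitions above, is:

```latex
d_{ij} = \sum_{k=1}^{m} \left( x_{ik} - x_{jk} \right)^{2}
```

where x_ik denotes the value of the k-th variable observed for individual i.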
- The same feature extraction unit 8 calculates all the SIFT feature amounts for every unclassified AV data (image). Then, for all the unclassified AV data, the same feature extraction unit 8 calculates, as the detection frequency, the number of detected SIFT feature amounts similar to each VisualWord, as shown in FIG. 3.
- Next, for each of the one or more reference features extracted as having a high calculation frequency, the same feature extraction unit 8 creates from the detection frequencies over all the unclassified AV data the similarity distribution shown in FIG. 4, and calculates the peak values in the distribution.
- A peak value can be calculated, for example, as the difference obtained by subtracting the nearest minimum value from a maximum value in the similarity distribution.
- Specifically, the same feature extraction unit 8 calculates the detection frequency as shown in FIG. 3 for every image, and counts the number of detections of each detection-frequency value, creating a similarity distribution of the number of detections over the detection-frequency values.
- Each peak value is calculated by finding the maxima and minima from the increases and decreases of the number of detections in the similarity distribution, and taking, for each maximum, the difference from the number of detections at the minimum whose detection frequency is closest to that maximum.
- The same feature extraction unit 8 uses the peak values to determine and extract the reference features whose peak values suggest that the same-object property can be determined.
- Then, the same feature extraction unit 8 outputs the extracted reference features to the local model creation unit 9 as the same features.
- The second reference parameter table T110, described later, is used as the criterion for judging the peak values.
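The peak-value calculation described above (each local maximum minus the number of detections at the nearest local minimum) can be sketched as follows. The input list is a hypothetical similarity distribution, indexed by detection-frequency value.

```python
def peak_values(counts):
    """For a similarity distribution -- counts[f] = number of unclassified
    images whose detection frequency for one VisualWord equals f -- return
    each local maximum minus the count at the nearest local minimum."""
    peaks = []
    n = len(counts)
    for i in range(n):
        if (i == 0 or counts[i] > counts[i - 1]) and \
           (i == n - 1 or counts[i] > counts[i + 1]):
            # walk down to the nearest local minimum on each side
            l = i
            while l > 0 and counts[l - 1] < counts[l]:
                l -= 1
            r = i
            while r < n - 1 and counts[r + 1] < counts[r]:
                r += 1
            candidates = []
            if l < i:
                candidates.append((i - l, counts[l]))
            if r > i:
                candidates.append((r - i, counts[r]))
            if candidates:
                # pick the minimum closest on the frequency axis
                nearest_min = min(candidates)[1]
                peaks.append(counts[i] - nearest_min)
    return peaks

print(peak_values([1, 5, 2, 2, 8, 1]))  # [4, 7]
```

A reference feature whose distribution yields a sufficiently large peak is then judged against the peak-value criteria of table T110.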
- The local model creation unit 9 defines categories of objects peculiar to a given local AV data group using the same features extracted by the same feature extraction unit 8, and calculates their model information.
- Specifically, using the same features extracted by the same feature extraction unit 8 and a class creation method such as the k-means method, the local model creation unit 9 defines categories among the plurality of unclassified AV data and generates a model from each group of similar data composed of one or more AV data in which the same features are detected. Since class creation methods such as the k-means method are known techniques, their description is omitted here.
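The text names the k-means method without detailing it; the following minimal k-means sketch groups toy BoF vectors of unclassified data into clusters whose centroids can serve as local models. The data and parameters are assumptions for illustration.

```python
import random
from math import dist

def kmeans(vectors, k, iters=20, seed=0):
    """Minimal k-means: partition BoF vectors into k clusters and return
    the cluster centroids, each usable as one local model."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            # assign each vector to its nearest centroid
            clusters[min(range(k),
                         key=lambda i: dist(v, centroids[i]))].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid for an empty cluster
                centroids[i] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return centroids

# Two obvious groups of toy BoF vectors -> two local models.
data = [(1.0, 0.0), (1.2, 0.1), (9.0, 9.0), (9.1, 8.8)]
models = sorted(kmeans(data, 2))
print([tuple(round(c, 2) for c in m) for m in models])
# [(1.1, 0.05), (9.05, 8.9)]
```

Each returned centroid summarizes one group of unclassified data sharing the same features, i.e., one new category.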
- The storage DB is a storage device such as a large-capacity media disk, for example an HDD or a DVD, or a semiconductor memory.
- The reference parameter DB 11 stores a first reference parameter table T100 and a second reference parameter table T110.
- The first reference parameter table T100 indicates the criteria for starting processing in the same feature extraction unit 8.
- The second reference parameter table T110 indicates the criteria for extracting reference features with a high calculation frequency and reference features based on peak values.
- First reference parameter table T100: As shown in FIG. 5, the first reference parameter table T100 consists of one or more pairs of a data type and classification start amount reference parameters.
- The data type indicates the type of data to be classified; specifically, still image, moving image, audio, and so on.
- The classification start amount reference parameters consist of criterion 1, criterion 2, criterion 3, and so on. Criterion 1, criterion 2, criterion 3, ... each indicate a number (amount) of data that triggers the start of classification for the corresponding data type.
- For example, when the data to be classified are still images, the same feature extraction unit 8 starts the classification upon determining that any one of criterion 1, criterion 2, ... is satisfied by the unclassified still images.
- Second reference parameter table T110: As shown in FIG. 6, the second reference parameter table T110 consists of one or more pairs of a data type and various reference parameters.
- The data type indicates the type of data subject to extraction; specifically, still image, moving image, audio, and so on.
- The various reference parameters consist of a frequency criterion, peak value criterion 1, peak value criterion 2, and so on.
- The frequency criterion is used when extracting reference features with a high calculation frequency for the corresponding data type.
- Peak value criterion 1, peak value criterion 2, ... are used when determining the reference features for which the same-object property is estimated to be determinable for the corresponding data type.
- For example, when the target for extracting reference features with a high calculation frequency is still images, the same feature extraction unit 8 extracts one or more reference features whose frequency is equal to or higher than the frequency criterion (0.35).
- When any one of peak value criterion 1, peak value criterion 2, ... is satisfied, the same feature extraction unit 8 determines that the corresponding reference feature allows the same-object property to be determined.
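The two reference parameter tables can be pictured as simple lookup structures. In the sketch below the concrete values are assumptions, except the frequency criterion of 0.35 mentioned in the text.

```python
# Hypothetical contents of the two reference parameter tables.
T100 = {"still image": {"criterion 1": 100, "criterion 2": 500}}
T110 = {"still image": {"frequency": 0.35,
                        "peak 1": 10, "peak 2": 30}}

def may_start_classification(data_type, unclassified_count):
    """Start same feature extraction when any start-amount criterion
    in the first reference parameter table T100 is met."""
    return any(unclassified_count >= n for n in T100[data_type].values())

def is_high_frequency(data_type, f_x):
    """Keep only reference features whose calculation frequency F(x)
    meets the frequency criterion in the second table T110."""
    return f_x >= T110[data_type]["frequency"]

def peak_indicates_same_object(data_type, peak):
    """A reference feature is judged to capture the same object when
    its peak value meets any peak-value criterion in T110."""
    return any(peak >= p for key, p in T110[data_type].items()
               if key.startswith("peak"))

print(may_start_classification("still image", 120))   # True
print(is_high_frequency("still image", 0.2))          # False
print(peak_indicates_same_object("still image", 12))  # True
```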
- FIG. 7 is a flowchart showing the same feature extraction process by which the data processing device 100 extracts the same features from local data.
- The same feature extraction process is started, for example, when the classification unit 4 performs its classification process.
- the same feature extraction unit 8 determines whether or not the number of unclassified information stored in the unclassified feature DB 7 is a certain number or more necessary for starting the process (step) S1). For example, when the classification target is a still image, the same feature extraction unit 8 satisfies any of the criteria 1, criteria 2, criteria 3,... In the first criteria parameter table T100 shown in FIG. Determine if.
- step S1 When it is determined that the number is not equal to or greater than a certain number necessary for starting the process (“No” in step S1), the same feature extraction unit 8 ends the same feature extraction process.
- the same feature extraction unit 8 determines the AV data based on the value of F (x) in Equation 1.
- the standard feature quantity whose calculation frequency is a certain level or higher is extracted from all the feature quantities extracted from (Step S2).
- the same feature extraction unit 8 calculates the similarity between the representative feature value and the reference feature value calculated in all AV data (step S3). Specifically, when the reference feature amount is VisualWord, the same feature extraction unit 8 calculates the distance to each VisualWord model as a similarity (SIFT feature amount). The same feature extraction unit 8 calculates, as the detection frequency, the number detected as the SIFT feature quantity for each reference feature quantity as shown in FIG.
- the same feature extraction unit 8 calculates the reference feature quantities in all the AV data not classified from the detection frequency in FIG. The similarity distribution shown is created, and the peak value in the distribution is calculated (step S4).
- the same feature extraction unit 8 uses the peak value to determine and extract a reference feature amount having a peak value that is considered to be able to determine the same object property (step S5), and to the local model creation unit 9 as the same feature Output.
- the determination of the peak value is whether or not any one of the peak value reference 1, the peak value reference 2,.
- In this way, the data processing apparatus 100 uses the AV data (images) that could not be classified, and generates model data from the reference feature quantities whose values, among the feature quantities contained in these images, suggest that the same object can be identified.
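As an illustration of steps S2 through S5, the detection-frequency calculation and peak-based selection might be sketched as follows. This is a minimal sketch, not the patented implementation: the Visual Word vocabulary, the assignment radius, and the peak criterion (`peak_ratio`) are assumed values standing in for the criteria of the first reference parameter table T100.

```python
import numpy as np

def extract_same_features(sift_descriptors, visual_words,
                          assign_radius=0.8, peak_ratio=0.05):
    """Count, for each Visual Word (reference feature), how many SIFT
    descriptors fall within assign_radius of it (steps S2-S3), then keep
    the words whose detection frequency peaks above a threshold (S4-S5)."""
    freq = np.zeros(len(visual_words), dtype=int)
    for d in sift_descriptors:
        dists = np.linalg.norm(visual_words - d, axis=1)  # distance to each model = similarity
        nearest = np.argmin(dists)
        if dists[nearest] <= assign_radius:
            freq[nearest] += 1
    # a word is a same-feature candidate if its frequency peaks above
    # peak_ratio of all assignments (stand-in for the peak-value criteria)
    threshold = peak_ratio * max(freq.sum(), 1)
    return [i for i, f in enumerate(freq) if f >= threshold]
```

The returned indices identify the reference feature quantities that would be passed on to the local model creation unit.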
- In the above, the same features are extracted from all unclassified AV data, but the present invention is not limited to this. Instead, the same features may be extracted from a certain amount of AV data at a time, or at predetermined time intervals. The procedure of the same feature extraction process in this case will be described with reference to the flowchart shown in the figure.
- First, the same feature extraction unit 8a determines whether the number of unclassified information entries stored in the unclassified feature DB 7 is at least the number required to start the process. If not, the same feature extraction unit 8a ends the same feature extraction process.
- Otherwise, the same feature extraction unit 8a reads, from the unclassified feature DB 7, the feature quantities detected in each fixed time period (step S11).
- Next, the same feature extraction unit 8a calculates a BoF (Bag of Features), which is a local feature quantity, for each fixed-time-period input unit (step S12). The same feature extraction unit 8a then calculates the Visual Word detection frequencies using the local feature quantities calculated in step S12 (step S13).
- Using the calculated detection frequencies, the same feature extraction unit 8a creates a histogram of the number of detections of each Visual Word, as shown in FIG. 3 (step S14).
- The same feature extraction unit 8a determines whether a peak exists in the histogram of detection counts for each Visual Word, identifies the reference feature quantities whose peaks meet or exceed the peak-value criterion, and extracts them as the same features for that time interval (step S15).
- The same feature extraction unit 8a determines whether the data of all time periods have been processed (step S16). If so ("Yes" in step S16), the same feature extraction unit 8a ends the process. If not ("No" in step S16), the same feature extraction unit 8a returns to step S11 and repeats until all time periods have been processed.
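The loop of steps S11 through S16 can be sketched as follows, assuming each detection is a (timestamp, Visual Word id) pair; the window length and the peak criterion `peak_min` are illustrative assumptions, not values from the reference parameter DB.

```python
from collections import Counter

def same_features_per_window(detections, window_sec=3600, peak_min=5):
    """detections: list of (timestamp_sec, visual_word_id).
    Groups detections into fixed time windows (S11), builds a per-window
    Visual Word histogram (S12-S14), and keeps the words whose count
    reaches the peak criterion (S15), looping over all windows (S16)."""
    windows = {}
    for t, vw in detections:
        windows.setdefault(int(t // window_sec), []).append(vw)
    result = {}
    for w, vws in sorted(windows.items()):
        hist = Counter(vws)  # detection count per Visual Word in this window
        result[w] = sorted(v for v, c in hist.items() if c >= peak_min)
    return result
```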
- In the above, the extraction target is an image group divided by fixed time period, but the invention is not limited to this. Any unit that divides the images may be used, such as a fixed number of images, a fixed region, or individual events.
- In this way, the data processing apparatus can extract objects that are difficult to model in the full feature-quantity space, such as a family's pet dog or other personal property, by constructing a limited feature-quantity space consisting only of the same features. Since a local model created in this manner is specialized for the local AV data, classification using the model can be performed with high accuracy.
- That is, the feature quantities used are first limited to those having the same subject property, and a local model is then created in that limited space.
- As the unit in which the unclassified information stored in the unclassified feature DB 7 is used, the same features may be calculated from all of the unclassified information, as in the first embodiment. Alternatively, they may be calculated for every predetermined number of images, or by dividing the unclassified information by event, time, or place.
- As methods for extracting the same feature quantities in images, a method that determines whether the same object is included by matching corresponding feature points, and a method that uses the overall similarity of color histograms and edge-amount distributions, are conceivable. Any method may be used as long as it employs feature quantities capable of extracting the same object existing in the database.
- The second embodiment relates to a method for creating a model that is optimal in time series, rather than a single model optimal for all data, by considering not only the amount of information and the degree of similarity but also temporal continuity, which is information unique to the local AV data, when creating a local model.
- In the first embodiment, all AV data is handled uniformly as a method for generating a local model suited to a user-specific local DB, so a model truly specific to the local DB is not generated. In the present embodiment, a method that generates a model specific to the local DB by also considering sequential transitions is used.
- In the present embodiment, a method for generating a local model based on the analysis result of image information, assuming mainly images as the data used, will be described in detail.
- The present embodiment differs from the first embodiment in the functional configuration of the local model creation unit. Since the other components are the same as in the first embodiment, only the local model creation unit is described here.
- FIG. 10 is a functional block diagram of the local model creation unit 20.
- the local model creation unit 20 includes a section information extraction unit 21, a section model creation unit 22, and a model continuity determination unit 23.
- The section information extraction unit 21 extracts section information from the local AV data in units of fixed data amounts, time, place, or events. For example, continuous shooting sections can be calculated automatically using the shooting time and GPS (Global Positioning System) information included in the EXIF (Exchangeable Image File Format) information of each image. It is also conceivable to divide the data by the folder units created by the user and extract the section information.
- Here, the section information extraction unit 21 calculates continuous shooting sections based on the shooting times included in the EXIF information. Specifically, from the contents of the unclassified information stored in the unclassified feature DB 7, the section information extraction unit 21 extracts from the EXIF information the time information of all the images to be processed stored in the local DB 1. Next, according to the obtained time information, the section information extraction unit 21 counts the number of images shot in each one-hour section, starting from the date and time of the first shot. The section information extraction unit 21 then adds the hourly counts, beginning with the first hour being processed, to obtain a cumulative image count. When the cumulative count is 500 or more and no images are added for three or more consecutive hours, the section information extraction unit 21 extracts that span as a section and resets the cumulative count to zero.
- The section information extraction unit 21 performs the above operation on all the images to be processed.
- The section model creation unit 22 creates a local model for each section extracted by the section information extraction unit 21, using the feature quantity groups calculated by the same feature extraction unit 8.
- A specific model can be generated by the same method as in the first embodiment.
- The similarity can be calculated from the distance between models in the multivariate feature quantity space, as in the first embodiment.
- The model continuity determination unit 23 calculates how long each per-section local model persists, and determines whether temporal continuity exists in the local DB 1. The model continuity determination unit 23 tags local models in order of temporal continuity, that is, starting from the local model with the highest appearance frequency.
- The model continuity determination unit 23 also calculates whether partial changes in the feature quantities occur as aging, and determines whether a consistent change tendency exists. For a plurality of local models having such a change tendency, the model continuity determination unit 23 associates them (tags them identically) as having the same characteristics.
- FIG. 11 is a diagram illustrating an example of the image groups existing in each section.
- Assume that the local DB 1 contains an image group shot in time series by a specific user, as shown in FIG. 11, with the horizontal axis as the time axis and the hourly image count as the vertical axis.
- The operation of the section information extraction unit 21 in this case will be described using the flowchart shown in the figure.
- First, the section information extraction unit 21 extracts, from the EXIF information, the time information of all the images to be processed stored in the local DB 1, based on the contents of the unclassified information stored in the unclassified feature DB 7 (step S21).
- Next, according to the obtained time information, the section information extraction unit 21 counts the number of images shot in each one-hour section, starting from the date and time of the first shot (step S22).
- The section information extraction unit 21 then adds the hourly counts, beginning with the first hour being processed, to obtain a cumulative image count (step S23).
- When the cumulative count is 500 or more and no images are added for three or more consecutive hours, the section information extraction unit 21 extracts that span as a section and resets the cumulative count to zero (step S24).
- The section information extraction unit 21 determines whether all the images to be processed have been processed (step S25). If so ("Yes" in step S25), the section information extraction unit 21 completes the section information extraction process. If not ("No" in step S25), the section information extraction unit 21 returns to step S23 and repeats until all images have been processed.
- As a result, the section information extraction unit 21 can extract, for example, sections 1 to 6 as shown in FIG. 11.
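The section extraction of steps S21 through S24 can be sketched as follows, using the 500-image and 3-hour thresholds given above; representing the input as per-hour image counts is an assumption made for illustration.

```python
def extract_sections(hourly_counts, min_images=500, gap_hours=3):
    """hourly_counts: number of images shot in each consecutive one-hour
    bucket, starting from the first shot. Accumulates counts (S23) and
    closes a section once at least min_images have accumulated and no
    images were added for gap_hours consecutive hours (S24)."""
    sections, start, total, idle = [], 0, 0, 0
    for h, n in enumerate(hourly_counts):
        total += n
        idle = idle + 1 if n == 0 else 0
        if total >= min_images and idle >= gap_hours:
            sections.append((start, h))  # section spans hour buckets [start, h]
            start, total, idle = h + 1, 0, 0
    if total > 0:
        sections.append((start, len(hourly_counts) - 1))  # trailing open section
    return sections
```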
- Next, the section model creation unit 22 generates a local model for every section (sections 1 to 6) extracted by the section information extraction unit 21.
- Here, since the six sections 1 to 6 are extracted, the section model creation unit 22 generates the local models A, B, C, D, E, F, and G, for example, as shown in FIG. 13.
- Next, the model continuity determination unit 23 determines whether the created local models have temporal continuity, periodicity, or aging. In the example of FIG. 13, the overall similarity between the per-section local models is calculated, and the same label is assigned to local models whose similarity is at or above a certain level. As a result, the local model C can be extracted as a model with high temporal continuity, that is, a high appearance frequency. The model continuity determination unit 23 can therefore preferentially tag it as a model with a higher local attribute than local models that exist only in the short term (for example, the local models B and G).
- The model continuity determination unit 23 also detects and models aging. For example, when the degree of change in local similarity exhibits a proportional relationship as shown in FIG. 14, the model continuity determination unit 23 determines that the local models A, E, and D are local models with a consistent secular change, preferentially tags them as models with a high local attribute, and associates them as the same object. Specific examples of what is extracted are aging variability such as the change in face and body as a child grows, variability due to deterioration or breakage of an object, and shape variability of objects such as cars due to fashion.
- In other words, for local models that differ as whole models but are extracted as having high similarity, the model continuity determination unit 23 calculates the degree of change in the similarity of their local feature quantities; when a consistent change tendency is found, these models can be extracted as one model having secular change.
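The continuity labeling can be illustrated as follows, representing each per-section local model by a feature vector; the use of cosine similarity and the threshold `same_thresh` are assumptions standing in for the inter-model distance calculation of the first embodiment.

```python
import numpy as np

def label_continuity(section_models, same_thresh=0.9):
    """section_models: list of per-section model vectors in time order.
    Assigns the same label to models whose cosine similarity meets
    same_thresh, so a model reappearing in many sections (high temporal
    continuity) accumulates many occurrences of its label."""
    labels = [-1] * len(section_models)
    next_label = 0
    for i, m in enumerate(section_models):
        for j in range(i):
            a, b = np.asarray(section_models[j]), np.asarray(m)
            sim = a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))
            if sim >= same_thresh:
                labels[i] = labels[j]  # same label = tagged as the same object
                break
        if labels[i] == -1:
            labels[i] = next_label
            next_label += 1
    return labels
```

Counting the occurrences of each label then identifies the models with high appearance frequency, which would be tagged preferentially.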
- As described above, by determining the continuity of the various created local models within the local DB 1, the data processing apparatus can create not only local models effective in the short term or on a one-off basis, but also models effective in the long term or periodically, as well as local models that adapt to change over time.
- That is, temporal chunks are extracted as connections between AV data, a local model is created from the AV data limited to each section, and local models with higher local attributes are created by determining inter-model continuity between sections. This provides local discriminability that takes into account the tendencies of the AV data acquired by the user, making it possible to accurately define the categories of objects peculiar to the local AV data group and to extract their model information.
- In the present embodiment, the section information extraction unit 21 extracts the time information of all the images to be processed from the EXIF information, but it may instead be extracted from data-generation time information.
- In the present embodiment, the same feature extraction unit 8 is provided, but a configuration that uses the general feature quantities calculated by the feature quantity extraction unit 3, or feature quantities extracted from all AV data, may also be adopted.
- The third embodiment takes feedback from the user (hereinafter, user interaction) into account when extracting the same features, creating a local model, or displaying results classified by a created local model.
- Specifically, the present embodiment relates to a method for correcting automatically generated same features and local models, and for generating same features and local models that could not be generated automatically.
- FIG. 15 is a block diagram showing the configuration of the present embodiment, in which a user interaction input unit 30 having a user-interaction input function is added to the configuration of FIG. 1.
- The user interaction input unit 30 is a function for inputting additional information about the AV data held by the user, or about the output results calculated by the data processing device, in order to improve the accuracy of the contents processed by the same feature extraction unit 8 and the local model creation unit 9.
- the user interaction input unit 30 displays an image G100 shown in FIG. 16, an image G200 shown in FIG. 17, and an image G300 shown in FIG. 18, and accepts an instruction from the user.
- It is assumed that the screen for displaying images has a touch-panel function.
- Image G100: The image G100 shown in FIG. 16 shows an example of indicating that images contain the same object and of inputting tag information for an image.
- An image G100 shown in FIG. 16 includes a library G101 indicating the storage location of the displayed image, unclassified images I100, I101, I102, I103,..., Buttons B100, B101, B102, B103, and a scroll bar SB100. Is included.
- The library being displayed is surrounded by a thick frame so that the user can tell the storage location of each displayed image.
- Here, since the A01 library under album 1 is surrounded by a thick frame, the user can recognize at a glance that the storage location of each displayed image is A01.
- Each of the displayed images I100, I101, I102, I103, ... is an unclassified image included in the display-target library, and check boxes C100, C101, C102, C103, ... are displayed below the respective images. The user can designate the images to be processed by checking the check boxes of one or more displayed images. For example, in FIG. 16, three images are designated in addition to the images I102 and I103 (five images in total).
- The button B100 is used to indicate that the same object is included in the plurality of images designated for processing.
- the same feature extraction unit 8 extracts a feature amount related to the same object from a plurality of designated images.
- the subsequent operations of the same feature extraction unit 8 and the local model creation unit 9 are the same as those in the first embodiment, and a description thereof will be omitted here.
- The button B101 is used to instruct that tag information be associated with the one or more images designated for processing.
- When this button B101 is pressed by a user operation, the display screen changes from the image G100 to the image G200.
- The button B102 is used to designate, for the one or more images designated for processing, a region from which a feature quantity is to be extracted. After pressing the button B102, the user designates the region to be extracted by operating the mouse.
- the button B103 is for instructing the end of the process by the user interaction.
- Scroll bar SB100 is for scrolling the displayed image. The user scrolls the image by operating the displayed scroll bar SB100 using the mouse.
- Image G200: The image G200 shown in FIG. 17 is displayed when the button B101 is pressed in the image G100.
- Here, the display when the image I103 of FIG. 16 is designated and the button B101 is pressed is shown.
- the user interaction input unit 30 displays the designated image, and then accepts designation of an object associated with tag information from the user.
- the user designates an area with a finger so as to surround an object associated with tag information.
- the region O201 is specified so as to surround the object O200.
- When the user interaction input unit 30 receives the designation of the region O201, it displays a box T200 for inputting a tag name.
- the user inputs tag information (here, “chair” as a tag name) in the box T200.
- the user interaction input unit 30 acquires the unclassified information of the image associated with the tag information and notifies the local model creation unit 9 together with the tag information.
- the local model creation unit 9 associates the input tag information (“chair”) with the local model created for the specified object O200.
- Image G300: The image G300 shown in FIG. 18 shows an example of inputting an instruction based on a result classified by the data processing apparatus.
- the image G300 shown in FIG. 18 includes a library G301, images I300, I301, I302, I303,..., Buttons B300, B301, B302, B303, and a scroll bar SB300.
- A library name is displayed for each object detected by the same feature extraction unit 8 and the local model creation unit 9.
- the library name to be displayed is surrounded by a thick frame so that the user can know the folder being displayed.
- the library name “X001” is surrounded by a thick frame.
- Each of the displayed images I300, I301, I302, I303, ... is an image included in the display-target library "X001", and check boxes C300, C301, C302, C303, ... are displayed below the respective images.
- the user can specify an image to be processed by checking a check box for one or more images among displayed images. For example, in FIG. 18, in addition to the image I302, three images (a total of four images) are designated.
- the button B300 instructs to create a local model again using a plurality of images designated for processing.
- the same feature extraction unit 8 extracts a feature amount related to the same object from a plurality of designated images.
- the subsequent operations of the same feature extraction unit 8 and the local model creation unit 9 are the same as those in the first embodiment, and a description thereof will be omitted here.
- the button B301 instructs to create a local model again using the remaining images excluding one or more images designated for processing.
- When this button is pressed, the same feature extraction unit 8 extracts the feature quantities related to the same object from the remaining images excluding the designated ones.
- the subsequent operations of the same feature extraction unit 8 and the local model creation unit 9 are the same as those in the first embodiment, and a description thereof will be omitted here.
- For example, FIG. 18 mainly collects images of dogs, but images containing only cats or landscapes are also present. By designating those images with the check boxes and pressing the button B301, a local model can be created again from only the images that contain dogs.
- The button B302 is used to instruct that the images be divided into the plurality of images designated for processing and the remaining images, and that a local model be created for each of the divided image groups.
- a local model is created for each image group divided by the same feature extraction unit 8 and the local model creation unit 9.
- the button B303 instructs to integrate two or more libraries.
- a local model is created using two or more libraries by the same feature extraction unit 8 and the local model creation unit 9.
- scroll bar SB300 has the same function as the scroll bar SB100, a description thereof is omitted here.
- The user interaction input unit 30 displays the results after the button B300 or the button B301 is pressed and classification has been performed again.
- In other words, when various objects are mixed, the user designates the items other than the main classification contents and presses the button B301, whereby the contents are corrected.
- For example, the library "X001" in FIG. 18 mainly stores images of dogs, but images of cats and landscapes are also present. By feeding back to the data processing device that those images are wrong, an image group whose contents are corrected so that only dogs are detected can be obtained. In addition, correction methods such as designating only the correct contents, subdividing when it is desired to further divide the dogs by breed, or integrating when the dogs have been divided too finely, can also be performed.
- FIG. 19 is a flowchart showing the specific feedback processing procedure.
- When the user inputs information by a user operation, the user interaction input unit 30 acquires the information (step S31). Specifically, in the image G100 shown in FIG. 16 or the image G300 shown in FIG. 18, the images to be processed are designated, and when one of the buttons is pressed, the number of designated images and the processing content corresponding to the pressed button are acquired as input information.
- Next, the user interaction input unit 30 determines whether the input information can improve the image-processing contents (step S32).
- When the AV data is an image, the information enabling improvement includes region-related information about subjects included in the image, tag-related information, event-related information about the image group, and the number of designated images.
- Specifically, the user interaction input unit 30 determines: when the button B100 or B300 is pressed, whether two or more images are designated; when the button B101 is pressed, whether an image is designated; when the button B301 is pressed, whether two or more images remain after excluding the designated images; when the button B302 is pressed, whether each of the two divided image groups includes two or more images; and when the button B303 is pressed, whether two or more libraries are designated.
- If it is determined that improvement is possible ("Yes" in step S32), the user interaction input unit 30 converts the acquired input information into information that can be processed by the same feature extraction unit 8 or the local model creation unit 9 (step S33). Specifically, the user interaction input unit 30 acquires the unclassified information (an identifier for identifying the AV data) for each of the one or more designated images. Further, for example, when a name tag has been attached to a pet kept at home, the image and the named region are converted into image information (unclassified information) in which the same object exists.
- The same feature extraction unit 8 and the local model creation unit 9 perform the improvements enabled by the converted information and update the results (step S34).
- Next, the user interaction input unit 30 determines whether the user input is complete (step S35). If complete ("Yes" in step S35), the feedback process ends. If not ("No" in step S35), the process returns to step S31 and repeats until the user input is complete.
- If it is determined in step S32 that improvement is not possible ("No" in step S32), the process proceeds to step S35.
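The validity check of step S32 for the buttons B100 through B303 can be sketched as follows; the function and parameter names are hypothetical, but the per-button rules are those listed above.

```python
def can_improve(button, selected, remaining=0, groups=(), libraries=0):
    """Step S32: decide whether the user's input can improve processing.
    selected/remaining count designated and non-designated images;
    groups are the two divided image groups for B302."""
    if button in ("B100", "B300"):   # same-object / re-model: need 2+ images
        return selected >= 2
    if button == "B101":             # tagging: need a designated image
        return selected >= 1
    if button == "B301":             # exclude-and-remodel: need 2+ remaining
        return remaining >= 2
    if button == "B302":             # split: each group needs 2+ images
        return len(groups) == 2 and all(len(g) >= 2 for g in groups)
    if button == "B303":             # merge: need 2+ libraries
        return libraries >= 2
    return False
```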
- Conventionally, similarity was based only on feature similarity, regardless of whether the same object was present, and the same features were extracted using a clustering method. Unnecessary feature quantities were therefore also mixed in, and the accuracy of same-feature extraction was not very high.
- In the present embodiment, by contrast, the similarity is calculated only from the limited image information of the same object, so the same features can be extracted with high accuracy.
- Also, in the local model creation unit 9, when same-object information is input directly, the necessary local model can be learned directly from the images, so a highly accurate classification model can be generated. Even when the information is merely indirect, indicating only whether the same object is included, a mistakenly created classification model can be corrected.
- The user interaction may be provided as an individual user input unit, as an input unit grouped around a certain function, or the like.
- As described above, by adopting a configuration in which the same features and local models are not only created automatically by the data processing device but are also modified while taking into account feedback processing from user input as user interaction, a local model whose classification accuracy improves step by step can be obtained. Therefore, the categories of objects peculiar to a certain local AV data group can be corrected and defined step by step, and the model information can be reliably extracted.
- The present embodiment relates to a method for automatically creating both the basic dictionary DB and the local dictionary DB by considering the same features for classifying the objects peculiar to each.
- Instead of accumulating and generating model information for predefined categories, each model is generated automatically.
- Specifically, in addition to the same features, similar features for classifying a general model are generated, so that general models are also created automatically.
- FIG. 20 is a block diagram showing the basic configuration of the data processing apparatus 100a of the present invention.
- The data processing device 100a includes a local DB 1, a preprocessing unit 2, a feature quantity extraction unit 3, a classification unit 40, a basic dictionary DB 5, a search index DB 6, a same feature extraction unit 8, a local model creation unit 9, a local dictionary DB 10, a reference parameter DB 11, an all-image feature DB (database) 41, a similar feature extraction unit 42, and a global model creation unit 43.
- Since the local DB 1, preprocessing unit 2, feature quantity extraction unit 3, basic dictionary DB 5, search index DB 6, same feature extraction unit 8, local model creation unit 9, local dictionary DB 10, and reference parameter DB 11 are the same as those described in the first embodiment, their description is omitted here.
- All-image feature DB 41 stores all the unclassified information calculated by the feature amount extraction unit 3.
- The similar feature extraction unit 42 does not extract features for classifying one specific model (for example, a particular dog) from the feature quantities of all images; rather, it extracts the feature quantities common to various instances of a model type (for example, dogs in general).
- Specifically, using the first reference parameter table T100 included in the reference parameter DB 11, the similar feature extraction unit 42 determines whether the number of unclassified information entries stored in the all-image feature DB 41 is at least the number required to start the similar-feature extraction process.
- If the determination is affirmative, the similar feature extraction unit 42 performs the similar-feature extraction process; if not, it does not.
- As extraction methods, it is conceivable to lower the criteria for determining feature-quantity similarity compared with same-feature extraction, to merge the same features with similar features at a certain level, to use feature quantities other than the same features, or to define the feature quantities to be used in advance.
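The first of these options, lowering the similarity criterion relative to same-feature extraction, might be sketched as follows; the normalized detection frequencies and the two thresholds are illustrative assumptions.

```python
import numpy as np

def split_same_and_similar(freqs, same_peak=0.10, similar_peak=0.03):
    """freqs: detection frequency of each reference feature over all
    images. After normalization, features peaking above same_peak become
    "same features" (local models); those between similar_peak and
    same_peak become "similar features" for the global model."""
    freqs = np.asarray(freqs, dtype=float)
    freqs = freqs / freqs.sum()
    same = [i for i, f in enumerate(freqs) if f >= same_peak]
    similar = [i for i, f in enumerate(freqs)
               if similar_peak <= f < same_peak]
    return same, similar
```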
- The global model creation unit 43 uses the similar features extracted by the similar feature extraction unit 42 to define general object categories in a certain local AV data group and to calculate their model information.
- Since the data processing apparatus 100a also creates general classification models from the information of the local AV data group, the amount of information that cannot be classified can be reduced and the amount that can be classified can be increased.
- The classification unit 40 performs a matching process against the existing model data using the feature quantities extracted from the AV data, and determines which model the input data matches.
- For AV data that matches no model, the classification unit 40 performs no processing, and the unclassified information of the AV data whose feature quantities were calculated by the feature quantity extraction unit 3 is stored in the all-image feature DB 41.
- For AV data that matches a model, the classification unit 40 performs the determination process and assigns metadata such as tag information to the AV data.
- As described above, rather than defining and holding classification models in advance, the data processing apparatus 100a automatically creates all classification models based on the feature quantities obtained from the local AV data, extracting not only the feature quantities having the same subject property but also the feature quantities whose subject properties are highly similar. As a result, the data processing apparatus 100a can classify using not only local models with the same subject property but also global models with similar subject properties, so that all the categories of objects included in a certain local AV data group can be automatically defined and their model information extracted.
- The present embodiment relates to a method of accepting designation of a plurality of images from the user when generating the same features or creating a local model, and of generating the same features and the local model from the accepted images.
- The data processing device 100b includes a local DB 1, a preprocessing unit 2, a feature quantity extraction unit 3, a classification unit 4, a basic dictionary DB 5, a search index DB 6, an unclassified feature DB 7, a same feature extraction unit 58, a local model creation unit 59, a local dictionary DB 10, a reference parameter DB 11, and a registration unit 51.
- The registration unit 51 is a function that accepts, from the user, the selection of an image group consisting of a plurality of images that are to be classified and for which a local model is to be generated, in order to increase the accuracy of the contents processed by the same feature extraction unit 58 and the local model creation unit 59.
- the registration unit 51 displays an image similar to the image G100 illustrated in FIG. 16, the image G200 illustrated in FIG. 17, and the image G300 illustrated in FIG. 18, and receives an instruction from the user.
- a touch panel function is provided as in the third embodiment.
- the screen configuration of the image G100 displayed in the present embodiment is the same as that shown in the third embodiment, and the images to be displayed are different. In the present embodiment, it is assumed that the local model has not yet been created, and the image to be displayed is not used for classification.
- the buttons B100, B101, B102, and B103 and the scroll bar SB100 are the same as in the third embodiment, so their description is omitted here.
- the user can easily select an image group to be registered while performing a scroll operation using the scroll bar SB100.
- the local model generated by the functions of the same feature extraction unit 58 and the local model creation unit 59, described later, is registered in the local dictionary DB 10.
- the same feature extraction unit 58 extracts the same features from the image group designated through the registration unit 51.
- first, the same feature extraction unit 58 classifies the plurality of images included in the checked image group by proximity of shooting time, that is, into event units.
- the same feature extraction unit 58 then extracts the same features per classified group of images. Since the extraction method is the same as that of the same feature extraction unit 8 shown in the first embodiment, its description is omitted here.
- the local model creation unit 59 creates a local model for each of the same features extracted per group of images classified by the same feature extraction unit 58.
- since the local model creation method is the same as that of the local model creation unit 9 shown in the first embodiment, its description is omitted here.
- the registration unit 51 receives a registration instruction and the designation of a plurality of target images from the user (step S100). Specifically, the registration unit 51 receives the registration instruction and the image designation when the button B100 is pressed after a plurality of images have been checked in the image G100.
- the same feature extraction unit 58 determines whether a plurality of images have been designated (step S105).
- if it is determined that a plurality of images have not been designated ("No" in step S105), the process ends.
- when it is determined that a plurality of images have been designated ("Yes" in step S105), the same feature extraction unit 58 classifies the images into event units (step S110).
- the same feature extraction unit 58 selects one event (step S115).
- the same feature extraction unit 58 determines whether or not the number of images included in the selected event is a predetermined number or more (step S120).
- the same feature extraction unit 58 extracts, as reference feature quantities, feature quantities calculated with at least a certain frequency from the plurality of images included in the selected event (step S125).
- the type of feature quantity may be any feature quantity extracted by the feature quantity extraction unit 3; a combination of color information and the higher-order feature quantity SIFT is also conceivable. Here, SIFT feature quantities are used.
- the reference feature quantities can be identified and extracted according to a condition such as: SIFT feature quantities whose similarity is equal to or greater than a certain threshold are present in a majority of all the designated images.
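the majority-threshold criterion described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the toy descriptors, the 1/(1 + distance) similarity measure, and the 0.8/majority thresholds are all assumptions.

```python
import numpy as np

def extract_reference_features(image_descriptors, sim_threshold=0.8, majority=0.5):
    """Pick descriptors from the first image that have a sufficiently similar
    match in a majority of the other designated images.

    image_descriptors: list of (n_i, d) arrays, one per designated image.
    Similarity here is 1 / (1 + Euclidean distance), a placeholder measure.
    """
    candidates = image_descriptors[0]
    reference = []
    for desc in candidates:
        hits = 0
        for other in image_descriptors[1:]:
            dists = np.linalg.norm(other - desc, axis=1)
            sim = 1.0 / (1.0 + dists.min())      # best match in that image
            if sim >= sim_threshold:
                hits += 1
        # keep the descriptor if it recurs in a majority of the other images
        if hits >= majority * len(image_descriptors[1:]):
            reference.append(desc)
    return np.array(reference)
```

used on a handful of descriptor sets sharing one common descriptor, only the shared one survives the majority test.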
- the same feature extraction unit 58 calculates the similarity between the representative feature quantity and all frequent feature quantities (step S130). For example, when the frequent feature quantity is a SIFT feature quantity, the distance to each SIFT feature quantity of all image data is calculated as the similarity.
- for each reference feature quantity, the same feature extraction unit 58 normalizes the degree of coincidence with the SIFT feature quantities in all unclassified images, for example to between 0 (no match at all) and 1 (complete match), and calculates a similarity distribution (step S135).
- when the distribution has a high proportion of values close to 0 and a high proportion of values close to 1, for example as shown in the figure, the same feature extraction unit 58 determines and extracts the frequent feature quantities considered capable of establishing same-object identity (step S140), and outputs the same features to the local model creation unit 9.
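steps S130 to S140 amount to deciding, from a normalized similarity distribution, whether a frequent feature behaves bimodally: many images match almost perfectly while the rest barely match at all. A minimal sketch, with thresholds that are illustrative and not taken from the patent:

```python
import numpy as np

def is_same_object_feature(match_scores, low=0.2, high=0.8, ratio=0.3):
    """Decide whether a frequent feature likely comes from a single object.

    match_scores: normalized degrees of coincidence in [0, 1], one per image
    (0 = no match at all, 1 = complete match). The feature is treated as
    'same object' when a large share of images match almost perfectly and
    another large share barely match (a bimodal distribution).
    """
    scores = np.asarray(match_scores, dtype=float)
    near_zero = np.mean(scores <= low)   # proportion of images with no match
    near_one = np.mean(scores >= high)   # proportion with a near-complete match
    return bool(near_zero >= ratio and near_one >= ratio)
```

a distribution clustered entirely around 0.5 (partial matches everywhere) would be rejected, since it suggests generic similarity rather than one recurring object.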
- the same feature extraction unit 58 determines whether there is an unselected event (step S145).
- if it is determined that an unselected event exists ("Yes" in step S145), the same feature extraction unit 58 selects the next event (step S150) and returns to step S120.
- if it is determined that none exists ("No" in step S145), the process ends.
- the local model creation unit 9 creates a local model for each event using the extracted same features.
- in the above, the same feature extraction unit 58 divides the designated image group into event units, but the present invention is not limited to this.
- the same feature extraction unit may extract a plurality of same features from the designated image group without dividing it into event units.
- in that case, the local model creation unit may classify the plurality of extracted same features into event units, or may create a local model from all the extracted same features without classifying them into event units.
- in the above, the local model creation unit 59 creates a local model for each event, but the present invention is not limited to this.
- the local model creation unit may create a local model using all the same features extracted for each event. In this case, only the features common to the local models created for the individual events are extracted, and a core portion of the local model is generated from the extracted features. Furthermore, by calculating the difference between the core-portion local model and each per-event local model, the trend changes of these local models can be extracted, and a new local model suited to the change trend and to the image trend of the entire section may be generated.
- the local model creation unit may create a local model for each event, and, for an event that exists between one event and another event but is not designated by any image specified by the user (an unselected event), it may generate a model from the local model of the one event and the local model of the other event.
- for example, when the local model creation unit has created local models for sections 1 and 3, it creates a local model for section 2 (a section not designated by the user), which exists between sections 1 and 3, from the local models of sections 1 and 3.
- the user may also select the orientation of the object included in each image when issuing the registration instruction.
- for example, when selecting a pet or a person as the target for creating a local model, the user selects accordingly among images taken from the front, images taken from the right side, images taken from the left side, and so on.
- the same feature extraction unit then extracts the same features for each shooting angle.
- in the above, the same feature extraction unit 58 divides the image group by event, but the present invention is not limited to this.
- when the user designates images, the images may instead be designated already classified by event.
- in the above, the data processing apparatus displays only unclassified images when no local model has been created yet, but the present invention is not limited to this.
- the displayed images may include images in the library to be displayed regardless of whether they have been classified.
- as described above, the local model creation unit 59 generates a local model for each event (for example, for each section shown in FIG. 11), and can also determine the temporal continuity of the model from the image groups designated by the user. For example, if the image groups designated by the user are included in sections 1, 2, and 6 shown in FIG. 11, then by generating a local model for each of those sections based on the image group containing the user-designated target, it is possible to generate a local model that is the optimal registration target with respect to each section's overall image trend (for example, the average color histogram of the images, the content of feature objects, the background type, and so on).
- when the image group designated by the user is included only in section 3 shown in FIG. 11, the images were most likely captured within a single event, so a local model optimized only for that section can also be created. Furthermore, the feature quantities used to extract the same features in each section can themselves be restricted.
- in each embodiment, the discriminator used in the determination process performed by the classification unit 4 is based on a machine learning technique, but it is not limited to this.
- the discriminator may be any method that can discriminate, according to some criterion, the defined classification item to which a signal having a certain feature quantity belongs.
- the reference feature quantities used in the present invention may be any feature quantities that the feature quantity extraction unit 3 can capture from the AV data.
- for images, partial feature quantities such as the individual VisualWords in a BoF (Bag of Features) representation are conceivable; for audio, vowel or consonant utterance models serving as basic language models are conceivable.
- in each embodiment, the first reference parameter table T100 is used as an example of the trigger for starting the same feature extraction process, but neither the contents nor the types in this table are limited.
- the data processing apparatus may perform the same feature extraction process according to the increase or decrease in the total amount of data, or may perform the process when at least two of the criteria in the first reference parameter table T100 are satisfied.
- the same feature extraction unit 8 and the same feature extraction unit 58 may calculate the detection frequency shown in FIG. 3 for every image data item, and may calculate the number of detection-frequency values detected within each fixed interval.
- the resulting similarity-distribution counts may be normalized to between 0 and 1, which simplifies the calculation processing.
- the same feature extraction unit 8 and the same feature extraction unit 58 determine that a reference feature quantity can establish same-object identity when it satisfies any of the plurality of peak-value criteria in the second reference parameter table T110, but the present invention is not limited to this.
- a peak-value criterion may instead be associated with each reference feature quantity used.
- in the example of FIG. 16, images are selected using check boxes, but the present invention is not limited to this.
- in the example, a single object (a chair) is selected and a tag is input, but a plurality of objects may be selected in one image and a tag input for each object.
- any form of user interaction that can correct the processing results of the same feature extraction unit 8 or the local model creation unit 9 may be used.
- in each embodiment, the unclassified feature DB 7 stores an identifier identifying the AV data as the unclassified information, but the present invention is not limited to this.
- the feature quantities calculated for the AV data by the feature quantity extraction unit 3 may be stored as the unclassified information.
- the device of the present invention may be incorporated in any device that can store data for creating a local model, such as a DVD recorder, a TV, a personal computer, or a data server.
- for image data, the feature quantity extraction unit can use feature descriptors such as SURF and SIFT, which express region feature quantities derived from low-order feature quantities such as edges, colors, and textures, as well as higher-order features such as HOG (Histogram of Oriented Gradients), which expresses the shape features of objects.
- the feature quantity extraction unit may generate feature groups, each consisting of local features that are similar in edge, color, texture, and so on. In that case, the same feature extraction unit calculates the similarity of the feature quantities, the appearance frequency of the feature quantities, and so on from each local feature group included in the generated feature groups.
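the grouping of mutually similar local features left unspecified above can be sketched with a deliberately simple greedy scheme; the distance threshold and first-member-as-representative rule are assumptions for illustration only.

```python
import numpy as np

def group_local_features(features, max_dist=1.0):
    """Greedy grouping of local feature vectors: each feature joins the first
    existing group whose representative (its first member) lies within
    max_dist, otherwise it starts a new group. Group sizes then serve as
    appearance frequencies for the same feature extraction step."""
    groups = []
    for f in np.asarray(features, dtype=float):
        for g in groups:
            if np.linalg.norm(f - g[0]) <= max_dist:
                g.append(f)
                break
        else:                      # no existing group was close enough
            groups.append([f])
    return groups
```

four toy 2-D features forming two tight pairs end up in two groups of two.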
- each process described in the above embodiments may be realized by having a CPU (Central Processing Unit) execute a program describing the procedure of the method, and such a program may be stored in a recording medium and distributed.
- each component according to the above embodiments may be realized as an LSI (Large Scale Integration), an integrated circuit. The components may be individually made into single chips, or a single chip may include some or all of them.
- although LSI is mentioned here, the circuit may be called an IC (Integrated Circuit), a system LSI, a super LSI, or an ultra LSI depending on the degree of integration.
- the method of circuit integration is not limited to LSI; integration may be performed with a dedicated circuit or a general-purpose processor.
- an FPGA (Field Programmable Gate Array) that can be programmed after LSI manufacture, or a reconfigurable processor in which the connections and settings of circuit cells inside the LSI can be reconfigured, may also be used.
- the calculations of these functional blocks can be performed using, for example, a DSP (Digital Signal Processor) or a CPU (Central Processing Unit). These processing steps can also be recorded on a recording medium as a program and executed.
- the data processing apparatus of the present invention is useful for classifying data that cannot be identified by general models and for creating local models specialized for the user.
- the data processing apparatus does not merely create local models in a metric space using all feature quantities; it also restricts the feature quantities used to those with high same-subject specificity, and by dividing the data into sections and generating models that take time-series continuity into account, it can obtain local classification models that are highly discriminative for local AV data instead of general classification models. Therefore, by extracting object information peculiar to the local AV data group with high accuracy and using it as a data index, AV data can be classified and searched without effort on the user's part.
- it is also useful as an image processing function for creating classification models specific to the user's local image group and classifying images with them, and as a terminal for viewing various images.
- the present invention is applicable to uses such as DVD recorders, TVs (televisions), personal computer software, and data servers.
Abstract
Description
1.1 Configuration of the data processing device 100
The first embodiment of the present invention is described below with reference to the drawings. This embodiment concerns a mechanism in which a data processing device 100 that automatically organizes a local AV (Audio/Video) data group, such as one kept in a home, generates local classification models and automatically tags the AV data group with high accuracy. Here, "AV data" collectively refers to photographic image data, moving image data, music data, and the like.
The local DB 1 stores AV (Audio/Video) data, such as photographic image data, moving image data, and music data, as file data kept in a home or the like.
The preprocessing unit 2 performs processing prior to extracting feature quantities from the AV data. Specifically, to make feature quantities easier to extract, the preprocessing unit 2 normalizes the AV data, detects background and object regions by image segmentation, and detects scene sections by computing changes in audio power.
The feature quantity extraction unit 3 extracts feature quantities from the AV data. Specifically, when the AV data is image data, available features include feature descriptors such as SURF (Speeded Up Robust Features) and SIFT (Scale-Invariant Feature Transform), which express region feature quantities centered on characteristic points derived from low-order feature quantities such as edges, colors, and textures, as well as higher-order features such as HOG (Histogram of Oriented Gradients), which expresses the shape features of objects. Details are given in Hironobu Fujiyoshi, "Gradient-based feature extraction: SIFT and HOG" (IPSJ SIG Technical Report CVIM 160, pp. 211-224, 2007).
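as a rough illustration of the HOG idea mentioned above, the following computes a single orientation histogram of gradients over one grayscale patch. Real HOG divides the image into cells and normalizes over blocks; those steps, and the bin count of 9, are simplifications assumed here for brevity.

```python
import numpy as np

def gradient_orientation_histogram(patch, n_bins=9):
    """Toy version of the HOG idea: a histogram of gradient orientations,
    weighted by gradient magnitude, over a single grayscale patch."""
    gy, gx = np.gradient(patch.astype(float))       # image gradients
    mag = np.hypot(gx, gy)                          # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), np.pi)         # unsigned orientation [0, pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    s = hist.sum()
    return hist / s if s > 0 else hist              # L1-normalized descriptor
```

a patch containing only a vertical edge puts all its weight into the horizontal-gradient (0 radian) bin.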
The classification unit 4 performs matching between the feature quantities extracted from the AV data and existing model data, and determines which model the input data fits.
The basic dictionary DB 5 holds categories defined in advance for classification by the classification unit 4, with the model information of each category stored according to the feature quantities used to classify it.
When the classification unit 4 matches input AV data to one model, the search index DB 6 stores the classification information of the matched model in association with the input data.
The unclassified feature DB 7 accumulates unclassified information on AV data that could not be classified.
Based on the unclassified information accumulated in the unclassified feature DB 7, the same feature extraction unit 8 computes the similarity of feature quantities, the appearance frequency of feature quantities, and so on across the plurality of unclassified AV data, and extracts same features presumed to be obtained from the same object when a certain bias is present.
As shown in FIG. 2, the same feature extraction unit 8 computes all SIFT feature quantities within each unclassified AV data item (image). Then, for all the unclassified AV data, the same feature extraction unit 8 counts, as the detection frequency, the number of SIFT feature quantities detected as similar to each VisualWord, as shown in FIG. 3.
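the per-VisualWord detection frequency of FIG. 3 can be sketched as a nearest-centroid count over one image's descriptors. The codebook, the distance threshold used to discard descriptors far from every word, and the toy data are illustrative assumptions, not values from the patent.

```python
import numpy as np

def visual_word_frequencies(descriptors, visual_words, max_dist=0.5):
    """Count, for each VisualWord, how many descriptors of one image fall
    within max_dist of it.

    descriptors: (n, d) SIFT-like descriptors of one image.
    visual_words: (k, d) codebook centroids.
    """
    # distance matrix between every descriptor and every visual word
    d = np.linalg.norm(descriptors[:, None, :] - visual_words[None, :, :], axis=2)
    nearest = d.argmin(axis=1)              # nearest word per descriptor
    within = d.min(axis=1) <= max_dist      # drop descriptors far from all words
    return np.bincount(nearest[within], minlength=len(visual_words))
```

with a two-word codebook, three of four toy descriptors land near a word and are counted; the outlier is ignored.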
Using the same features extracted by the same feature extraction unit 8, the local model creation unit 9 defines categories of objects peculiar to a given local AV data group and computes their model information.
The local dictionary DB 10 stores the category definitions computed by the local model creation unit 9 and the model information needed to classify those categories, according to the feature quantities used. The storage DB is a storage device such as a large-capacity media disc (e.g., an HDD or DVD) or a semiconductor memory.
The reference parameter DB 11 stores a first reference parameter table T100 and a second reference parameter table T110.
As shown in FIG. 5, the first reference parameter table T100 consists of one or more pairs of a data type and a classification-start amount reference parameter.
As shown in FIG. 6, the second reference parameter table T110 consists of one or more pairs of a data type and various reference parameters.
The following describes in detail the operation of creating a local model when automatically tagging AV data in order to organize the AV data the user holds.
As described above, the data processing device 100 uses the AV data (images) that could not be classified and, based on the feature quantities contained in these images, generates model data from reference feature quantities whose peak values are considered sufficient to determine same-object identity.
In the above embodiment, the same features were extracted from all unclassified AV data, but the invention is not limited to this. Instead of extracting the same features from all unclassified AV data, they may be extracted from AV data in fixed amounts or per fixed time section. The procedure of the same feature extraction process in this case is described with reference to the flowchart shown in FIG. 8.
The second embodiment of the present invention is described below with reference to the drawings.
Here, the configuration of the data processing device according to the second embodiment is described, focusing on the differences from the first embodiment.
An example of the functional configuration of the local model creation unit 20 according to this embodiment is described below with reference to FIG. 10. FIG. 10 is a functional block diagram of the local model creation unit 20. The local model creation unit 20 consists of a section information extraction unit 21, a section model creation unit 22, and a model continuity determination unit 23.
The section information extraction unit 21 extracts section information in fixed data units, time units, place units, or event units as grouping information for the local AV data. For example, by using the shooting time and GPS (Global Positioning System) information contained in the EXIF (Exchangeable Image File Format) information of the images, fixed continuous-shooting section information can be computed automatically. It is also conceivable to divide the data by user-created folder units and extract the corresponding section information.
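grouping photos into continuous-shooting sections from their EXIF timestamps can be sketched as a gap-based split: a new section begins whenever the time since the previous photo exceeds a threshold. The 6-hour gap is an illustrative assumption, not a value from the patent.

```python
from datetime import datetime, timedelta

def split_into_sections(timestamps, max_gap=timedelta(hours=6)):
    """Split shooting times into sections: the photos are sorted
    chronologically, and a new section starts whenever the gap to the
    previous photo exceeds max_gap."""
    sections = []
    for t in sorted(timestamps):
        if sections and t - sections[-1][-1] <= max_gap:
            sections[-1].append(t)      # continue the current section
        else:
            sections.append([t])        # start a new section
    return sections
```

two photos taken half an hour apart share a section, while a photo from the next day starts a new one.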
The section model creation unit 22 creates a local model for each section extracted by the section information extraction unit 21, using the feature quantity group computed by the same feature extraction unit 8. The concrete model generation method is the same as in the first embodiment.
The model continuity determination unit 23 computes over how long a span of sections each per-section local model continues to be created, and determines whether it has temporal continuity within the local DB 1. The model continuity determination unit 23 then tags data starting from the local models with high temporal continuity, that is, the local models with high appearance frequency.
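the continuity determination can be approximated by counting, for each model, in how many sections it was created; models appearing in more sections are tagged first. This assumes, purely for illustration, that each section yields a set of model identifiers.

```python
from collections import Counter

def model_continuity(section_models):
    """Given, per section, the collection of model ids created in that
    section, return for each model the number of sections in which it
    appears -- a simple proxy for temporal continuity in the local DB."""
    counts = Counter()
    for models in section_models:
        counts.update(set(models))      # count each model once per section
    return dict(counts)

def tagging_order(section_models):
    """Model ids sorted by descending continuity (highest first)."""
    counts = model_continuity(section_models)
    return sorted(counts, key=lambda m: -counts[m])
```

a model present in three of three sections outranks models that appear only once.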
The following describes in detail a concrete local model creation method when the AV data is images. FIG. 11 shows an example of the image groups existing per section. Here, it is assumed that the local DB 1 contains image groups shot chronologically by a specific user, as shown in FIG. 11, with the horizontal axis being time and the vertical axis the amount of images per hour.
By performing the above operation, the section information extraction unit 21 can extract, for example, sections 1 to 6 shown in FIG. 11.
As described above, according to this embodiment, by determining the continuity of the various created local models within the local DB 1, the data processing device can create local models that are effective over a short period or for a single occasion, create models that are effective over the long term or periodically, and create local models that adapt to changes over the years.
In this embodiment, the section information extraction unit 21 extracted the time information of all images to be processed from EXIF information, but it may instead extract it from data creation time information.
The third embodiment of the present invention is described below with reference to the drawings.
Here, the configuration of the data processing device according to the third embodiment is described, focusing on the differences from the first embodiment.
The user interaction input unit 30 is a function through which additional information is input for the user's AV data or for output results computed by the data processing device, in order to improve the accuracy of the content processed by the same feature extraction unit 8 and the local model creation unit 9.
The image G100 shown in FIG. 16 is an example of a screen used to input same-object designations and tag information for images.
The image G200 shown in FIG. 17 is displayed when the button B101 is pressed in the image G100. Here, the display shown is for the case where the image I103 of FIG. 16 is designated and the button B101 is pressed.
The image G300 shown in FIG. 18 is an example of a screen used to input instructions based on the results classified by the data processing device.
The following concretely describes how the same feature extraction process and the local model creation process are improved by accepting instructions through user interaction. FIG. 19 is a flowchart showing the concrete feedback procedure.
Specifically, the user interaction input unit 30 determines: when the button B100 or B300 is pressed, whether two or more images are designated; when the button B101 is pressed, whether one or more images are designated; and when the button B301 is pressed, whether two or more images remain excluding the designated images. It also determines, when the button B302 is pressed, whether each of the two divided image groups contains two or more images, and, when the button B303 is pressed, whether two or more libraries are designated.
In the first embodiment, when the same feature extraction unit 8 extracted the same features automatically, it did so by a similarity-based clustering technique, judging solely from feature similarity regardless of whether the features came from the same object. Unneeded feature quantities were therefore mixed in, and the accuracy of same feature extraction was not very high. In this embodiment, however, the user designates the same object in advance, so when the device knows in advance that images show the same object, it can compute the similarity from the image information of that limited set of same-object images alone and extract the same features, enabling highly accurate extraction.
The fourth embodiment of the present invention is described below with reference to the drawings.
FIG. 20 is a block diagram showing the basic configuration of the data processing device 100a of the present invention. In FIG. 20, the data processing device 100a consists of the local DB 1, the preprocessing unit 2, the feature quantity extraction unit 3, a classification unit 40, the basic dictionary DB 5, the search index DB 6, the same feature extraction unit 8, the local model creation unit 9, the local dictionary DB 10, the reference parameter DB 11, an all-image feature DB (database) 41, a similar feature extraction unit 42, and a global model creation unit 43. The local DB 1, preprocessing unit 2, feature quantity extraction unit 3, basic dictionary DB 5, search index DB 6, same feature extraction unit 8, local model creation unit 9, local dictionary DB 10, and reference parameter DB 11 are the same as described in the first embodiment, so their description is omitted here.
The all-image feature DB 41 accumulates all the unclassified information computed by the feature quantity extraction unit 3.
Rather than classifying a specific model (for example, a particular dog) from the feature quantities of all images, the similar feature extraction unit 42 extracts feature quantities common to various kinds of that model (for example, dogs in general).
Using the similar features extracted by the similar feature extraction unit 42, the global model creation unit 43 defines general object categories in a given local AV data group and computes their model information.
As in the first embodiment, the classification unit 40 performs matching between the feature quantities extracted from the AV data and existing model data, and determines which model the input data fits.
As described above, rather than defining and holding classification models in advance, the data processing device 100a automatically creates all classification models from the feature quantities obtained from the local AV data, extracting not only feature quantities with high same-subject specificity but also feature quantities with high similar-subject specificity. This allows the data processing device 100a to classify not only local models with high same-subject specificity but also global models with high similar-subject specificity, making it possible to automatically define all categories of objects included in a given local AV data group and extract their model information.
The fifth embodiment of the present invention is described below with reference to the drawings.
Here, the configuration of the data processing device 100b according to the fifth embodiment is described, focusing on the differences from the first and third embodiments.
The registration unit 51 is a function that accepts an instruction for the user to select an image group consisting of a plurality of images to be classified and to generate a local model, in order to raise the accuracy of the content processed by the same feature extraction unit 58 and the local model creation unit 59.
The same feature extraction unit 58 extracts the same features from the image group designated through the registration unit 51.
The local model creation unit 59 creates a local model for each of the same features extracted per group of images classified by the same feature extraction unit 58.
Here, the process by which the data processing device extracts the same features from an image group designated by the user is described with reference to the flowchart shown in FIG. 22.
The fifth embodiment has been described above as one example of the present invention, but the invention is not limited to it. For example, the following variations are conceivable.
As described above, the local model creation unit 59 can generate a local model per event unit (for example, per section shown in FIG. 11) and can determine the temporal continuity of that model using the image groups designated by the user. For example, if the image groups designated by the user are included in sections 1, 2, and 6 shown in FIG. 11, then by generating a local model for each of those sections based on the image group containing the user-designated target in sections 1, 2, and 6, it is possible to generate a local model that is the optimal registration target with respect to the overall image trend of each section (for example, the average color histogram of the images, the content of feature objects, the background type, and so on).
The embodiments have been described above, but the present invention is not limited to them. For example, the following variations are conceivable.
2 Preprocessing unit
3 Feature quantity extraction unit
4 Classification unit
5 Basic dictionary DB
6 Search index DB
7 Unclassified feature DB
8 Same feature extraction unit
9 Local model creation unit
10 Local dictionary DB
11 Reference parameter DB
20 Local model creation unit
21 Section information extraction unit
22 Section model creation unit
23 Model continuity determination unit
30 User interaction input unit
40 Classification unit
41 All-image feature DB
42 Similar feature extraction unit
43 Global model creation unit
100 Data processing device
Claims (16)
- A data processing device comprising: storage means for holding a plurality of model data used for classifying objects, each model data consisting of a combination of detection frequencies of a plurality of feature quantities; classification means for determining, from the plurality of model data and the detection frequencies of each of two or more feature quantities detected in data to be classified, whether or not an object included in the data can be classified; specifying means for, after the classification means has processed a plurality of data to be classified, when there exist a plurality of unidentified data for which object classification was determined to be impossible, specifying two or more feature quantities for each of which the number of unidentified data sharing the same detection frequency is equal to or greater than a fixed number; and model creation means for creating new model data based on the two or more specified feature quantities by a class creation technique and storing the new model data in the storage means.
- The data processing device according to claim 1, wherein the specifying means acquires, for each unidentified data item, the detection frequency with which feature quantities similar to each of the plurality of feature quantities are detected, generates, from the detection frequencies acquired from the unidentified data items, a distribution degree of detection frequencies for each of the plurality of feature quantities, and specifies, from the distribution degrees, two or more feature quantities for each of which the number of unidentified data sharing the same detection frequency is equal to or greater than a fixed number.
- The data processing device according to claim 2, wherein the specifying means groups the plurality of unidentified data into sections according to a predetermined rule to generate a plurality of data groups and performs the acquisition of detection frequencies, the generation of distribution degrees, and the specification of feature quantities for each data group, and the model creation means creates new model data for each data group.
- The data processing device according to claim 3, wherein each unidentified data item is associated with time information indicating the date and time at which the item was created, a section according to the predetermined rule is a period delimited by fixed time slots, and the specifying means generates the plurality of data groups by grouping the plurality of unidentified data by period delimited by fixed time slots.
- The data processing device according to claim 4, wherein, when a plurality of new model data are created, the model creation means determines whether one model data item generated in one data group has a correlation, through temporal transition, with another model data item generated in another data group, and, when determining that a correlation exists, associates the one model data item and the other model data item as model data having temporal variability.
- The data processing device according to claim 5, wherein the model creation means determines that a correlation exists when there is a proportional relationship between the degree of change in similarity of a first feature quantity characterizing the one model data item and the degree of change in similarity of a second feature quantity characterizing the other model data item.
- The data processing device according to claim 5, wherein, when one model data item generated in one data group is identical to another model data item generated in another data group, or when a model data item identical to the one model data item appears periodically in the remaining data groups, the model creation means stores only the one model data item in the storage means.
- The data processing device according to claim 2, wherein the specifying means acquires, using all the unidentified data, a calculation frequency for each feature quantity, specifies one or more feature quantities whose acquired calculation frequency is equal to or higher than a predetermined frequency, acquires, for each data item in which the object was not identified, a detection frequency for each of the one or more specified feature quantities, and generates the distribution degrees from the one or more detection frequencies acquired for each data item.
- The data processing device according to claim 2, further comprising: display means for displaying the plurality of unidentified data; and instruction accepting means for accepting from a user a designation of two or more of the displayed unidentified data, wherein the specifying means creates the distribution degree for each of the plurality of feature quantities from the detection frequencies for the plurality of feature quantities acquired from each of the two or more unidentified data accepted by the instruction accepting means, or from the detection frequencies acquired from each of the remaining data excluding the two or more designated data.
- The data processing device according to claim 9, wherein the instruction accepting means accepts the designation when new model data has not yet been created.
- The data processing device according to claim 10, wherein the specifying means groups each of the two or more unidentified data accepted by the instruction accepting means, based on the creation date and time of the data, so that the creation date and time belongs to one of a plurality of periods, and creates the distribution degree for each group.
- The data processing device according to claim 2, further comprising: display means for displaying a plurality of data considered to contain the object identified by the new model data created by the model creation means; and instruction accepting means for accepting from a user a designation of two or more of the displayed data, wherein the specifying means creates, for each of the plurality of feature quantities, a distribution degree different from the aforementioned distribution degree, from the detection frequencies for the plurality of feature quantities acquired from each of the two or more data accepted by the instruction accepting means or from the detection frequencies acquired from each of the remaining data excluding the two or more designated data, and the model creation means creates, from the different distribution degree, model data different from the new model data.
- The data processing device according to claim 2, wherein the data is an image, and the specifying means generates, for each image in which the object was not identified, higher-order feature groups including at least local feature groups by using the similarity of one or more feature quantities detected in the image, and acquires the detection frequencies from the local feature groups.
- A data processing method used in a data processing device comprising storage means for holding a plurality of model data used for classifying objects, each model data consisting of a combination of detection frequencies of a plurality of feature quantities, the method comprising: a classification step of determining, from the plurality of model data and the detection frequencies of each of two or more feature quantities detected in data to be classified, whether or not an object included in the data can be classified; a specification step of, after a plurality of data to be classified have been processed in the classification step, when there exist a plurality of unidentified data for which object classification was determined to be impossible, specifying two or more feature quantities for each of which the number of unidentified data sharing the same detection frequency is equal to or greater than a fixed number; and a model creation step of creating new model data based on the two or more specified feature quantities by a class creation technique and storing the new model data in the storage means.
- A program used in a data processing device comprising storage means for holding a plurality of model data used for classifying objects, each model data consisting of a combination of detection frequencies of a plurality of feature quantities, the program causing the data processing device to execute: a classification step of determining, from the plurality of model data and the detection frequencies of each of two or more feature quantities detected in data to be classified, whether or not an object included in the data can be classified; a specification step of, after a plurality of data to be classified have been processed in the classification step, when there exist a plurality of unidentified data for which object classification was determined to be impossible, specifying two or more feature quantities for each of which the number of unidentified data sharing the same detection frequency is equal to or greater than a fixed number; and a model creation step of creating new model data based on the two or more specified feature quantities by a class creation technique and storing the new model data in the storage means.
- An integrated circuit used in a data processing device, comprising: storage means for holding a plurality of model data used for classifying objects, each model data consisting of a combination of detection frequencies of a plurality of feature quantities; classification means for determining, from the plurality of model data and the detection frequencies of each of two or more feature quantities detected in data to be classified, whether or not an object included in the data can be classified; specifying means for, after the classification means has processed a plurality of data to be classified, when there exist a plurality of unidentified data for which object classification was determined to be impossible, specifying two or more feature quantities for each of which the number of unidentified data sharing the same detection frequency is equal to or greater than a fixed number; and model creation means for creating new model data based on the two or more specified feature quantities by a class creation technique and storing the new model data in the storage means.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/146,253 US8583647B2 (en) | 2010-01-29 | 2010-12-24 | Data processing device for automatically classifying a plurality of images into predetermined categories |
EP20100841825 EP2530605A4 (en) | 2010-01-29 | 2010-12-24 | DATA PROCESSING UNIT |
CN201080012541.6A CN102356393B (zh) | 2010-01-29 | 2010-12-24 | 数据处理装置 |
JP2011536678A JP5576384B2 (ja) | 2010-01-29 | 2010-12-24 | データ処理装置 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2010-018035 | 2010-01-29 | ||
JP2010018035 | 2010-01-29 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2011092793A1 true WO2011092793A1 (ja) | 2011-08-04 |
Family
ID=44318806
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2010/007518 WO2011092793A1 (ja) | 2010-01-29 | 2010-12-24 | データ処理装置 |
Country Status (5)
Country | Link |
---|---|
US (1) | US8583647B2 (ja) |
EP (1) | EP2530605A4 (ja) |
JP (1) | JP5576384B2 (ja) |
CN (1) | CN102356393B (ja) |
WO (1) | WO2011092793A1 (ja) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2013206116A (ja) * | 2012-03-28 | 2013-10-07 | Fujitsu Ltd | 音声データ検索装置、音声データ検索方法および音声データ検索プログラム |
WO2019065582A1 (ja) * | 2017-09-29 | 2019-04-04 | 富士フイルム株式会社 | 画像データ判別システム、画像データ判別プログラム、画像データ判別方法、及び撮像システム |
CN109670267A (zh) * | 2018-12-29 | 2019-04-23 | 北京航天数据股份有限公司 | 一种数据处理方法和装置 |
US11741363B2 (en) | 2018-03-13 | 2023-08-29 | Fujitsu Limited | Computer-readable recording medium, method for learning, and learning device |
Families Citing this family (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5755046B2 (ja) * | 2011-06-22 | 2015-07-29 | キヤノン株式会社 | 画像認識装置、画像認識方法及びプログラム |
US8874557B2 (en) | 2011-09-02 | 2014-10-28 | Adobe Systems Incorporated | Object retrieval and localization using a spatially-constrained similarity model |
US8781255B2 (en) | 2011-09-17 | 2014-07-15 | Adobe Systems Incorporated | Methods and apparatus for visual search |
JP5833880B2 (ja) * | 2011-10-07 | 2015-12-16 | キヤノンイメージングシステムズ株式会社 | 情報処理装置、デバイス制御装置、デバイス制御システム、およびその制御方法 |
US9105073B2 (en) * | 2012-04-24 | 2015-08-11 | Amadeus S.A.S. | Method and system of producing an interactive version of a plan or the like |
US8880563B2 (en) | 2012-09-21 | 2014-11-04 | Adobe Systems Incorporated | Image search by query object segmentation |
CN104239315B (zh) * | 2013-06-09 | 2018-03-30 | 北京三星通信技术研究有限公司 | 一种图片关联的方法 |
US10262462B2 (en) | 2014-04-18 | 2019-04-16 | Magic Leap, Inc. | Systems and methods for augmented and virtual reality |
US9336280B2 (en) | 2013-12-02 | 2016-05-10 | Qbase, LLC | Method for entity-driven alerts based on disambiguated features |
US9177262B2 (en) | 2013-12-02 | 2015-11-03 | Qbase, LLC | Method of automated discovery of new topics |
US9542477B2 (en) | 2013-12-02 | 2017-01-10 | Qbase, LLC | Method of automated discovery of topics relatedness |
US9922032B2 (en) | 2013-12-02 | 2018-03-20 | Qbase, LLC | Featured co-occurrence knowledge base from a corpus of documents |
WO2015084726A1 (en) | 2013-12-02 | 2015-06-11 | Qbase, LLC | Event detection through text analysis template models |
US9223833B2 (en) | 2013-12-02 | 2015-12-29 | Qbase, LLC | Method for in-loop human validation of disambiguated features |
WO2015084756A1 (en) * | 2013-12-02 | 2015-06-11 | Qbase, LLC | Event detection through text analysis using trained event template models |
US9223875B2 (en) | 2013-12-02 | 2015-12-29 | Qbase, LLC | Real-time distributed in memory search architecture |
US9230041B2 (en) | 2013-12-02 | 2016-01-05 | Qbase, LLC | Search suggestions of related entities based on co-occurrence and/or fuzzy-score matching |
US9424294B2 (en) | 2013-12-02 | 2016-08-23 | Qbase, LLC | Method for facet searching and search suggestions |
KR20160124742A (ko) | 2013-12-02 | 2016-10-28 | 큐베이스 엘엘씨 | 비정형 텍스트내의 특징들의 중의성을 해소하는 방법 |
US9355152B2 (en) | 2013-12-02 | 2016-05-31 | Qbase, LLC | Non-exclusionary search within in-memory databases |
US9619571B2 (en) | 2013-12-02 | 2017-04-11 | Qbase, LLC | Method for searching related entities through entity co-occurrence |
US9201744B2 (en) | 2013-12-02 | 2015-12-01 | Qbase, LLC | Fault tolerant architecture for distributed computing systems |
US9544361B2 (en) | 2013-12-02 | 2017-01-10 | Qbase, LLC | Event detection through text analysis using dynamic self evolving/learning module |
US9317565B2 (en) | 2013-12-02 | 2016-04-19 | Qbase, LLC | Alerting system based on newly disambiguated features |
US9424524B2 (en) | 2013-12-02 | 2016-08-23 | Qbase, LLC | Extracting facts from unstructured text |
US9208204B2 (en) | 2013-12-02 | 2015-12-08 | Qbase, LLC | Search suggestions using fuzzy-score matching and entity co-occurrence |
US9659108B2 (en) | 2013-12-02 | 2017-05-23 | Qbase, LLC | Pluggable architecture for embedding analytics in clustered in-memory databases |
US9984427B2 (en) | 2013-12-02 | 2018-05-29 | Qbase, LLC | Data ingestion module for event detection and increased situational awareness |
US9348573B2 (en) | 2013-12-02 | 2016-05-24 | Qbase, LLC | Installation and fault handling in a distributed system utilizing supervisor and dependency manager nodes |
US9025892B1 (en) | 2013-12-02 | 2015-05-05 | Qbase, LLC | Data record compression with progressive and/or selective decomposition |
US9430547B2 (en) | 2013-12-02 | 2016-08-30 | Qbase, LLC | Implementation of clustered in-memory database |
US9547701B2 (en) | 2013-12-02 | 2017-01-17 | Qbase, LLC | Method of discovering and exploring feature knowledge |
US9361317B2 (en) | 2014-03-04 | 2016-06-07 | Qbase, LLC | Method for entity enrichment of digital content to enable advanced search functionality in content management systems |
US10147015B2 (en) * | 2014-05-07 | 2018-12-04 | Nec Corporation | Image processing device, image processing method, and computer-readable recording medium |
KR102024867B1 (ko) * | 2014-09-16 | 2019-09-24 | 삼성전자주식회사 | 예제 피라미드에 기초하여 입력 영상의 특징을 추출하는 방법 및 얼굴 인식 장치 |
JP6814981B2 (ja) * | 2016-07-21 | 2021-01-20 | パナソニックIpマネジメント株式会社 | 学習装置、識別装置、学習識別システム、及び、プログラム |
WO2019012654A1 (ja) * | 2017-07-13 | 2019-01-17 | 日本電気株式会社 | 分析システム、分析方法及び記憶媒体 |
US10887656B2 (en) * | 2018-07-14 | 2021-01-05 | International Business Machines Corporation | Automatic content presentation adaptation based on audience |
CN114781194B (zh) * | 2022-06-20 | 2022-09-09 | 航天晨光股份有限公司 | 基于金属软管的数据库的构建方法 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004185259A (ja) * | 2002-12-03 | 2004-07-02 | Renesas Technology Corp | 蓄積画像管理装置及びプログラム |
JP2008090698A (ja) * | 2006-10-04 | 2008-04-17 | Fujifilm Corp | 画像分類装置および方法ならびにプログラム |
JP2010003177A (ja) * | 2008-06-20 | 2010-01-07 | Secom Co Ltd | 画像処理装置 |
Family Cites Families (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6741986B2 (en) * | 2000-12-08 | 2004-05-25 | Ingenuity Systems, Inc. | Method and system for performing information extraction and quality control for a knowledgebase |
US6826576B2 (en) * | 2001-05-07 | 2004-11-30 | Microsoft Corporation | Very-large-scale automatic categorizer for web content |
US6993535B2 (en) * | 2001-06-18 | 2006-01-31 | International Business Machines Corporation | Business method and apparatus for employing induced multimedia classifiers based on unified representation of features reflecting disparate modalities |
ATE528724T1 (de) | 2002-08-30 | 2011-10-15 | Mvtec Software Gmbh | Auf hierarchischen gliedern basierte erkennung von gegenständen |
US7814089B1 (en) * | 2003-12-17 | 2010-10-12 | Topix Llc | System and method for presenting categorized content on a site using programmatic and manual selection of content items |
US8271495B1 (en) * | 2003-12-17 | 2012-09-18 | Topix Llc | System and method for automating categorization and aggregation of content from network sites |
US8037036B2 (en) * | 2004-11-17 | 2011-10-11 | Steven Blumenau | Systems and methods for defining digital asset tag attributes |
JP4472631B2 (ja) * | 2005-12-28 | 2010-06-02 | オリンパスメディカルシステムズ株式会社 | Image processing device and image processing method in the image processing device |
EP1969992B1 (en) | 2005-12-28 | 2012-05-02 | Olympus Medical Systems Corp. | Image processing device and image processing method in the image processing device |
EP1840764A1 (en) * | 2006-03-30 | 2007-10-03 | Sony France S.A. | Hybrid audio-visual categorization system and method |
TWI384413B (zh) * | 2006-04-24 | 2013-02-01 | Sony Corp | An image processing apparatus, an image processing method, an image processing program, and a program storage medium |
US7783085B2 (en) * | 2006-05-10 | 2010-08-24 | Aol Inc. | Using relevance feedback in face recognition |
US20080089591A1 (en) | 2006-10-11 | 2008-04-17 | Hui Zhou | Method And Apparatus For Automatic Image Categorization |
JP2008282085A (ja) * | 2007-05-08 | 2008-11-20 | Seiko Epson Corp | Scene identification device and scene identification method |
US8558952B2 (en) * | 2007-05-25 | 2013-10-15 | Nec Corporation | Image-sound segment corresponding apparatus, method and program |
JP2009004999A (ja) * | 2007-06-20 | 2009-01-08 | Panasonic Corp | Video data management device |
JP5166409B2 (ja) * | 2007-11-29 | 2013-03-21 | 株式会社東芝 | Video processing method and video processing device |
US8170280B2 (en) * | 2007-12-03 | 2012-05-01 | Digital Smiths, Inc. | Integrated systems and methods for video-based object modeling, recognition, and tracking |
US20120272171A1 (en) * | 2011-04-21 | 2012-10-25 | Panasonic Corporation | Apparatus, Method and Computer-Implemented Program for Editable Categorization |
- 2010-12-24 CN CN201080012541.6A patent/CN102356393B/zh active Active
- 2010-12-24 EP EP20100841825 patent/EP2530605A4/en not_active Ceased
- 2010-12-24 WO PCT/JP2010/007518 patent/WO2011092793A1/ja active Application Filing
- 2010-12-24 JP JP2011536678A patent/JP5576384B2/ja active Active
- 2010-12-24 US US13/146,253 patent/US8583647B2/en active Active
Non-Patent Citations (2)
Title |
---|
See also references of EP2530605A4 * |
TAICHI JOTO ET AL.: "Web Image Classification with Bag-of-Keypoints", IPSJ SIG NOTES, vol. 2007, no. 42, 15 May 2007 (2007-05-15), pages 201 - 208, XP008167511 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2013206116A (ja) * | 2012-03-28 | 2013-10-07 | Fujitsu Ltd | Audio data search device, audio data search method, and audio data search program |
WO2019065582A1 (ja) * | 2017-09-29 | 2019-04-04 | 富士フイルム株式会社 | Image data discrimination system, image data discrimination program, image data discrimination method, and imaging system |
US11741363B2 (en) | 2018-03-13 | 2023-08-29 | Fujitsu Limited | Computer-readable recording medium, method for learning, and learning device |
CN109670267A (zh) * | 2018-12-29 | 2019-04-23 | 北京航天数据股份有限公司 | Data processing method and device |
Also Published As
Publication number | Publication date |
---|---|
US20120117069A1 (en) | 2012-05-10 |
CN102356393B (zh) | 2014-04-09 |
US8583647B2 (en) | 2013-11-12 |
EP2530605A1 (en) | 2012-12-05 |
JPWO2011092793A1 (ja) | 2013-05-30 |
JP5576384B2 (ja) | 2014-08-20 |
EP2530605A4 (en) | 2013-12-25 |
CN102356393A (zh) | 2012-02-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5576384B2 (ja) | Data processing device | |
JP5934653B2 (ja) | Image classification device, image classification method, program, recording medium, integrated circuit, and model creation device | |
US6977679B2 (en) | Camera meta-data for content categorization | |
US8520909B2 (en) | Automatic and semi-automatic image classification, annotation and tagging through the use of image acquisition parameters and metadata | |
KR101346730B1 (ko) | Image processing system, image processing apparatus and method, program, and recording medium |
KR101289085B1 (ko) | Object-based image retrieval system and retrieval method |
JP5385759B2 (ja) | Image processing device and image processing method |
TWI223171B (en) | System for classifying files of non-textual subject data, method for categorizing files of non-textual data and method for identifying a class for data file at a classification node | |
US20140093174A1 (en) | Systems and methods for image management | |
WO2007120558A2 (en) | Image classification based on a mixture of elliptical color models | |
CN114138993A (zh) | System and method for content recommendation based on user behavior |
JP2014093058A (ja) | Image management device, image management method, program, and integrated circuit |
JP2014092955A (ja) | Similar content search processing device, similar content search processing method, and program |
EP2156438A1 (en) | Method and apparatus for automatically generating summaries of a multimedia file | |
Ardizzone et al. | A novel approach to personal photo album representation and management | |
Zhang et al. | Automatic preview frame selection for online videos | |
Cerosaletti et al. | Approaches to consumer image organization based on semantic categories | |
JP5144557B2 (ja) | Video classification method, video classification device, and video classification program |
WO2004008344A1 (en) | Annotation of digital images using text | |
CN117786137A (zh) | Multimedia data query method, apparatus, device, and readable storage medium |
Pulc et al. | Application of Meta-learning Principles in Multimedia Indexing | |
Broilo et al. | Personal photo album summarization for global and local photo annotation | |
Zawbaa et al. | Semi-automatic annotation system for home videos | |
Ardizzone et al. | Automatic image representation for content-based access to personal photo album | |
Malini et al. | Average mean based feature extraction for image retrieval |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 201080012541.6 Country of ref document: CN |
|
WWE | Wipo information: entry into national phase |
Ref document number: 3096/KOLNP/2011 Country of ref document: IN |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2010841825 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 13146253 Country of ref document: US |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2011536678 Country of ref document: JP |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 10841825 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |