WO2022130734A1 - Metadata extraction program

Metadata extraction program

Info

Publication number
WO2022130734A1
Authority
WO
WIPO (PCT)
Prior art keywords
entity
information
image
intent
metadata
Prior art date
Application number
PCT/JP2021/036456
Other languages
French (fr)
Japanese (ja)
Inventor
基光 白川
Original Assignee
ソプラ株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ソプラ株式会社
Publication of WO2022130734A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50: Information retrieval of still image data
    • G06F 16/58: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583: Retrieval using metadata automatically derived from the content
    • G06F 16/90: Details of database functions independent of the retrieved data types
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/08: Speech classification or search
    • G10L 15/10: Speech classification or search using distance or distortion measures between unknown speech and reference templates

Definitions

  • the present invention relates to a metadata extraction program suitable for extracting metadata from information contained in an image of an information medium and further identifying an appropriate image corresponding to a received conversational sentence.
  • Metadata is associated with each image of each information medium.
  • the metadata referred to here is a description of the attribute values of the determined attributes for the image of the information medium.
  • for example, for an image of a hot spring pamphlet, the metadata includes the content of the open-air bath photograph, that is, information indicating that the content of the photograph is an "open-air bath"
  • as well as incidental impressions such as the comfort and warmth suggested by the steam in the open-air bath photograph and the scenery around the open-air bath.
  • the work of associating metadata with images of such information media is, however, very laborious.
  • text information in an image can easily be converted into metadata using a well-known technique such as OCR, but it is difficult to immediately convert a photograph into metadata in the same way.
  • metadata that is particularly useful in business often has to be defined by humans, which is time-consuming.
  • although methods for generating metadata from an image have been proposed in the past (see, for example, Patent Document 1), there is no mention of generating metadata that would otherwise have to rely on human visual discrimination, nor of generating and associating metadata that makes it easier to search images of information media after the fact against a received conversational sentence.
  • the present invention has been devised in view of the above problems, and its object is to provide a metadata extraction program and system that extract metadata from information contained in an image, in particular metadata that would otherwise have to rely on human visual discrimination, and that generate and associate metadata that is more convenient for subsequent searches.
  • the metadata extraction program according to the present invention extracts metadata from information contained in an image of an information medium, and causes a computer to execute a feature map generation step of generating a feature map in which features are extracted from the image of the information medium, and an inference step of referring to one or more individual inference models in which feature maps and feature labels for each element are associated with each other, and extracting the feature label for each element as metadata from the feature map generated in the feature map generation step.
  • the image specifying program includes a conversation sentence reception step of accepting a conversation sentence, an entity extraction step of extracting one or more entities included in the one or more conversation sentences received in the conversation sentence reception step, and an image identification step of referring to an entity table in which entities consisting of feature labels and associative words derived from the feature labels are associated with images one-to-one or one-to-many, and identifying the image associated with the one or more entities extracted in the entity extraction step.
  • the metadata extraction system according to the present invention extracts metadata from information contained in an image of an information medium, and includes a feature map generation means for generating a feature map in which features are extracted from the image of the information medium, and an inference means that refers to one or more individual inference models in which feature maps and feature labels for each element are associated with each other, and extracts the feature label for each element as metadata from the feature map generated by the feature map generation means.
  • according to the present invention, for the various types of imaged information media, it is possible to generate metadata that makes an ex post facto search of the images of these information media more convenient.
  • even when conversational sentences are acquired by voice, it is possible to extract images of appropriate information media corresponding to the received conversational sentences with high accuracy, and to generate and associate metadata that makes this possible.
  • since this metadata includes associative words, it contains not only the keyword of the feature label itself but also various words that can be associated with it. Therefore, when specifying an image, even if only such an associative word is included in the conversational sentence, the image can still be identified from it. For this reason, it is also possible to generate metadata corresponding to sensory impressions such as "hot", "heavy", or "fast" that a person viewing the image receives.
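As a rough illustration of the two flows just described, the following Python sketch strings together the claimed steps (feature map generation, inference, association of associative words, entity extraction, image identification). All function names and the toy models are illustrative assumptions, not taken from the patent.

```python
# A minimal, self-contained sketch of the two flows described above.
# All function and variable names are illustrative assumptions, not taken from the patent.

def generate_feature_map(image_pixels):
    # Stand-in for the feature map generation step: here just a normalized copy of the pixels.
    mx = max(image_pixels) or 1
    return [p / mx for p in image_pixels]

def extract_metadata(image_pixels, individual_models, associative_model):
    feature_map = generate_feature_map(image_pixels)
    # Inference step: each individual model maps the common feature map to one feature label.
    labels = [model(feature_map) for model in individual_models]
    # Associating step: add associative words linked to each feature label.
    words = [w for lbl in labels for w in associative_model.get(lbl, [])]
    return labels + words

def identify_image(sentence, entity_table):
    # Entity extraction step (crude word split) and image identification step.
    tokens = set(sentence.lower().replace("?", "").split())
    return [img for img, ents in entity_table.items() if tokens & {e.lower() for e in ents}]

if __name__ == "__main__":
    models = [lambda fm: "car", lambda fm: "red"]            # toy individual inference models
    assoc = {"car": ["vehicle", "heavy"], "red": ["color"]}  # toy associative model set
    meta = extract_metadata([10, 20, 30], models, assoc)
    table = {"A1": meta}
    print(identify_image("Is there an image of a heavy vehicle?", table))  # -> ['A1']
```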
  • FIG. 1 is a diagram showing an overall configuration of a metadata extraction system.
  • FIG. 2 is a block configuration diagram of a metadata extraction device.
  • FIG. 3 is a diagram showing a comprehensive inference model according to the present embodiment.
  • FIG. 4 is a diagram showing an example of an image for which metadata is to be extracted.
  • FIG. 5 is a diagram showing an example of a table required for performing various inferences.
  • FIG. 6 is a diagram showing an associative word inference model according to the present embodiment.
  • FIG. 7 is a block configuration diagram of the image identification system.
  • FIG. 8 is a flowchart showing the processing operation of the metadata extraction system.
  • FIG. 9 is a diagram schematically showing the processing operation from the extraction of the metadata to the storage.
  • FIG. 10 is a diagram for explaining the operation of the image specifying system.
  • FIG. 11 is a diagram showing another example of the comprehensive inference model according to the present embodiment.
  • FIG. 1 shows the overall configuration of the metadata extraction system 100.
  • the metadata extraction system 100 includes a user terminal 10 that can access the public communication network 50, a metadata extraction device 1 connected to the public communication network 50, and an image identification system 2.
  • the public communication network 50 is an Internet communication network or the like, but when operating in a narrow area such as in the company, it may be configured by a LAN (Local Area Network). Further, the public communication network 50 may be configured by a so-called optical fiber communication network. Further, the public communication network 50 is not limited to the wired communication network, and may be realized by a wireless communication network.
  • the user terminal 10 includes any electronic device, such as a personal computer (PC), smartphone, tablet terminal, mobile phone, wearable terminal, or digital camera, that can display an image of an information medium or capture an image of the information medium.
  • the user terminal 10 captures an image A1 of an information medium by a digital camera mounted on the user terminal 10 or acquires an image A1 of an information medium from a server or the like (not shown) via a public communication network 50.
  • the information medium referred to here refers to all media showing information such as pamphlets, catalogs, company information and advertisements, and various documents and explanatory materials.
  • the image A1 is composed of such an information medium as image data.
  • the user terminal 10 transmits the image A1 thus acquired to the metadata extraction device 1 via the public communication network 50.
  • the metadata extraction device 1 is a device that extracts metadata about an image A1 received from a user terminal 10 via a public communication network 50, and is composed of, for example, a PC, a server, or the like. In this metadata extraction device 1, metadata is associated with each image A1.
  • the metadata referred to here is a description of the attribute values of determined attributes for the image of the information medium. For example, if the image is of a catalog of a car for sale, the metadata includes the information displayed in the image.
  • the content of the picture of the car, that is, information indicating that the content of the picture is a "car"
  • incidental information obtained from the photograph, such as the color of the car, the shape of the car, the atmosphere of the car, and the impression that the car looks fast.
  • the textualized version of this incidental information becomes the metadata associated with the image A1 of the car catalog as an information medium.
  • this metadata may also include information indicating what kind of information medium the image is in the first place; for example, information indicating the type of information medium, such as a pamphlet, an X-ray photograph, company information, a catalog, an advertising leaflet, or financial statements.
  • for example, if the image is of a hot spring pamphlet, the metadata includes the textual information that can be read from the characters displayed in the image, such as the inn name, room rate, the inn's address and contact information, and the check-in and check-out times.
  • the content of the picture of the open-air bath, that is, information indicating that the content of the picture is an "open-air bath"
  • incidental information obtained from the photographs, such as the comfort and warmth suggested by the steam in the picture of the open-air bath, the colors and beauty created by the harmony between the open-air bath and the surrounding landscape, and the menu of dishes that can be extracted from photographs of meals.
  • the textualized version of this incidental information becomes the metadata associated with the image A1 of the hot spring pamphlet as an information medium.
  • the metadata extraction device 1 collects an image A1 related to an information medium via the user terminal 10, and sequentially executes a processing operation of associating the metadata for each of the images A1.
  • the metadata extraction device 1 builds a database in which each metadata is associated with the image A1 by storing each of the images A1 to which the metadata is associated.
  • the image identification system 2 is a system for specifying an image desired by a user by referring to the database, constructed in the metadata extraction device 1, in which the images A1 are associated with their metadata.
  • the image specifying system 2 receives a conversational sentence directly from the user or from the user terminal 10. Then, the image specifying system 2 accesses the metadata extraction device 1, identifies an appropriate image corresponding to the received conversational sentence, displays it directly to the user, or transmits it to the user terminal 10.
  • the user who operates the user terminal 10 can acquire a desired image sent from the image specifying system 2 by specifying, by voice, the image to be viewed.
  • FIG. 2 shows a block configuration of the metadata extraction device 1.
  • the metadata extraction device 1 includes a central control unit 20, an execution unit 3 and an auxiliary storage unit 4 connected to the central control unit 20, respectively.
  • the central control unit 20 is, for example, a CPU (Central Processing Unit), and executes processing by calling a program stored in the execution unit 3.
  • the central control unit 20 controls each component mounted in the metadata extraction device 1.
  • the execution unit 3 includes an image acquisition unit 5, a feature map generation unit 6, a general inference unit 7, an associating unit 8, and an extraction unit 9.
  • the execution unit 3 is configured by a RAM (Random Access Memory)
  • a program corresponding to the configurations of the image acquisition unit 5 to the extraction unit 9 is stored in the execution unit 3.
  • the image acquisition unit 5 acquires the image A1 received from the user terminal 10 via the public communication network 50.
  • the feature map generation unit 6 generates a feature map 30, which will be described later, which is data obtained by extracting features from the image A1.
  • the feature map 30 is composed in units of pixels or, for example, of block areas each being an aggregate of a plurality of pixels, and the feature amounts of the analyzed image, obtained by well-known image analysis using a deep learning technique as necessary, are reflected on a two-dimensional image. As a result, it is possible to obtain a feature map 30 in which the characteristic portions of an object to be discriminated in a photographic image are highlighted.
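A minimal sketch of a feature map in this sense, assuming a simple per-block contrast measure as the feature amount; an actual implementation would more likely use CNN activations, but the idea of reflecting feature amounts onto a two-dimensional grid is the same.

```python
# Hedged sketch of a block-unit feature map: the image is divided into block areas and a
# per-block "feature amount" (here, simple local contrast) is laid out on a 2D grid.
import numpy as np

def block_feature_map(gray_image: np.ndarray, block: int = 8) -> np.ndarray:
    h, w = gray_image.shape
    rows, cols = h // block, w // block
    fmap = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            patch = gray_image[i*block:(i+1)*block, j*block:(j+1)*block]
            fmap[i, j] = patch.std()        # featured (high-contrast) areas get high values
    return fmap / (fmap.max() or 1.0)

# Example: the edges of a bright square stand out in the resulting feature map.
img = np.zeros((64, 64))
img[20:36, 20:36] = 255.0
print(block_feature_map(img).round(2))
```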
  • the general inference unit 7 applies one general inference model DB1, composed of one or more individual inference models DB11 to DB16 described later, to the feature map 30 generated by the feature map generation unit 6, and thereby infers at least the content of one or more objects in the image A1 and generates metadata.
  • the associating unit 8 associates one or a plurality of associative words with the feature label as the overall inference result, which is the result of inference by the overall inference unit 7.
  • an associative word is a word associated with the word of the feature label. For example, if the feature label is "car", then "vehicle", "heavy", and so on are associative words; if the feature label is "hot spring", then "warm", "sulfur", "alkaline", "steam", "recreation", and so on are associative words.
  • the set in which a plurality of associative words is associated with each such feature label is the associative model set TB2, which is stored in the auxiliary storage unit 4.
  • the auxiliary storage unit 4 is, for example, an SSD (Solid State Drive) or an HDD (Hard Disk Drive), and is a table storage unit 11, a corpus storage unit 14, an entity storage unit 15, an image data storage unit 26, and a meta. It includes a data storage unit 27.
  • One or two or more tables are stored in the table storage unit 11.
  • the comprehensive inference model DB1, the comprehensive inference result table TB1, and the associative model set TB2 are stored in the table storage unit 11 of the auxiliary storage unit 4.
  • FIG. 3 shows an example of the comprehensive inference model DB1.
  • the comprehensive inference model DB1 is composed of one or more individual inference models DB11 to DB15.
  • the inputs of the individual inference models DB11 to DB15 are common feature maps, and the output is the individual inference result.
  • the individual inference result is composed of text data consisting of words indicating the inferred result.
  • the input data and the output data are related to each other through the degree of association.
  • the degree of association indicates the degree of connection between the input data and the output data. For example, it can be determined that the higher the degree of association, the stronger the connection of each data.
  • the degree of association is indicated by, for example, three or more values such as percentages or three or more stages, and may be associated with two values or two stages.
  • the individual inference models DB11 to DB15 are generated by machine learning using, as a learning data set, a plurality of feature maps for learning and a plurality of individual inference results for learning.
  • for the machine learning, for example, a convolutional neural network (CNN) is used, and, for example, deep learning is applied.
  • each individual inference model DB11 to DB15 is built from a learning data set so as to infer the content, color, characters, and the like of an object shown in an image of an information medium from a feature map.
  • for example, the individual inference models DB11, DB12, and DB14 are models for inferring the contents of objects displayed in the image of the information medium, such as "car" and "tree".
  • the individual inference model DB13 is a model for inferring the color of the object shown in the image, and the individual inference model DB15 is used as a model for inferring character strings shown in the image.
  • the individual inference model is independently provided for each element consisting of the content of the object, the color of the object, the character string projected on the image, and the like.
  • the individual inference result may be expressed by an inference probability, which is the probability with which each search solution is inferred; taking the individual inference model DB11 as an example, car: 96.999%, tree: 2.01%, and so on.
  • the general inference model DB1 may take one feature map 30 as input and output, as the general inference result, only search solutions whose inference probability is at or above a threshold value of, for example, 80%. Further, when no search solution exceeds the threshold and only search solutions below the threshold exist, a plurality of search solutions may be output; for example, for DB13 and DB15 a plurality of search solutions are output as the comprehensive inference result.
  • the individual inference results represented by the text data obtained as the search solutions of the individual individual inference models DB11 to DB15 are used as the comprehensive inference results.
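The thresholding behaviour described above can be sketched as follows; the helper name, the 80% threshold, and the fallback of returning several lower-confidence candidates mirror the example in the text, while the probability data are made up.

```python
# Minimal sketch of the comprehensive inference result (assumed helper and data names).
# Each individual model returns {label: probability}; the comprehensive result keeps the
# label(s) above the threshold, or several top candidates when none reaches it.
def comprehensive_inference(individual_outputs, threshold=0.80, fallback_k=3):
    result = []
    for element, probs in individual_outputs.items():
        above = [lbl for lbl, p in probs.items() if p >= threshold]
        if above:
            result.extend(above)
        else:
            top = sorted(probs, key=probs.get, reverse=True)[:fallback_k]
            result.extend(top)        # several lower-confidence candidates (e.g. DB13, DB15)
    return result

outputs = {
    "object": {"car": 0.9699, "tree": 0.0201},
    "color":  {"red": 0.55, "green": 0.40, "blue": 0.05},   # no label reaches 80%
}
print(comprehensive_inference(outputs))   # -> ['car', 'red', 'green', 'blue']
```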
  • the general inference result may be stored in the table storage unit 11 after being made into a table as shown in the general inference result table TB1.
  • the comprehensive inference result table TB1 is composed of an image of an information medium forming a feature map and a table in which the comprehensive inference results are linked to each other.
  • the associative model set TB2 is configured as a table in which feature labels and associative words are linked to each other.
  • the feature label here corresponds to the text data output as search solutions of the above-mentioned general inference result, such as "car", "tree", "red", "green", "fence", "company A", and "lineup A".
  • associative words are words that can be associated with these feature labels. For example, if the feature label is “car”, there are associative words such as “heavy” and “vehicle”. If the feature label is “tree”, there are “resources”, “nature”, etc. as associative words. Further, if “Lineup A”, which is a lineup for sale by a certain automobile manufacturer, is a feature label, “car” and “manufacturer” can be mentioned as associative words. If the feature label is "red”, the associative word is "color”.
  • the associative model set TB2, in which associative words are associated with each feature label in a one-to-one or one-to-many relationship, is set in advance on the system side (operator side) and stored in the table storage unit 11. By preparing such an associative model set TB2 in advance, it is possible, when a feature label is input, to output one or more associative words associated with that feature label.
  • this associative model set TB2 is not limited to a binary representation of whether or not a feature label and an associative word are associated with each other; as shown in FIG. 6, they may be associated with each other through degrees of association.
  • the input is a feature label and the output is an associative word.
  • the feature label and the associative word are associated with each other through the degree of association.
  • the degree of association indicates the degree of connection between the input data and the output data. For example, it can be determined that the higher the degree of association, the stronger the connection of each data.
  • the degree of association is indicated by, for example, three or more values such as percentages or three or more stages, and may be indicated by two values or two stages.
  • Each node of the hidden layer constituting such a degree of association may be composed of a node of a neural network.
  • the association between the feature labels and the associative words may be generated by machine learning using feature labels and associative words as a learning data set.
  • for this machine learning, for example, a convolutional neural network is used, and, for example, deep learning is applied.
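A sketch of what the associative model set TB2 could look like as a simple lookup with degrees of association; the labels, words, and degrees are illustrative, and the patent equally allows a plain binary association or a trained network.

```python
# Sketch of an associative model set like TB2: feature label -> (associative word, degree).
ASSOCIATIVE_MODEL = {
    "car":        [("vehicle", 0.9), ("heavy", 0.6)],
    "hot spring": [("warm", 0.9), ("steam", 0.8), ("sulfur", 0.5), ("recreation", 0.4)],
    "red":        [("color", 0.9)],
}

def associative_words(feature_label, min_degree=0.0):
    # Return the associative words linked to a feature label, optionally filtered by degree.
    return [w for w, d in ASSOCIATIVE_MODEL.get(feature_label, []) if d >= min_degree]

print(associative_words("hot spring", min_degree=0.5))  # -> ['warm', 'steam', 'sulfur']
```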
  • the entity storage unit 15 stores one or more entities and entity values.
  • An entity is one or more words associated with conversational sentence information.
  • a word is a unit that constitutes a sentence.
  • a word may simply be called, for example, a "word" or a "term", or may be considered as a kind of morpheme (for example, an independent word described later).
  • in the entity storage unit 15, as shown in the entity table TB3 of FIG. 5, one or more entity values are stored in association with each of one or more entities.
  • entity value is a character string that embodies the entity.
  • the entity usually corresponds to any one or more conversational text information in one or more conversational text information stored in the corpus storage unit 14. Therefore, in the entity storage unit 15, for example, one or two or more entities may be stored for each one or more conversational sentence information stored in the corpus storage unit 14.
  • the entity value is associated with that entity in the associative model set TB2.
  • those including "car” in the feature label and the associative word are associated with “car", “lineup A”, “heavy”, and “vehicle”. Therefore, when “car” is used as an entity, the entity values are “car”, “lineup A”, “heavy”, and “vehicle”.
  • the entity values are "sweet”, “delicious”, “crop”, “breeding", "in the soil” and the like.
  • in the entity table TB3, such entities and entity values are associated with each other on a one-to-one or one-to-many basis. Therefore, via the entity table TB3, the entity related to an entity value can be derived from that entity value, and conversely the entity values can be derived from the entity.
  • of the feature label and the associative word, the character string that yields the larger number of search results when searched may be set as the entity, and the other character string, with the smaller number of results, may be set as the entity value.
  • an image is further associated with each mutually associated entity and entity value, and stored.
  • the entity and the entity value are extracted from the associative model set TB2 as described above, and the feature label in the associative model set TB2 is associated with an image in the comprehensive inference result table TB1. Therefore, an image can be associated with an entity and an entity value from these correspondence relationships. As a result, as shown in FIG. 5, the relationship between the entity and the entity values can be stored in association with each image A1, A2, A3, and so on.
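A sketch of an entity table in the spirit of TB3, with the reverse lookup (from entity value to entity and image) derived from it; the image identifiers and values are illustrative.

```python
# Sketch of an entity table like TB3: image -> {entity: [entity values]}, plus the reverse
# lookup described above (derive the entity and image from an entity value).
ENTITY_TABLE = {
    "A1": {"car": ["car", "lineup A", "heavy", "vehicle"]},
    "A2": {"hot spring": ["warm", "steam", "open-air bath"]},
}

def entity_for_value(value):
    # Return (image, entity) pairs whose entity values contain the given value.
    return [(img, ent) for img, row in ENTITY_TABLE.items()
            for ent, values in row.items() if value in values]

print(entity_for_value("lineup A"))   # -> [('A1', 'car')]
```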
  • the word that constitutes the entity or entity value may be, for example, a collocation.
  • a collocation is a word that expresses a certain meaning by connecting two or more independent words, and may be called a compound word.
  • the collocations are, for example, "hot spring inn” which is a combination of “hot spring” and “inn”, "lineup A” which is a combination of “lineup” and “A”, etc.
  • any set of two or more words may be used.
  • the day conversion information is information for converting a day word into a date.
  • a day word is a word about a day.
  • a day word is usually a word associated with the entity name "date entity", for example, "last month", "yesterday", "last week", "this year", "this month", "last year", or "previous term", but any word that can be converted into a date may be used.
  • the day conversion information includes a day word and day information acquisition information.
  • the day information acquisition information is information for acquiring day information.
  • the day information is information about the day corresponding to the day word, and is information used when composing inquiry information.
  • the day information may be, for example, information indicating a date such as "April 1" or information indicating a period from a start date to an end date such as "4/1 to 4/30", but is not limited to these.
  • the day information acquisition information is, for example, a function name or a method name, but may be API information, may be a program itself, and is not limited thereto.
  • the day information acquisition information for the day word "last month” for example, the current time information (for example, "May 10, 2020 11:15”: the same applies hereinafter) is acquired, and the current time information has.
  • Obtain the previous month (for example, "April") for a month (for example, "May") refer to the calendar information of the previous month, and perform day information (for example, the first day to the last day of the previous month).
  • a program or the like for acquiring "4/1 to 4/30" etc. may be used.
  • day information acquisition information for the day word "this year" for example, the current time information is acquired, and the calendar information of the year (for example, "2020") possessed by the current time information is referred to from the first day of the year.
  • API information or the like for acquiring day information for example, "2020/1/1 to 2020/5/10" up to the day of the current time information may be used.
  • day information acquisition information for the day word "yesterday” is a method of acquiring the current time information and acquiring the day information of the day before the day of the current time information (for example, "5/9"), or The method name or the like may be used.
  • the corpus storage unit 14 stores one or more conversational sentence information.
  • Conversational sentence information is information on conversational sentences.
  • Conversational sentence information is usually an example sentence of a conversational sentence. Examples of sentences are, for example, "show me a pamphlet of a hot spring inn" and "show me a catalog of cars of XX brands", but are not limited to these.
  • the conversation text information may be a conversation text template.
  • templates are, for example, "Do you have an image of a {car}?", "Show me a {pamphlet} of a {vehicle}", "Tell me the {information} of a {car} of a {maker}", "{company} ...", and the like.
  • the information expressed by "{ }", such as {car}, included in the template is an entity, that is, a variable.
  • Conversational text information usually corresponds to the intent.
  • the intent can also be said to be information for specifying the processing operation.
  • processing operations are, for example, "image search", "pamphlet search", "image information search", and the like, and the conversational sentence information may be stored in the corpus table TB4 in association with the information specifying each of these processing operations.
  • the corpus storage unit 14 for example, one or two or more conversational sentence information is stored for each one or more intents stored in the intent storage unit 12.
  • FIG. 5 shows an example of the corpus table TB4 relating to the conversational sentence information (corpus) associated with each intent.
  • the corpus table TB4 usually stores one or more entity information for each one or more stored conversational sentence information.
  • the entity information is information about each one or more entities corresponding to one conversational sentence information.
  • the entity information has the start position, end position, and entity name of each entity in addition to the above-mentioned entity and entity value.
  • the start position here is the position where the entity starts in the conversation text information.
  • the start position is represented by, for example, a value (for example, "1", "4", etc.) indicating which character the first character of the entity is in the character string constituting the conversation sentence.
  • the end position is the position where the entity ends in the conversation text information, and is, for example, a value indicating the number of the last character of the entity (for example, "2", "5", etc.). ).
  • the expression format of the start position and the end position is not limited to these.
  • the start position and the end position may be referred to as offsets. Further, the offset may be expressed by the number of bytes, and is not limited to this.
  • the entity name is the name of the entity.
  • the entity name is, for example, "object entity”, “date entity”, “information entity”, etc., but the format is not limited to these as long as it is information that can express the attributes of the entity.
  • the object entity is an entity related to an object, such as {vehicle}, {car}, {company}, etc. in Table 1.
  • a date entity is an entity related to a date.
  • An information entity is an entity related to the required information.
  • the entity information may have, for example, an entity name and an order information when the conversation text information is a template.
  • the order information is a value indicating which variable the entity name corresponds to in one or more variables included in the template.
  • the structure of the entity information is not limited to this.
  • the corpus in the embodiment may be considered as, for example, each piece of conversational sentence information stored in the corpus storage unit 14, and can also be thought of as a set of one or more pieces of conversational sentence information together with the entity information corresponding to each piece of conversational sentence information.
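One corpus entry in the spirit of TB4 might look like the following; the 1-based start and end positions are computed for the English example sentence used later in the description, and the entity names follow that example.

```python
# Sketch of one corpus entry like TB4: a conversational sentence tied to an intent, with
# entity information giving each entity's start/end position (1-based) and entity name.
corpus_entry = {
    "intent": "image information search",
    "sentence": "Tell me the selling price information of the car of maker XX",
    "entities": [
        {"entity": "selling price information", "name": "information entity", "start": 13, "end": 37},
        {"entity": "car",                       "name": "car entity",         "start": 46, "end": 48},
        {"entity": "maker XX",                  "name": "maker entity",       "start": 53, "end": 60},
    ],
}

def slice_entity(entry, ent):
    # Recover the word from the sentence using its 1-based start/end positions.
    return entry["sentence"][ent["start"] - 1: ent["end"]]

for ent in corpus_entry["entities"]:
    assert slice_entity(corpus_entry, ent) == ent["entity"]
print("all entity positions check out")
```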
  • the above-mentioned associative model set TB2, entity table TB3, and corpus table TB4 may be, for example, a tabular database.
  • one or more item names are registered in the table, and one or two or more values are registered for each one or more item names.
  • the item name may be referred to as an attribute name, and each value of 1 or more corresponding to one item name may be referred to as an attribute value.
  • the table is, for example, a relational database table, TSV, Excel, CSV, etc., but the type thereof is not limited to these.
  • the image data storage unit 26 is an area for storing the images acquired via the image acquisition unit 5. Each image is associated with the parameters in the general inference result table TB1 and the entity table TB3, and is stored in the image data storage unit 26 so that the associated image can be read out immediately.
  • each model DB1 and tables TB1 to TB4 stored in the auxiliary storage unit 4 are read out and referred to when the central control unit 20 executes various processing operations on the execution unit 3.
  • these may be updated every time a feature label is newly extracted as metadata in the general inference unit 7 based on an image acquired by the image acquisition unit 5, or every time a new associative word is derived in the associating unit 8.
  • the extraction unit 9 compares the entity extracted by the image specifying system 2 described later with at least one of the feature label and the associative word, and at least a feature label and the associative word that at least partially match the extracted entity. Extract the image associated with one. In such a case, each model DB1 and tables TB1 to TB4 described above are referred to.
  • the extraction unit 9 transmits the extracted image to the user terminal 10, and the user terminal 10 displays the image.
  • FIG. 7 is a block diagram of the image identification system 2.
  • the image identification system 2 includes a storage unit 19, a reception unit 29, a processing unit 39, and an output unit 49.
  • the storage unit 19 includes a table storage unit 11, an intent storage unit 12, an API information storage unit 13, a corpus storage unit 14, an entity storage unit 15, and a day conversion information storage unit 18.
  • the reception unit 29 includes a conversation sentence reception means 21.
  • the conversation sentence receiving means 21 includes a voice receiving means 211 and a voice recognizing means 212.
  • the processing unit 39 includes a parameterization unit 30, an intent determination unit 31, a conversational sentence information determination unit 32, an entity acquisition unit 33, a parameter acquisition unit 34, an API information acquisition unit 35, an inquiry information configuration unit 36, and a search result acquisition means 37.
  • the parameter acquisition unit 34 includes a determination unit 341, a day information acquisition unit 342, an entity name acquisition unit 343, a translation item name acquisition unit 344, a table identifier acquisition unit 345, a primary key identifier acquisition unit 346, and a conversion parameter acquisition unit 347.
  • the output unit 49 includes a search result output means 41.
  • the storage unit 19 is a database that stores various types of information.
  • the various types of information include, for example, tables, intents, API information, corpora, entities, entity mapping information, PK items, and day conversion information. Information on tables and the like will be described later. In addition, other information will be explained in a timely manner.
  • the table storage unit 11 stores the same table as the general inference model DB1, the general inference result table TB1, and the associative model set TB2 stored in the table storage unit 11 in the metadata extraction device 1. In the image identification system 2, if the table storage unit 11 in the metadata extraction device 1 is used, the table storage unit 11 may be omitted on the image identification system 2 side.
  • the intent is information managed for each image-specific processing, and can be said to be information for specifying an image-specific processing operation.
  • the intent usually has a process operation name that identifies the business process.
  • the processing operation name is the name of the processing operation.
  • the processing operation is usually a business process executed via API. However, the processing operation may be, for example, a business process executed according to the SQL statement.
  • processing operation name usually corresponds to the API information described later. Therefore, it may be considered that the intent is associated with the API information, for example, via the processing operation name.
  • API information is information about API.
  • API is an interface for using the functions of a program.
  • APIs are software such as, for example, functions, methods, or execution modules.
  • the API is, for example, a Web API, but other APIs may be used.
  • the Web API is an API constructed by using a Web communication protocol such as HTTP or HTTPS. Since APIs such as WebAPI are known techniques, detailed description thereof will be omitted.
  • API information is information that corresponds to the intent. As described above, the API information corresponds to the intent, for example, through the processing operation name.
  • API information is usually information for searching an imaged information medium.
  • the API information may be, for example, information for registering information or performing processing based on the information.
  • API information has one or more parameter-specific information.
  • the parameter specific information is information that specifies a parameter. It may be said that the parameter is a value having a specific attribute. The value is usually a variable. Variables can also be called arguments.
  • the parameter is usually the information obtained by converting the entity, but it may be the entity itself.
  • the parameters are, for example, arguments given to the API or variables in the SQL statement.
  • the parameter may be composed of, for example, a set of an attribute name and a value.
  • the parameter specific information is, for example, a parameter name.
  • the parameter name is the name of the parameter.
  • the parameter-specific information is, for example, an attribute name, but any information that can specify the parameter may be used.
  • the corpus storage unit 14 stores a table similar to the corpus table TB4 stored in the corpus storage unit 14 in the metadata extraction device 1. In the image identification system 2, if the corpus storage unit 14 in the metadata extraction device 1 is used, the corpus storage unit 14 may be omitted on the image identification system 2 side.
  • the entity storage unit 15 stores the same table as the entity table TB3 stored in the entity storage unit 15 in the metadata extraction device 1. In the image identification system 2, if the entity storage unit 15 in the metadata extraction device 1 is used, the entity storage unit 15 may be omitted on the image identification system 2 side.
  • the reception unit 29 receives various information.
  • the various types of information are, for example, conversational sentences.
  • the reception unit 29 receives information such as conversational sentences from a terminal, for example, but may receive information via an input device such as a keyboard, a touch panel, or a microphone.
  • the reception unit 29 may receive information read from a recording medium such as a disk or a semiconductor memory, and the mode of reception is not particularly limited.
  • the conversation sentence receiving means 21 accepts a conversation sentence.
  • a conversational sentence is a sentence in which a person speaks, and can be said to be a sentence in natural language.
  • the reception of conversational sentences is, for example, reception by voice, but reception by text may also be used.
  • Voice is a voice made by a person.
  • a text is a character string obtained by voice recognition of a voice uttered by a person.
  • a character string consists of an array of one or more characters.
  • the voice receiving means 211 receives the voice of the conversational sentence.
  • the voice receiving means 211 receives the voice of the conversation sentence from the terminal, for example, in pairs with the terminal identifier, but may receive the voice via the microphone.
  • the terminal identifier is information that identifies the terminal.
  • the terminal identifier is, for example, a MAC address, an IP address, an ID, or the like, but any information that can identify the terminal may be used.
  • the terminal identifier may be a user identifier that identifies the user of the terminal.
  • the user identifier is, for example, an e-mail address, a telephone number, or the like, but may be an ID, an address, a name, or the like, and may be any information that can identify the user.
  • the voice recognition means 212 performs voice recognition processing on the voice received by the voice reception means 211, and acquires a conversational sentence which is a character string.
  • the voice recognition process is a known technique, and detailed description thereof will be omitted.
  • the processing unit 39 performs various processes.
  • the various processes include, for example, the processing of the parameterization means 30, the intent determination means 31, the conversational sentence information determination means 32, the entity acquisition unit 33, the parameter acquisition unit 34, the API information acquisition means 35, the inquiry information configuration unit 36, the search result acquisition means 37, the determination means 341, the day information acquisition means 342, the entity name acquisition means 343, the translation item name acquisition means 344, the table identifier acquisition means 345, the primary key identifier acquisition means 346, the conversion parameter acquisition means 347, and the like.
  • the various processes include, for example, various discriminations described in the flowchart.
  • the processing unit 39 performs the processing of, for example, the parameterization means 30 and the intent determination means 31 in response to the conversation sentence reception means 21 receiving the conversation sentence.
  • the processing unit 39 performs processing by the intent determination means 31 or the like for each one or more terminal identifiers.
  • the parameterizing means 30 parameterizes one or more entities included in one or more conversational sentences received by the conversational sentence receiving means 21.
  • the parameterizing means 30 may parameterize the entity corresponding to the conversational sentence information determined by the conversational sentence information determining means 32.
  • the parameterizing means 30 parameterizes, as an example, the independent words, that is, the entities, included in a conversational sentence input by voice. For example, in Japanese, the conversational sentence "Is there an image of lineup A?" and a variant of it that differs only in its particles have essentially the same meaning. Nevertheless, with conventional search, conversational sentences that differ only in particles are not always recognized as having the same meaning. Therefore, the parameterizing means 30 parameterizes the independent words "lineup A" and "image" included in these conversational sentences, that is, the entities.
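The effect of parameterization can be sketched as below: only independent (content) words are kept, so two sentences that differ only in function words parameterize identically. The English stop-word list is a crude stand-in for the morphological handling of Japanese particles, and all names are illustrative.

```python
# Hedged sketch of parameterization: keep only the independent words of a sentence.
STOP_WORDS = {"is", "there", "an", "a", "of", "the", "do", "you", "have", "any"}

def parameterize(sentence):
    tokens = sentence.lower().replace("?", "").split()
    return {t for t in tokens if t not in STOP_WORDS}

# Two sentences differing only in function words yield the same parameters.
print(sorted(parameterize("Is there an image of lineup-A?")))      # ['image', 'lineup-a']
print(sorted(parameterize("Do you have any image of lineup-A?")))  # ['image', 'lineup-a']
```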
  • the intent determination means 31 determines the intent corresponding to the conversation sentence received by the conversation sentence reception means 21.
  • the intent determination means 31 first acquires, for example, the text corresponding to the conversation sentence received by the conversation sentence reception means 21.
  • the text is, for example, the result of voice recognition of the conversational sentence received by the conversational sentence receiving means 21, but may be the conversational sentence itself received by the conversational sentence receiving means 21.
  • the intent determination means 31 voice-recognizes the conversation sentence and acquires the text.
  • the intent determination means 31 may acquire the text.
  • the intent determining means 31 acquires one or more independent words from the acquired text by, for example, performing morphological analysis and syntactic analysis.
  • the morphological analysis is a known technique, and detailed description thereof will be omitted.
  • the intent determining means 31 determines an intent having a processing operation name having a word that is the same as or similar to the acquired one or more independent words.
  • a synonym dictionary is stored in the storage unit 1.
  • a synonym dictionary is a dictionary related to synonyms.
  • in the synonym dictionary, a word possessed by a processing operation name and one or more synonyms of that word are registered.
  • for example, the intent determination means 31 acquires one or more independent words, such as "maker XX", "car", or "selling price information", from the conversation sentence, searches the intent storage unit 12 using each independent word as a key, and judges whether or not there is an intent having a processing operation name that matches the independent word. The match is, for example, an exact match, but may be a partial match. Then, when there is an intent having a processing operation name with a word that matches the independent word, the intent determining means 31 determines that intent.
  • if there is no such intent, the intent determining means 31 acquires one of the one or more synonyms corresponding to the independent word from the synonym dictionary, searches the intent storage unit 12 using that synonym as a key, and judges whether or not there is an intent having a processing operation name with a word matching that synonym. When there is such an intent, the intent determining means 31 determines that intent. If there is no such intent, the intent determining means 31 performs the same processing for the other synonyms to determine the intent. If there is no such intent for any synonym, the intent determining means 31 may output that the intent has not been determined.
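The two-pass matching just described (independent words first, then their synonyms from the synonym dictionary, matched against processing operation names) might be sketched as follows; the intents and synonyms are illustrative.

```python
# Sketch of intent determination: match independent words, then synonyms, against
# processing operation names (all data illustrative).
INTENTS = {"image search": "API-1", "pamphlet search": "API-2", "image information search": "API-3"}
SYNONYMS = {"picture": ["image"], "brochure": ["pamphlet"]}

def determine_intent(independent_words):
    # First pass: the independent words themselves (partial match against the operation name).
    for word in independent_words:
        for intent_name in INTENTS:
            if word in intent_name.split():
                return intent_name
    # Second pass: synonyms from the synonym dictionary.
    for word in independent_words:
        for syn in SYNONYMS.get(word, []):
            for intent_name in INTENTS:
                if syn in intent_name.split():
                    return intent_name
    return None   # output that the intent has not been determined

print(determine_intent(["brochure", "hot", "spring"]))   # -> 'pamphlet search'
```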
  • the conversation sentence information determining means 32 searches the corpus storage unit 14 using the intent determined by the intent determining means 31 as a key, and the conversation sentence receiving means 21 is selected from one or more conversation sentence information corresponding to the intent. Determines the conversational sentence information that most closely matches the conversational sentence received by.
  • the conversational sentence information that most closely resembles the conversational sentence is, for example, the conversational sentence information that has the highest degree of similarity to the conversational sentence. That is, the conversation sentence information determining means 32 calculates, for example, the degree of similarity between the accepted conversation sentence and each one or more conversation sentence information corresponding to the determined intent, and the conversation sentence information having the maximum similarity degree. To decide.
  • the conversation sentence information determining means 32 may search for a conversation template that matches the template in which the position of the noun of the accepted conversation sentence is used as a variable. That is, the corpus storage unit 14 stores a template in which one or more entity names are used as variables, and the conversation sentence information determining means 32 stores each entity name of one or two or more of the accepted conversation sentences. Acquires the position of, and determines the template corresponding to the position of the acquired entity name as conversational sentence information. The position of each one or more entity names in the conversation sentence is information indicating the number of the entity name in the template having one or more entity names.
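A sketch of picking the most similar conversational sentence information, assuming a simple token-overlap (Jaccard) similarity; the patent does not fix a particular similarity measure, so this is only one possible choice.

```python
# Hedged sketch of choosing the conversational sentence information with the highest
# similarity to the accepted conversational sentence.
def jaccard(a, b):
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def most_similar(received, candidates):
    return max(candidates, key=lambda c: jaccard(received, c))

corpus_for_intent = ["show me a pamphlet of a hot spring inn",
                     "show me a catalog of cars of XX brand"]
print(most_similar("please show me the hot spring inn pamphlet", corpus_for_intent))
# -> 'show me a pamphlet of a hot spring inn'
```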
  • the entity acquisition unit 33 corresponds to one or more entities corresponding to the conversational sentence information determined by the conversational sentence information determining means 32, and is one or more words included in the conversational sentence received by the conversational sentence receiving means 21. Get an entity.
  • the entity acquisition unit 33 acquires, for each of the one or more entities corresponding to the determined conversational sentence information, the start position and end position of the entity from the corpus storage unit 14, and acquires, from the received conversational sentence, the word specified by that start position and end position.
  • the parameter acquisition unit 34 acquires one or more parameters corresponding to each of the one or more entities acquired by the entity acquisition unit 33.
  • the acquired parameter is, for example, the acquired entity itself, but it may be information obtained by converting the acquired entity. That is, for example, when the acquired day word is included in one or more entities, the parameter acquisition unit 34 converts the day word into the parameter day information.
  • the determination means 341 constituting the parameter acquisition unit 34 determines whether or not a day word exists in the one or more entities acquired by the entity acquisition unit 33. Specifically, for example, one or more day words are stored in the storage unit 1, and the determination means 341 determines, for each of the acquired one or more entities, whether or not the entity matches any of the stored day words; when the determination result for at least one entity shows a match, it determines that a day word exists in the acquired one or more entities.
  • the day information acquisition means 342 acquires the day conversion information corresponding to the day word from the day conversion information storage unit 18, and acquires the day information, which is a parameter, by using the day conversion information.
  • the Japanese word "last year” and the like are stored in the storage unit 1, and the conversational sentence "Show me the last year's pamphlet of brand B" is accepted, and the three entities "brand B",
  • the determination means 341 determines that the day word exists in the acquired 3 entities because the entity "last month” matches the day word "last month”. do.
  • the current time information is then acquired, and the day information corresponding to the day word is acquired (for example, "4/1 to 4/30" for the day word "last month").
  • specifically, the day information acquisition means 342 acquires the day information acquisition information (for example, a program) corresponding to the day word "last year" from the day conversion information storage unit 18. Then, using the day information acquisition information, the day information acquisition means 342 acquires the current time information (for example, "May 10, 2020 11:15") from the built-in clock of the MPU (Micro Processing Unit), an NTP server, or the like, and obtains the previous year (for example, "2019") of the year of the current time information (for example, "2020"). Then, by referring to the calendar information of that previous year, the day information acquisition means 342 acquires the day information "January 1, 2019 to December 31, 2019", from the first day to the last day of the previous year.
  • for the day word "this year", the day information acquisition means 342 acquires the day information acquisition information (for example, API information) corresponding to the day word "this year" from the day conversion information storage unit 18. Then, using the day information acquisition information, the day information acquisition means 342 acquires the current time information from the built-in clock or the like, refers to the calendar information of the year of the current time information (for example, "2020"), and acquires the day information from the first day of that year up to the day of the current time information (for example, "January 1, 2020 to May 10, 2020").
  • for the day word "yesterday", the day information acquisition means 342 acquires the day information acquisition information (for example, a method) corresponding to the day word "yesterday" from the day conversion information storage unit 18. Then, using the day information acquisition information, the day information acquisition means 342 acquires the current time information from the built-in clock or the like, and acquires the day information of the day before the day of the current time information (for example, "5/9").
  • the entity name acquisition means 343 acquires the entity name corresponding to the entity from the entity storage unit 15 for each one or more entities acquired by the entity acquisition unit 33.
  • the entity name corresponding to an entity is the entity name paired with a start position and an end position that match or are close to the position of that entity in the conversational sentence from which the entity was acquired.
  • the entity name acquisition means 343 may acquire the entity name corresponding to the entity from the entity storage unit 15 for each of the one or more entities acquired by the entity acquisition unit 33, for example, by using the entity information associated with the entity.
  • for example, the entity name acquisition means 343 uses the three pieces of entity information stored in the corpus storage unit 14 in association with the conversational sentence information "Tell me the selling price information of the car of manufacturer XX".
  • using the first piece of entity information, which has the same start position and end position as "manufacturer XX" in the accepted conversational sentence "Tell me the selling price information of manufacturer XX's car", it acquires the "maker entity" corresponding to "manufacturer XX".
  • likewise, using the second piece of entity information, which has the same start position and end position as "car" in the conversational sentence "Tell me the selling price information of the car of manufacturer XX", it acquires the "car entity" corresponding to "car", and using the third piece of entity information, which has the same start position and end position as "selling price information", it acquires the "information entity" corresponding to "selling price information".
  • the API information acquisition means 35 acquires API information corresponding to the intent determined by the intent determination means 31 from the API information storage unit 13.
  • the API information acquisition means 35 acquires, for example, API information having a processing operation name corresponding to the intent determined by the intent determination means 31 from the API information storage unit 13.
  • the processing operation name "image information search” and three or more parameter specific information "company code, Cope_code”, “vehicle code, Car_code”, and “information code, Info_code” are stored.
  • the API information acquisition means 35 has the processing operation name "image information” possessed by the intent. Acquire API information 1 having "search”.
  • the inquiry information configuration unit 36 configures inquiry information by using one or more parameters acquired by the parameter acquisition unit 34 and the API information acquired by the API information acquisition means 35.
  • Inquiry information is information for information retrieval and is usually information that can be executed.
  • The inquiry information is, for example, a function or method into which arguments have been inserted, but it may also be a completed SQL statement or a set of a URL and parameters.
  • The inquiry information configuration unit 36 constructs the inquiry information, for example, by placing, at each of the one or more variables possessed by the API information acquired by the API information acquisition means 35, the corresponding parameter acquired by the parameter acquisition unit 34.
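As one way to picture how the inquiry information is assembled, the sketch below fills the variables of an assumed API template with acquired parameters. The SQL template, table name, and parameter values are assumptions made for this example; only the parameter names are taken from the API information example above.

```python
# Hypothetical API information: a processing operation name plus a template
# whose variables are filled with the parameters acquired for the intent.
api_information = {
    "operation": "image information search",
    "template": "SELECT * FROM images WHERE Cope_code = :Cope_code "
                "AND Car_code = :Car_code AND Info_code = :Info_code",
}

# Parameters acquired by the parameter acquisition unit (values are assumptions).
parameters = {"Cope_code": "C001", "Car_code": "V123", "Info_code": "PRICE"}

def build_inquiry_information(api_info: dict, params: dict) -> str:
    """Place each acquired parameter at the corresponding variable of the template."""
    inquiry = api_info["template"]
    for name, value in params.items():
        inquiry = inquiry.replace(f":{name}", repr(value))
    return inquiry

print(build_inquiry_information(api_information, parameters))
# SELECT * FROM images WHERE Cope_code = 'C001' AND Car_code = 'V123' AND Info_code = 'PRICE'
```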
  • The search result acquisition means 37 executes the inquiry information constructed by the inquiry information configuration unit 36 and acquires a search result by searching the storage unit 1 (database) using the parameters obtained by the parameterization means 30. Further, the search result acquisition means 37 may generate API information including the parameters obtained by the parameterization means 30 and search the storage unit 1 (database) on the basis of the generated API information. That is, API information may be generated by writing a new parameter, or by rewriting an already written parameter into a new parameter, and the database may be searched on the basis of the API information reflecting that parameter. Information for inquiries such as API information and SQL, and the detailed operation of the search result acquisition means 37, will be described with specific examples and modified examples.
  • the output unit 49 outputs various information.
  • the various types of information are, for example, images of the searched information medium.
  • In response to the reception unit 29 receiving information such as a conversational sentence paired with a terminal identifier, the output unit 49 transmits information such as the search result, which is the result of the various processing performed by the processing unit 39, to the terminal identified by that terminal identifier.
  • The output unit 49 may output information such as the search result via an output device such as a display or a speaker.
  • The output unit 49 may also print out the various types of information with a printer, store them in a recording medium, pass them to another program, or transmit them to an external device; the form of output is not particularly limited.
  • the search result output means 41 outputs the search result acquired via the search result acquisition means 37.
  • For example, in response to the conversational sentence reception means 21 receiving a conversational sentence from the user terminal 10, the search result output means 41 transmits to the user terminal 10 an image as the search result acquired by the search result acquisition means 37.
  • Alternatively, in response to the conversational sentence reception means 21 receiving a conversational sentence via an input device such as a microphone, the search result output means 41 may display an image as the search result acquired by the search result acquisition means 37 via an output device such as a display.
  • The storage unit 19, the table storage unit 11, the intent storage unit 12, the API information storage unit 13, the corpus storage unit 14, the entity storage unit 15, and the day conversion information storage unit 18 are preferably realized by a non-volatile recording medium such as a hard disk or a flash memory, but can also be realized by a volatile recording medium such as a RAM.
  • the process of storing information in the storage unit 19 or the like is not particularly limited.
  • Information may be stored in the storage unit 19 or the like via a recording medium, information transmitted via a network, a communication line, or the like may be stored in the storage unit 19 or the like, or information input via an input device may be stored in the storage unit 19 or the like.
  • the input device may be, for example, a keyboard, a mouse, a touch panel, a microphone, or the like.
  • the reception unit 29, the conversation text reception means 21, the voice reception means 211, and the voice recognition means 212 may or may not include the input device.
  • The reception unit 29 and the like can be realized by the driver software of the input device, or by the input device and its driver software. Further, the function of the reception unit 29 may be implemented in the user terminal 10, and the conversational sentence information acquired at the user terminal 10 may be sent to the image identification system 2 via the public communication network 50.
  • The processing unit 39, the intent determination means 31, the conversational sentence information determination means 32, the entity acquisition unit 33, the parameter acquisition unit 34, the API information acquisition means 35, the inquiry information configuration unit 36, the search result acquisition means 37, the determination means 341, the day information acquisition means 342, the entity name acquisition means 343, the translation item name acquisition means 344, the table identifier acquisition means 345, the primary key identifier acquisition means 346, and the conversion parameter acquisition means 347 can usually be realized by a CPU (Central Processing Unit) or an MPU, a memory, and the like.
  • the processing procedure of the processing unit 39 and the like is usually realized by software, and the software is recorded in a recording medium such as ROM. However, the processing procedure may be realized by hardware (dedicated circuit).
  • the output unit 49 and the search result output means 41 may or may not include output devices such as displays and speakers.
  • the output unit 49 and the like can be realized by the driver software of the output device, or by the output device and the driver software thereof.
  • The receiving function of the reception unit 29 and the like is usually realized by wireless or wired communication means (for example, a communication module such as a NIC (Network Interface Controller) or a modem), but it may also be realized by means for receiving a broadcast (for example, a broadcast receiving module).
  • the transmission function of the output unit 49 or the like is usually realized by a wireless or wired communication means, but may be realized by a broadcasting means (for example, a broadcasting module).
  • Steps S11 to S17 show a processing operation of extracting metadata from an image of each information medium by the metadata extraction device 1.
  • In step S11, the image acquisition unit 5 acquires an image of the information medium.
  • The image of the information medium is either captured via the user terminal 10 and received by the metadata extraction device 1 via the public communication network 50, or captured directly through a camera (not shown) attached to the metadata extraction device 1.
  • In step S12, the feature map generation unit 6 generates a feature map for the image acquired in step S11.
  • This feature map is generated in pixel units or, for example, in block-area units, each block being an aggregate of a plurality of pixels, by reflecting the feature amounts of the analyzed image onto a two-dimensional image.
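A minimal sketch of block-area feature-map generation follows. Plain per-block averaging is used here as a stand-in for the image analysis and deep-learning feature extraction actually contemplated, so the computation itself is an assumption made for illustration.

```python
import numpy as np

def generate_feature_map(image: np.ndarray, block: int = 8) -> np.ndarray:
    """Produce a coarse 2-D feature map in block-area units.

    Each block of `block` x `block` pixels is summarized by its mean intensity;
    a real system would instead use learned (e.g. CNN) feature extraction here.
    """
    h, w = image.shape[:2]
    gray = image.mean(axis=2) if image.ndim == 3 else image
    fh, fw = h // block, w // block
    feature_map = np.zeros((fh, fw), dtype=np.float32)
    for i in range(fh):
        for j in range(fw):
            feature_map[i, j] = gray[i*block:(i+1)*block, j*block:(j+1)*block].mean()
    return feature_map

# Example: a dummy 64x64 RGB "information medium image".
dummy_image = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
print(generate_feature_map(dummy_image).shape)  # (8, 8)
```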
  • In step S13, the general inference unit 7 extracts objects from the feature map.
  • In step S13, the content of each object in the image is inferred from the feature map by using the comprehensive inference model DB1 stored in the table storage unit 11, and metadata is generated.
  • The feature map, and the feature amounts constituting it, are input to the individual inference models DB11 to DB15 constituting the comprehensive inference model DB1 in pixel units or block-area units.
  • For each of the individual inference models DB11 to DB15, a learning model is built in advance so that it can discriminate its own element, such as the content of the object ("car", "tree", "fence", etc.), its color, or a character string. In step S13, these learning models are referred to and the contents of the objects are extracted.
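To make the flow of step S13 concrete, the sketch below runs several stand-in individual inference models over a feature map and collects their feature labels as metadata, applying a probability threshold of the kind mentioned elsewhere in this specification (for example 80%). The stub classifiers and their outputs are assumptions; they are not the trained models DB11 to DB15.

```python
import numpy as np

# Stand-in individual inference models: each returns (feature label, probability).
def infer_object(feature_map):   return "car", 0.97
def infer_color(feature_map):    return "red", 0.88
def infer_text(feature_map):     return "lineup A", 0.91

INDIVIDUAL_MODELS = [infer_object, infer_color, infer_text]

def comprehensive_inference(feature_map, threshold=0.80):
    """Collect feature labels from each individual model above the probability threshold."""
    labels = []
    for model in INDIVIDUAL_MODELS:
        label, prob = model(feature_map)
        if prob >= threshold:
            labels.append(label)
    return labels

feature_map = np.zeros((8, 8))
print(comprehensive_inference(feature_map))  # ['car', 'red', 'lineup A']
```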
  • the process proceeds to step S14, and the associating unit 8 refers to the associative model set TB2 and derives the associative word from the feature label as the overall inference result output in step S13.
  • The associative model set TB2 stores feature labels and associative words in association with each other. Therefore, by inputting a feature label, one or more associative words linked to that feature label can easily be extracted. Through these associative words, the metadata can be expanded beyond the content of the objects extractable from the image to include the atmosphere, sensations, imagery, and emotions that the image can evoke.
  • For example, the feature label "hot spring" is linked to associative words such as "warm", "sulfur", "alkaline", "steam", and "recreation". Therefore, when "hot spring" is extracted as a feature label as a result of the comprehensive inference, the imagery can be expanded beyond the feature label itself by these related associative words "warm", "sulfur", "alkaline", "steam", "recreation", and so on. In deriving the associative words in step S14, information posted on the Internet may be used as necessary. In such a case, words that appear with high frequency in the results of a search performed through a search engine using the feature label as a search term may be taken in as associative words.
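The derivation of associative words in step S14 can be pictured as a table lookup. In the sketch below, an in-memory dictionary stands in for the associative model set TB2; the particular word lists are assumptions based on the examples given in this description.

```python
# Stand-in for the associative model set TB2: feature label -> associative words.
ASSOCIATIVE_MODEL_SET = {
    "hot spring": ["warm", "sulfur", "alkaline", "steam", "recreation"],
    "car": ["vehicle", "heavy"],
}

def derive_associative_words(feature_labels):
    """Collect the associative words linked to each feature label (step S14)."""
    words = []
    for label in feature_labels:
        words.extend(ASSOCIATIVE_MODEL_SET.get(label, []))
    return words

print(derive_associative_words(["hot spring"]))
# ['warm', 'sulfur', 'alkaline', 'steam', 'recreation']
```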
  • In step S15, metadata is generated for the image acquired in step S11.
  • This metadata includes the associative word derived in step S14 in addition to the character string of the feature label as the individual inference result described above.
  • The metadata extraction device 1 may associate the metadata consisting of such feature labels and associative words with the image and then store the metadata in the metadata storage unit 27, or it may store the image with the associated metadata in the image data storage unit 26. At this time, the image may be stored as the image-metadata associative label TB5, in which the image is associated with its metadata (step S16).
  • FIG. 9 schematically shows the processing operation from extraction to storage of such metadata.
  • For the newly acquired images A1, A2, and so on, associative words are linked.
  • An image-metadata associative label TB5, in which each image A1, A2, ... is associated with its feature labels and the associative words derived from them, is generated and stored in the metadata storage unit 27.
  • As a result, each image A1, A2, ... is associated with metadata consisting of feature labels and associative words, and conversely, when such metadata is received as input, the images A1, A2, ... associated with it can be specified.
  • The process then proceeds to step S17, and the entity table TB3 and the corpus table TB4 are created from the generated metadata (feature labels and associative words).
  • As for the entity table, a table consisting of entities and entity values linked to each other is created from the feature labels and associative words as described above. At this time, the corresponding images A1, A2, ... are associated with each combination of an entity and its entity values. As a result, when an image is retrieved after the fact, the retrieval can be realized via the entity, which enhances the convenience of the search.
  • As for the corpus table, a corpus table TB4 linked to the intents and the conversational sentence information extracted by the image identification system 2, as described later, may be generated.
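The creation of the entity table TB3 in step S17 might be sketched as follows. The in-memory structures stand in for the stored tables, and the concrete labels and image identifiers are assumptions drawn from the examples in this description.

```python
# Metadata produced in steps S13-S15: per image, feature labels and associative words.
image_metadata = {
    "A1": {"feature_labels": ["car", "lineup A"], "associative_words": ["vehicle", "heavy"]},
    "A3": {"feature_labels": ["Ishiyakiimo"], "associative_words": ["sweet", "delicious"]},
}

def build_entity_table(metadata: dict) -> dict:
    """Build a minimal entity table: entity -> {entity values} with the images linked."""
    table = {}
    for image_id, md in metadata.items():
        # Here every feature label is treated as an entity, and all labels and words
        # attached to the same image become its entity values (an assumption).
        for entity in md["feature_labels"]:
            values = set(md["feature_labels"]) | set(md["associative_words"])
            entry = table.setdefault(entity, {"entity_values": set(), "images": set()})
            entry["entity_values"] |= values
            entry["images"].add(image_id)
    return table

entity_table = build_entity_table(image_metadata)
print(sorted(entity_table["car"]["images"]))  # ['A1']
```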
  • A method of specifying an image corresponding to the content of conversational sentence information, based on the entity table TB3, the corpus table TB4, and the image-metadata associative label TB5 sequentially created through steps S11 to S17 described above, will now be explained in detail with reference to FIG. 8.
  • The image corresponding to the content of the conversational sentence information is specified by the image specifying system 2 through steps S21 to S26.
  • In step S21, the reception unit 29 recognizes a conversational sentence uttered by voice.
  • The recognition of this conversational sentence may also be performed on a conversational sentence described in manually input text data, instead of being acquired from voice.
  • In step S22, the reception unit 29 converts the conversational sentence acquired through voice recognition into text data.
  • A known conversion method may be used.
  • In step S23, morphological analysis and parsing are performed on the conversational sentence converted into text data.
  • The process then proceeds to step S24, and one or more independent words are acquired from the text data subjected to morphological analysis and parsing in step S23.
  • The intent determination means 31 determines an intent having an action name that contains a word identical or similar to one or more of the acquired independent words. For example, when the conversational sentence is "Is there an image of lineup A?", the two independent words "lineup A" and "image" are acquired from the conversational sentence by morphological analysis, and each independent word is used as a key to search the corpus table TB4 in the intent storage unit 12; an intent having "image search" as its action name is thereby determined via a partial match on "lineup A".
  • In step S24, the entity contained in the conversational sentence is also acquired.
  • The acquired entity may further be converted into a parameter. For example, when the conversational sentence is "Is there an image of lineup A?", the two independent words "lineup A" and "image" are acquired from the conversational sentence by morphological analysis, and the entity is "lineup A".
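A minimal sketch of steps S23 and S24 follows. Simple substring matching is used in place of a real morphological analyzer, and the corpus table contents and known entities are assumptions made for the example.

```python
# Stand-in corpus table TB4: action name -> keywords that may appear in a sentence.
CORPUS_TABLE = {
    "image search": ["image", "picture"],
}

# Stand-in entities known to the system (drawn from the entity table).
KNOWN_ENTITIES = ["lineup A", "Ishiyakiimo"]

def determine_intent_and_entity(sentence: str):
    """Pick the intent whose keywords appear in the sentence, and the matching entities."""
    intent = None
    for action_name, keywords in CORPUS_TABLE.items():
        if any(k in sentence for k in keywords):
            intent = action_name
            break
    entities = [e for e in KNOWN_ENTITIES if e in sentence]
    return intent, entities

print(determine_intent_and_entity("Is there an image of lineup A?"))
# ('image search', ['lineup A'])
```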
  • In step S25, image identification processing is performed. This image identification processing is performed on the basis of the intent determined in step S24 and the extracted entity.
  • In the entity table TB3 stored in the entity storage unit 15 described above, an image is associated with each combination of an entity and an entity value. Therefore, the entity extracted from the conversational sentence in step S24 is compared with the entities and entity values in the entity table TB3, and the image corresponding to the matching entity or entity value is specified.
  • In the above example, the image A1 associated with "lineup A" as an entity value can thereby be identified.
  • Likewise, if "Ishiyakiimo" is extracted as an entity, the image A3 associated with "Ishiyakiimo" can be specified by comparing it with the entities and entity values in the entity table TB3.
  • the wording quoted in the entity table TB3 may be either an entity or an entity value.
  • the image metadata associative label TB5 may be referred to as an alternative to the entity table TB3.
  • the image metadata associative label TB5 also stores an image associated with the feature label and the associative word. Therefore, the feature label and the associative word corresponding to the entity extracted from the conversational sentence can be extracted in step S24, and the image associated with the extracted feature label and the associative word can be specified.
  • In specifying the image in step S25, the intent determined in step S24 may be further utilized.
  • the intent is associated with conversational sentence information, an entity, and an entity value in the corpus table TB4 stored in the corpus storage unit 14. Then, this entity and the entity value are associated with the image in the entity table TB3. Therefore, by referring to the corpus table TB4 and the entity table TB3 based on the extracted intent, it is possible to realize more accurate image identification.
  • In step S25, in the process of specifying the image, the image may also be searched for via an API.
  • In that case, the search result acquisition means 37 may generate API information including the extracted intent, entity, and the like, and perform an image search in the storage unit 1 (database) on the basis of the generated API information. That is, an image search may be performed on the basis of API information reflecting the parameters (the intent and the entity).
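The image identification of step S25 can be sketched as a lookup of the extracted entity against the entity table, gated by the determined intent. The data below reuses the assumed structures of the earlier sketches and is illustrative only.

```python
# Stand-in entity table TB3: entity value -> images associated with it.
ENTITY_VALUE_TO_IMAGES = {
    "lineup A": ["A1"],
    "Ishiyakiimo": ["A3"],
}

def identify_images(intent: str, entities: list) -> list:
    """Return the images linked to the extracted entities (only for an image-search intent)."""
    if intent != "image search":
        return []
    images = []
    for entity in entities:
        images.extend(ENTITY_VALUE_TO_IMAGES.get(entity, []))
    return images

print(identify_images("image search", ["lineup A"]))  # ['A1']
```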
  • After the image has been specified, the process proceeds to step S26, and the specified image is displayed.
  • When displaying the image to the user of the user terminal 10, the image is transmitted from the image specifying system 2 to the user terminal 10 via the public communication network 50 and displayed via the user terminal 10. Alternatively, the image may be displayed directly by the image specifying system 2; in such a case, the image is displayed via the output unit 49.
  • In this way, although imaged information media come in a wide variety of types, it is possible to generate metadata that makes an ex post facto search of the images of these information media more convenient and to associate that metadata with the images.
  • In particular, when a colloquial phrase such as "Is there an image of lineup A?" is acquired by voice, the image of the appropriate information medium corresponding to the received conversational sentence can be extracted with high accuracy, and metadata that makes this possible can be generated and associated.
  • Moreover, since this metadata includes associative words, it contains not only the keywords of the feature labels themselves but also various words that can be associated with them. Therefore, at the time of specifying an image, even when such an associative word is contained in the conversational sentence, the image can be specified from it.
  • the present invention is not limited to the above-described embodiment.
  • the general inference model DB1 shown in FIG. 3 may further include an individual inference model DB16 in addition to the individual inference models DB11 to DB15.
  • the individual inference model DB 16 is a database for inferring the type of information medium, and for example, the individual inference model DB 16 infers the type through the shape of the information medium.
  • the individual inference model DB 16 determines whether the information medium is, for example, a pamphlet, a catalog, financial statements, an attendance record, or an X-ray photograph.
  • The comprehensive inference result regarding the type of such an information medium is also turned into a feature label and likewise converted into an entity and entity values, so that the convenience of ex post facto searches can be enhanced.
  • 1 Metadata extraction device
  • 2 Image identification system
  • 3 Execution unit
  • 4 Auxiliary storage unit
  • 5 Image acquisition unit
  • 6 Feature map generation unit
  • 7 General inference unit
  • 8 Linking unit
  • 9 Extraction unit
  • 10 User terminal
  • 11 Table storage unit
  • 12 Intent storage unit
  • 13 API information storage unit
  • 14 Corpus storage unit
  • 15 Entity storage unit
  • 18 Day conversion information storage unit
  • 19 Storage unit
  • 20 Central control unit
  • 21 Conversational sentence reception means
  • 26 Image data storage unit
  • 27 Metadata storage unit
  • 29 Reception unit
  • 30 Parameterization means
  • 31 Intent determination means
  • 32 Conversational sentence information determination means
  • 33 Entity acquisition unit
  • 34 Parameter acquisition unit
  • 35 API information acquisition means
  • 36 Inquiry information configuration unit
  • 37 Search result acquisition means
  • 39 Processing unit
  • 41 Search result output means
  • 49 Output unit
  • 50 Public communication network
  • 100 Metadata extraction system
  • 211 Voice reception means
  • 212 Voice recognition means
  • 341 Judgment means
  • 342 Day information acquisition means
  • 343 Entity name acquisition means
  • 344 Translation item name acquisition means

Abstract

[Problem] To extract metadata from information included in an image. [Solution] A metadata extraction program for extracting metadata from information included in an image of an information medium, the metadata extraction program being characterized by causing a computer to execute a feature map generation step for generating a feature map obtained by extracting features from the image of the information medium, and an inference step for referring to one or more individual inference models each associating the feature map with a feature label for each element, and extracting the feature label for each element as metadata from the feature map generated in the feature map generation step.

Description

Metadata extraction program
 The present invention relates to a metadata extraction program suitable for extracting metadata from information contained in an image of an information medium and further identifying an appropriate image corresponding to a received conversational sentence.
 In recent years, information media such as pamphlets, catalogs, company guides and advertisements, as well as various documents and explanatory materials, have all come to be provided, shared, and stored after being converted into image data. If the number of images of such information media increases rapidly with the future shift to paperless work, their management will become complicated, and it will also take a great deal of labor for users to find the image of the information medium they actually want.
 In order to solve such a problem, metadata is associated with each image of each information medium. The metadata referred to here is a description of the attribute values of predetermined attributes for the image of the information medium. For example, in the case of an image of a hot-spring inn pamphlet, in addition to the character information that can be read from the text, such as the inn name, room rates, the inn's address and contact information, and the check-in and check-out times displayed in the image, there is incidental information that can be obtained from the photographs: the content of the open-air bath photograph (that is, information indicating that the photograph shows an "open-air bath"), the comfort and warmth conveyed by the steam rising from the open-air bath in the photograph, the colors and beauty created by the harmony with the surrounding scenery, and even the menu of dishes that can be extracted from the photograph of a meal. A textualized version of this incidental information becomes the metadata associated with the image of the hot-spring pamphlet as an information medium.
 Attaching metadata to the image of each information medium brings various benefits, particularly for data management. For example, associating metadata with an image supplements the incidental information about the information medium, so the overall quality of the data can be improved. Associating such metadata with an image also makes it easy to analyze, search, and extract the data. In particular, by searching via metadata keywords, the information media linked to those keywords can be browsed easily, which greatly enhances convenience.
JP-A-2017-68859
 However, the work of associating metadata with the images of such information media is very laborious. For example, character information in an image can easily be converted into metadata by using a well-known technique such as OCR, but photographs are difficult to convert into metadata immediately. In particular, in order to turn the contents and colors shown in a photograph, and the atmosphere and sensations it evokes, into metadata, there is no choice but to input text manually after a step of human visual judgment. Metadata that is highly useful in business, in particular, often requires definition by a human, which makes the work time-consuming.
 In addition, since imaged information media come in a wide variety of types, it is necessary to generate and associate metadata that makes an ex post facto search of the images of these information media more convenient. In particular, when a colloquial phrase such as "Is there an image of lineup A?" is acquired by voice, metadata corresponding to the appropriate image must have been associated with that image so that the image of the appropriate information medium corresponding to the received conversational sentence can be extracted immediately.
 Although methods for generating metadata from an image have been proposed (see, for example, Patent Document 1), they make no particular mention of generating the kind of metadata that would otherwise have to rely on human visual judgment, nor of generating and associating metadata that makes an ex post facto search of images of information media more convenient with respect to a received conversational sentence.
 The present invention has therefore been devised in view of the above-mentioned problems, and its object is to provide a metadata extraction program and system that, in extracting metadata from information contained in an image, can extract metadata even from images of information media that would otherwise have to rely on human visual judgment, and that can generate and associate metadata that makes ex post facto searches more convenient.
 The metadata extraction program according to the present invention extracts metadata from information contained in an image of an information medium, and is characterized by causing a computer to execute a feature map generation step of generating a feature map in which features are extracted from the image of the information medium, and an inference step of referring to one or more individual inference models in which the feature map and a feature label for each element are associated with each other, and extracting the feature label for each element as metadata from the feature map generated in the feature map generation step.
 The image specifying program according to the present invention is characterized by comprising a conversational sentence reception step of receiving a conversational sentence, an entity extraction step of extracting one or more entities contained in the one or more conversational sentences received in the conversational sentence reception step, and an image specifying step of referring to an entity table in which entities consisting of feature labels and derived associative words are linked to images in a one-to-one or one-to-many relationship, and specifying the image linked to the one or more entities extracted in the entity extraction step.
 The metadata extraction system according to the present invention is a metadata extraction system for extracting metadata from information contained in an image of an information medium, and is characterized by comprising a feature map generation means for generating a feature map in which features are extracted from the image of the information medium, and an inference means for referring to one or more individual inference models in which the feature map and a feature label for each element are associated with each other, and extracting the feature label for each element as metadata from the feature map generated by the feature map generation means.
 According to the present invention having the above-described configuration, the otherwise very laborious work of associating metadata with the image of each information medium can be performed automatically. Photographs can be converted into metadata without human visual judgment or manual definition, reducing the labor required.
 Further, according to the present invention, although imaged information media come in a wide variety of types, it is possible to generate metadata that makes an ex post facto search of the images of these information media more convenient and to associate it with the images. In particular, when a colloquial conversational sentence is acquired by voice, the image of the appropriate information medium corresponding to the received sentence can be extracted with high accuracy, and metadata that makes this possible can be generated and associated.
 Moreover, since this metadata includes associative words, it contains not only the keywords of the feature labels themselves but also various words that can be associated with them. Therefore, at the time of specifying an image, even when such an associative word is contained in the conversational sentence, the image can be specified from it. It is thus also possible to generate metadata that reflects sensory impressions such as "hot", "heavy", or "fast" that a person viewing the image would receive.
FIG. 1 is a diagram showing an overall configuration of a metadata extraction system.
FIG. 2 is a block configuration diagram of a metadata extraction device.
FIG. 3 is a diagram showing a comprehensive inference model according to the present embodiment.
FIG. 4 is a diagram showing an example of an image for which metadata is to be extracted.
FIG. 5 is a diagram showing an example of tables required for performing various inferences.
FIG. 6 is a diagram showing an associative word inference model according to the present embodiment.
FIG. 7 is a block configuration diagram of the image identification system.
FIG. 8 is a flowchart showing the processing operation of the metadata extraction system.
FIG. 9 is a diagram schematically showing the processing operation from the extraction of metadata to its storage.
FIG. 10 is a diagram for explaining the operation of the image specifying system.
FIG. 11 is a diagram showing another example of the comprehensive inference model according to the present embodiment.
 Hereinafter, the metadata extraction system to which the present invention is applied will be described in detail with reference to the drawings.
 Overall Configuration
 FIG. 1 shows the overall configuration of the metadata extraction system 100. The metadata extraction system 100 includes a user terminal 10 that can access the public communication network 50, a metadata extraction device 1 connected to the public communication network 50, and an image identification system 2.
 The public communication network 50 is an Internet communication network or the like, but when operating in a narrow area such as in the company, it may be configured by a LAN (Local Area Network). Further, the public communication network 50 may be configured by a so-called optical fiber communication network. Further, the public communication network 50 is not limited to the wired communication network, and may be realized by a wireless communication network.
 The user terminal 10 may be any electronic device that can display an image of an information medium or capture such an image, such as a personal computer (PC), a smartphone, a tablet terminal, a mobile phone, a wearable terminal, or a digital camera. The user terminal 10 captures an image A1 of an information medium with a digital camera mounted on it, or acquires such an image from a server or the like (not shown) via the public communication network 50. The information medium referred to here means any medium presenting information, such as a pamphlet, a catalog, a company guide, an advertisement, or various documents and explanatory materials. The image A1 is such an information medium rendered as image data. The user terminal 10 transmits the image A1 thus acquired to the metadata extraction device 1 via the public communication network 50.
 The metadata extraction device 1 is a device that extracts metadata for the image A1 received from the user terminal 10 via the public communication network 50, and is composed of, for example, a PC, a server, or the like. In the metadata extraction device 1, metadata is associated with each image A1. The metadata referred to here is a description of the attribute values of predetermined attributes for the image of the information medium. For example, for an image of a catalog of a car on sale, in addition to the character information that can be read from character strings displayed in the image, such as the car's seller, brand, price, fuel consumption, and performance, there is incidental information obtained from the photographs: the content of the photograph of the car (that is, information indicating that the photograph shows a "car"), the color and shape of the car, the atmosphere it gives off, and the impression that it looks fast. A textualized version of this incidental information becomes the metadata associated with the image A1 of the car as an information medium. Furthermore, this metadata may include information indicating what kind of information medium it is in the first place, for example information indicating whether the medium is a pamphlet, an X-ray photograph, a company guide, a catalog, an advertising leaflet, financial statements, or the like.
 Further, for an image of a hot-spring inn pamphlet, in addition to the character information that can be read from the text, such as the inn name, room rates, the inn's address and contact information, and the check-in and check-out times displayed in the image, there is incidental information obtained from the photographs: the content of the open-air bath photograph (that is, information indicating that the photograph shows an "open-air bath"), the comfort and warmth conveyed by the steam in that photograph, the colors and beauty created by the harmony with the surrounding scenery, and the menu of dishes that can be extracted from the photograph of a meal. A textualized version of this incidental information becomes the metadata associated with the image A1 of the hot-spring pamphlet as an information medium.
 The metadata extraction device 1 collects an image A1 related to an information medium via the user terminal 10, and sequentially executes a processing operation of associating the metadata for each of the images A1. The metadata extraction device 1 builds a database in which each metadata is associated with the image A1 by storing each of the images A1 to which the metadata is associated.
 The image identification system 2 is a system that specifies an image desired by the user by referring to the database constructed in the metadata extraction device 1, in which the images A1 and their metadata are associated. The image specifying system 2 receives a conversational sentence directly from the user or from the user terminal 10. The image specifying system 2 then accesses the metadata extraction device 1, identifies the appropriate image corresponding to the received conversational sentence, and either displays it directly to the user or transmits it to the user terminal 10. A user operating the user terminal 10 can thus obtain the desired image sent from the image specifying system 2 by specifying by voice the image he or she wants to view.
 Hereinafter, the configurations of the metadata extraction device 1 constituting the metadata extraction system 100 and the image identification system 2 will be described.
 Configuration of Metadata Extraction Device
 FIG. 2 shows a block configuration of the metadata extraction device 1. The metadata extraction device 1 includes a central control unit 20, and an execution unit 3 and an auxiliary storage unit 4 each connected to the central control unit 20.
 The central control unit 20 is, for example, a CPU (Central Processing Unit), and executes processing by calling a program stored in the execution unit 3. The central control unit 20 controls each component mounted in the metadata extraction device 1. The execution unit 3 includes an image acquisition unit 5, a feature map generation unit 6, a general inference unit 7, a linking unit 8, and an extraction unit 9. When the execution unit 3 is configured by a RAM (Random Access Memory), programs corresponding to the configurations of the image acquisition unit 5 to the extraction unit 9 are stored in it.
 The image acquisition unit 5 acquires the image A1 received from the user terminal 10 via the public communication network 50.
 The feature map generation unit 6 generates a feature map 30, described later, which is data obtained by extracting features from the image A1. The feature map 30 is composed in pixel units or, for example, in block-area units, each block being an aggregate of a plurality of pixels, and is obtained by reflecting the feature amounts of the analyzed image onto a two-dimensional image on the basis of well-known image analysis, using deep learning techniques as necessary. As a result, it is possible to obtain, for example, a feature map 30 in which the characteristic portions of an object to be discriminated in a photographic image are made to stand out.
 The general inference unit 7 uses one general inference model DB1 composed of one or more individual inference models DB11 to DB16, which will be described later, for the feature map 30 generated in the feature map generation unit 6. Using the general inference model DB1, the general inference unit 7 infers at least the content of at least one object in the image A1 and generates metadata.
 The associating unit 8 associates one or more associative words with each feature label in the overall inference result, which is the result of inference by the overall inference unit 7. An associative word is a word associated with the word of the feature label: for example, if the feature label is "car", then "vehicle", "heavy", and so on are associative words, and if the feature label is "hot spring", then "warm", "sulfur", "alkaline", "steam", "recreation", and so on are associative words. The set of one or more associative words linked to each such feature label constitutes the associative model set TB2, which is stored in the auxiliary storage unit 4.
 The auxiliary storage unit 4 is, for example, an SSD (Solid State Drive) or an HDD (Hard Disk Drive), and includes a table storage unit 11, a corpus storage unit 14, an entity storage unit 15, an image data storage unit 26, and a metadata storage unit 27.
 One or more tables are stored in the table storage unit 11. The tables stored in the table storage unit 11 include the comprehensive inference model DB1, the comprehensive inference result table TB1, and the associative model set TB2 held in the auxiliary storage unit 4.
 FIG. 3 shows an example of the comprehensive inference model DB. The comprehensive inference model DB1 is composed of one or more individual inference models DB11 to DB15. The input of each individual inference model DB11 to DB15 is a common feature map, and the output is an individual inference result. Each individual inference result is composed of text data consisting of words indicating the inferred result. The input data and the output data are related to each other through degrees of association. A degree of association indicates the degree of connection between the input data and the output data; for example, it can be judged that the higher the degree of association, the stronger the connection between the data. The degree of association may be expressed in three or more values or stages, such as percentages, or may be expressed in two values or two stages.
 The individual inference models DB11 to DB15 are generated by machine learning using a plurality of feature maps for learning and a plurality of individual inference results for learning as a learning data set. As the machine learning, for example, a convolutional neural network (CNN) is used, and, for example, deep learning is applied.
 Each of the individual inference models DB11 to DB15 is built from a learning data set for inferring, from a feature map, the content, color, character strings, and the like of the objects shown in the image of an information medium. For example, for the image of an information medium consisting of a car pamphlet as shown in FIG. 4, the individual inference models DB11, DB12, and DB14 are models for inferring the contents of the objects shown in the image, such as "car", "tree", and "fence", the individual inference model DB13 is a model for inferring the colors of the objects shown in the image, and the individual inference model DB15 is a model for inferring the character strings shown in the image. In this way, an individual inference model is provided independently for each element, such as the content of an object, the color of an object, or a character string shown in the image.
 The individual inference result may be expressed by an inference probability, that is, the probability with which each search solution is inferred; taking the individual inference model DB11 as an example, car: 96.999%, tree: 2.01%, and so on. The comprehensive inference model DB1 may take one feature map 30 as input and output, as the comprehensive inference result, only search solutions whose inference probability is at or above a threshold of, for example, 80%. When no search solution is at or above the threshold and only search solutions below the threshold exist, a plurality of such search solutions may be output. For example, DB13 and DB15 output a plurality of search solutions as the comprehensive inference result.
 In the comprehensive inference model DB1, the individual inference results, represented as text data and obtained as the search solutions of the individual inference models DB11 to DB15, together form the comprehensive inference result. In the example of FIG. 3, "car", "tree", "red, green", "fence", and "company A, lineup A" are obtained as the search solutions of the individual inference models DB11 to DB15 and are output as the comprehensive inference result. The comprehensive inference result may be tabulated as shown in the comprehensive inference result table TB1 and stored in the table storage unit 11. The comprehensive inference result table TB1 is a table in which the image of the information medium from which the feature map was formed and the comprehensive inference result are linked to each other. By preparing the comprehensive inference result table TB1, in which the image of each information medium is associated with its comprehensive inference result, and storing it in the table storage unit 11, the convenience of searches performed after the fact can be enhanced.
 As shown in FIG. 5, the associative model set TB2 is configured as a table in which feature labels and associative words are linked to each other. The feature labels here correspond to the text data "car", "tree", "red", "green", "fence", "company A", and "lineup A" output as search solutions in the above-mentioned comprehensive inference result.
 Associative words are words that can be associated with these feature labels. For example, if the feature label is "car", there are associative words such as "heavy" and "vehicle"; if the feature label is "tree", there are associative words such as "resources" and "nature". If "lineup A", a lineup on sale by a certain automobile manufacturer, is the feature label, "car" and "manufacturer" can be cited as associative words, and if the feature label is "red", "color" is an associative word.
 The associative model set TB2, in which associative words are linked to each such feature label in a one-to-one or one-to-many relationship, is set in advance on the system (operator) side and stored in the table storage unit 11. By preparing such an associative model set TB2 in advance, one or more associative words linked to a feature label can be output when that feature label is input.
 Incidentally, the associative model set TB2 is not limited to a binary representation of whether or not a feature label and an associative word are linked; they may instead be related to each other through degrees of association, as shown in FIG. 6.
 According to the example of FIG. 6, the input is a feature label and the output is an associative word. The feature label and the associative word are associated with each other through the degree of association. The degree of association indicates the degree of connection between the input data and the output data. For example, it can be determined that the higher the degree of association, the stronger the connection of each data. The degree of association is indicated by, for example, three or more values such as percentages or three or more stages, and may be indicated by two values or two stages. Each node of the hidden layer constituting such a degree of association may be composed of a node of a neural network.
 In such a case, the model is generated by machine learning using feature labels and associative words as a learning data set. As the machine learning, for example, a convolutional neural network is used, and, for example, deep learning is applied.
 By preparing such a neural network in advance as the associative model set TB2, it is possible to output associative words having a high degree of association with a feature label when that feature label is input.
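Where the associative model set carries degrees of association rather than simple linked/not-linked values, the associative words can be ranked and filtered. The sketch below uses assumed numeric degrees in place of a trained neural network.

```python
# Stand-in associative model with degrees of association (0.0 - 1.0, assumed values).
WEIGHTED_ASSOCIATIONS = {
    "hot spring": {"warm": 0.95, "steam": 0.90, "sulfur": 0.75, "recreation": 0.60},
}

def associative_words_ranked(feature_label: str, min_degree: float = 0.7):
    """Return associative words whose degree of association meets the threshold, strongest first."""
    pairs = WEIGHTED_ASSOCIATIONS.get(feature_label, {})
    selected = [(w, d) for w, d in pairs.items() if d >= min_degree]
    return sorted(selected, key=lambda wd: wd[1], reverse=True)

print(associative_words_ranked("hot spring"))
# [('warm', 0.95), ('steam', 0.9), ('sulfur', 0.75)]
```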
 The entity storage unit 15 stores one or more entities and entity values. An entity is one or more words associated with conversational sentence information. A word is a unit that constitutes a sentence; it may be referred to simply as a "word" or a "term", or may be considered a kind of morpheme (for example, an independent word, described later).
 In the entity storage unit 15, as in the entity table TB3 shown in FIG. 5, one or more entity values are stored in association with each of one or more entities. An entity value is a character string that embodies the entity.
 An entity usually corresponds to one or more pieces of conversational sentence information among the one or more pieces stored in the corpus storage unit 14. Therefore, the entity storage unit 15 may store, for example, one or more entities for each piece of conversational sentence information stored in the corpus storage unit 14.
 In the entity table TB3, when one feature label or associative word in the associative model set TB2 is taken as an entity, the words linked to it in the associative model set TB2 become its entity values. For example, in the associative model set TB2, the entries whose feature label or associative words include "car" are linked to "car", "lineup A", "heavy", and "vehicle". Therefore, when "car" is taken as an entity, the entity values are "car", "lineup A", "heavy", and "vehicle". When "Ishiyakiimo" is taken as an entity, the entity values are "sweet", "delicious", "crop", "breeding", "in the soil", and the like.
 エンティティテーブルTB3は、このようなエンティティとエンティティ値が互いに一対一、又は一対複数で紐付けられている。このため、このエンティティテーブルTB3を介して、エンティティ値からこれに関連するエンティティを導出することができ、またエンティティからエンティティ値を導出することも可能となる。 In the entity table TB3, such an entity and the entity value are associated with each other on a one-to-one basis or one-to-many basis. Therefore, the entity related to the entity can be derived from the entity value via the entity table TB3, and the entity value can be derived from the entity.
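A minimal sketch of this bidirectional lookup, assuming the entity table TB3 is held as an in-memory dictionary; the entries shown are the illustrative ones from the paragraph above.
# Illustrative stand-in for the entity table TB3 (entity -> entity values).
ENTITY_TABLE_TB3 = {
    "car": ["car", "lineup A", "heavy", "vehicle"],
    "stone-roasted sweet potato": ["sweet", "delicious", "crop", "selective breeding", "in the soil"],
}

def values_of(entity: str) -> list[str]:
    """Derive the entity values linked to an entity."""
    return ENTITY_TABLE_TB3.get(entity, [])

def entities_of(entity_value: str) -> list[str]:
    """Derive the entities linked to an entity value."""
    return [e for e, values in ENTITY_TABLE_TB3.items() if entity_value in values]

print(values_of("car"))          # ['car', 'lineup A', 'heavy', 'vehicle']
print(entities_of("lineup A"))   # ['car']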
When setting entities and entity values, for example, a server (not shown) may search for both the feature label and the associative word, set the character string that yields the larger number of search results as the entity, and set the other character string, which yields the smaller number of search results, as the entity value.
Further, in the entity table TB3, images are additionally linked to the entities and entity values that are associated with each other, and stored. As described above, the entities and entity values are extracted from the associative model set TB2, and the feature labels in the associative model set TB2 are linked to images in the comprehensive inference result table TB1. Through these linked correspondences, an image can therefore be associated with an entity and its entity values. As a result, as shown in FIG. 5, the relationships between the entities and entity values linked to each image A1, A2, A3, ... can be stored.
The words that constitute entities and entity values may be, for example, collocations. A collocation is a word formed by joining two or more independent words to express a particular meaning, and may also be called a compound word. Examples of collocations include "hot spring inn", which combines "hot spring" and "inn", and "lineup A", which combines "lineup" and "A"; a surname-and-given-name pair such as "Ichiro Nakamura" is also acceptable, as is any combination of two or more words.
The entity table TB3 also stores one or more pieces of day conversion information. Day conversion information is information for converting a day word into a date. A day word is a word relating to a day. A day word is usually a word associated with the entity name "date entity"; examples include "last month", "yesterday", "last week", "this year", "this month", "last year", "previous term", and "this fiscal year", but any information that can be converted into a date may be used.
Day conversion information comprises a day word and day information acquisition information. Day information acquisition information is information for acquiring day information. Day information is information about the day corresponding to a day word, and is used when constructing query information. Day information may be, for example, information indicating a date such as "April 1", or information indicating a period from a start date to an end date such as "4/1 to 4/30", but is not limited to these. Day information acquisition information is, for example, a function name or a method name, but may also be API information or the program itself, and is not limited to these.
Specifically, the day information acquisition information for the day word "last month" may be, for example, a program that acquires current time information (for example, "11:15 on May 10, 2020"; the same applies hereinafter), obtains the month preceding the month contained in that current time information (for example, "April" preceding "May"), refers to the calendar information for that preceding month, and acquires day information from the first day to the last day of that month (for example, "4/1 to 4/30").
The day information acquisition information for the day word "this year" may be, for example, API information that acquires current time information, refers to the calendar information for the year contained in that current time information (for example, "2020"), and acquires day information from the first day of that year up to the day contained in the current time information (for example, "2020/1/1 to 2020/5/10").
Further, the day information acquisition information for the day word "yesterday" may be a method that acquires current time information and acquires the day information of the day preceding the day contained in that current time information (for example, "5/9"), or the name of such a method.
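The three examples above can be summarized in a short sketch, assuming each day word maps to a function that converts the current time into day information; the day words, return formats, and the fixed "today" are illustrative assumptions.
from datetime import date, timedelta

def last_month(today: date) -> tuple[date, date]:
    """Return the first and last day of the month preceding the current month."""
    first_of_this_month = today.replace(day=1)
    last_of_prev_month = first_of_this_month - timedelta(days=1)
    return last_of_prev_month.replace(day=1), last_of_prev_month

def this_year(today: date) -> tuple[date, date]:
    """Return the period from the first day of the current year up to today."""
    return today.replace(month=1, day=1), today

def yesterday(today: date) -> date:
    """Return the day preceding the current day."""
    return today - timedelta(days=1)

DAY_CONVERSION_INFO = {"last month": last_month, "this year": this_year, "yesterday": yesterday}

today = date(2020, 5, 10)
print(DAY_CONVERSION_INFO["last month"](today))  # first and last day of April 2020
print(DAY_CONVERSION_INFO["yesterday"](today))   # May 9, 2020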
The corpus storage unit 14 stores one or more pieces of conversational sentence information. Conversational sentence information is information about a conversational sentence, and is usually an example sentence of a conversational sentence. Example sentences include "Show me a pamphlet of a hot spring inn" and "Show me a catalog of XX-brand cars", but they are not limited to these.
However, conversational sentence information may instead be a template of a conversational sentence. Templates are, for example, "Is there an image of a {car}?", "Show me a {pamphlet} of a {vehicle}", "Tell me the {information} of a {car} of a {maker}", and "Tell me the {information} of a {car} of a {company}"; the items written with "{" and "}" in a template, such as {car}, are entities, that is, variables.
Conversational sentence information is usually associated with an intent. An intent can be regarded as information for specifying a processing operation. Examples of processing operations include "image search", "pamphlet search", and "image information search", and the conversational sentence information may be stored in the corpus table TB4 in association with information specifying each of these processing operations.
That is, the corpus storage unit 14 stores, for example, one or more pieces of conversational sentence information for each of the one or more intents stored in the intent storage unit 12.
FIG. 5 shows an example of the corpus table TB4, which relates to the conversational sentence information (corpus) linked to each intent. The corpus table TB4 usually also stores one or more pieces of entity information for each stored piece of conversational sentence information. Entity information is information about each of the one or more entities associated with a piece of conversational sentence information. In addition to the entity and entity value described above, entity information includes the start position, end position, and entity name of each entity.
The start position here is the position at which the entity begins in the conversational sentence information. The start position is expressed, for example, as a value indicating which character of the character string constituting the conversational sentence is the first character of the entity (for example, "1" or "4"). Similarly, the end position is the position at which the entity ends in the conversational sentence information, expressed, for example, as a value indicating which character is the last character of the entity (for example, "2" or "5"). However, the representation of the start position and end position is not limited to these forms. The start position and end position may also be called offsets, and offsets may be expressed in bytes, without limitation.
An entity name is the name of an entity. Entity names are, for example, "object entity", "date entity", and "information entity", but the format is not limited to these as long as the information can express the attributes of the entity. An object entity is an entity relating to an object, such as {vehicle}, {car}, or {company} in Table 1. A date entity is an entity relating to a date. An information entity is an entity relating to the information being requested.
Alternatively, when the conversational sentence information is a template, the entity information may include, for example, an entity name and order information. The order information is a value indicating which of the one or more variables included in the template the entity name corresponds to. However, the structure of the entity information is not limited to this.
The corpus in the embodiment may be regarded, for example, as each piece of conversational sentence information stored in the corpus storage unit 14, or as the set of one or more pieces of conversational sentence information together with the entity information associated with each of them.
The associative model set TB2, entity table TB3, and corpus table TB4 described above may be, for example, tabular databases. In a table, for example, one or more item names are registered, and one or more values are registered for each item name. An item name may be called an attribute name, and each value corresponding to an item name may be called an attribute value. The table may be, for example, a relational database table, TSV, Excel, or CSV, but its type is not limited to these.
The image data storage unit 26 is an area for storing images acquired via the image acquisition unit 5. Each image is linked to parameters in the comprehensive inference result table TB1 and the entity table TB3, and the images are stored in the image data storage unit 26 so that a linked image can be read out immediately.
In the metadata storage unit 27, the model DB1 and the tables TB1 to TB4 stored in the auxiliary storage unit 4 are read out and referenced when the central control unit 20 causes the execution unit 3 to execute various processing operations. The model DB1 and the tables TB1 to TB4 stored in the auxiliary storage unit 4 may also be updated each time the comprehensive inference unit 7 extracts a new feature label as metadata based on an image acquired by the image acquisition unit 5, or each time the linking unit 8 derives a new associative word.
The extraction unit 9 compares an entity extracted by the image identification system 2, described later, with at least one of the feature labels and associative words, and extracts an image linked to a feature label or associative word that at least partially matches the extracted entity. In doing so, it refers to the model DB1 and the tables TB1 to TB4 described above. The extraction unit 9 transmits the extracted image to the user terminal 10, and the user terminal 10 displays the image.
Configuration of the image identification device
FIG. 7 is a block diagram of the image identification system 2. The image identification system 2 includes a storage unit 19, a reception unit 29, a processing unit 39, and an output unit 49.
The storage unit 19 includes a table storage unit 11, an intent storage unit 12, an API information storage unit 13, a corpus storage unit 14, an entity storage unit 15, and a day conversion information storage unit 18. The reception unit 29 includes a conversational sentence reception means 21. The conversational sentence reception means 21 includes a voice reception means 211 and a voice recognition means 212.
The processing unit 39 includes a parameterization means 30, an intent determination means 31, a conversational sentence information determination means 32, an entity acquisition unit 33, a parameter acquisition unit 34, an API information acquisition means 35, a query information construction unit 36, and a search result acquisition means 37. The parameter acquisition unit 34 includes a judgment means 341, a day information acquisition means 342, an entity name acquisition means 343, a translation item name acquisition means 344, a table identifier acquisition means 345, a primary key identifier acquisition means 346, and a conversion parameter acquisition means 347. The output unit 49 includes a search result output means 41.
The storage unit 19 is a database that stores various kinds of information, such as tables, intents, API information, corpora, entities, entity mapping information, PK items, and day conversion information. Tables and related information are described later; the other information is explained where relevant.
The table storage unit 11 stores tables equivalent to the comprehensive inference model DB1, the comprehensive inference result table TB1, and the associative model set TB2 stored in the table storage unit 11 of the metadata extraction device 1. If the image identification system 2 uses the table storage unit 11 of the metadata extraction device 1, this table storage unit 11 may be omitted on the image identification system 2 side.
The intent storage unit 12 stores one or more intents. An intent is information managed for each image identification process, and can be regarded as information for specifying an image identification processing operation. An intent usually has a processing operation name that specifies a business process. A processing operation name is the name of a processing operation. A processing operation is usually a business process executed via an API. However, a processing operation may also be, for example, a business process executed according to an SQL statement.
A processing operation name usually also corresponds to API information, described later. An intent may therefore be regarded as being associated with API information, for example, via the processing operation name.
The API information storage unit 13 stores one or more pieces of API information. API information is information about an API. An API is an interface for using the functions of a program. An API is software such as a function, a method, or an execution module. The API is, for example, a Web API, but other APIs may be used. A Web API is an API built using a Web communication protocol such as HTTP or HTTPS. Since APIs such as Web APIs are known technology, detailed description is omitted.
API information is information associated with an intent. As described above, API information corresponds to an intent, for example, via the processing operation name.
API information is usually information for searching imaged information media. However, API information may also be information for, for example, registering information or performing processing based on information.
API information has one or more pieces of parameter specification information. Parameter specification information is information that specifies a parameter. A parameter can be described as a value having a specific attribute. The value is usually a variable, and a variable may also be called an argument.
A parameter is usually information obtained by converting an entity, but it may be the entity itself. Parameters are, for example, arguments passed to an API or variables in an SQL statement.
A parameter may be composed of, for example, a pair of an attribute name and a value. Concrete examples of such pairs include "shain_code=2" and "sta_date=20190401,end_date=20190430", but the format is not limited to these.
Parameter specification information is, for example, a parameter name, which is the name of a parameter. Alternatively, parameter specification information may be, for example, an attribute name, or any other information capable of specifying a parameter.
The corpus storage unit 14 stores a table equivalent to the corpus table TB4 stored in the corpus storage unit 14 of the metadata extraction device 1. If the image identification system 2 uses the corpus storage unit 14 of the metadata extraction device 1, this corpus storage unit 14 may be omitted on the image identification system 2 side.
The entity storage unit 15 stores a table equivalent to the entity table TB3 stored in the entity storage unit 15 of the metadata extraction device 1. If the image identification system 2 uses the entity storage unit 15 of the metadata extraction device 1, this entity storage unit 15 may be omitted on the image identification system 2 side.
The reception unit 29 receives various kinds of information, such as conversational sentences. The reception unit 29 receives information such as conversational sentences from, for example, a terminal, but may also accept it via an input device such as a keyboard, touch panel, or microphone. Alternatively, the reception unit 29 may accept information read from a recording medium such as a disk or semiconductor memory; the manner of reception is not particularly limited.
The conversational sentence reception means 21 receives a conversational sentence. A conversational sentence is a sentence spoken by a person, and may be described as a sentence in natural language. A conversational sentence is received, for example, as speech, but may also be received as text. Speech is a voice uttered by a person. Text is a character string obtained by speech recognition of a voice uttered by a person. A character string is an array of one or more characters.
The voice reception means 211 receives the speech of a conversational sentence. The voice reception means 211 receives the speech of a conversational sentence paired with a terminal identifier from, for example, a terminal, but may also receive it via a microphone. A terminal identifier is information that identifies a terminal, such as a MAC address, an IP address, or an ID, but may be any information capable of identifying a terminal. The terminal identifier may also be a user identifier that identifies the user of the terminal. A user identifier is, for example, an e-mail address or a telephone number, but may be an ID, an address and name, or any other information capable of identifying the user.
The voice recognition means 212 performs speech recognition processing on the speech received by the voice reception means 211 and obtains a conversational sentence as a character string. Since speech recognition processing is known technology, detailed description is omitted.
The processing unit 39 performs various kinds of processing, for example the processing of the parameterization means 30, the intent determination means 31, the conversational sentence information determination means 32, the entity acquisition unit 33, the parameter acquisition unit 34, the API information acquisition means 35, the query information construction unit 36, the search result acquisition means 37, the judgment means 341, the day information acquisition means 342, the entity name acquisition means 343, the translation item name acquisition means 344, the table identifier acquisition means 345, the primary key identifier acquisition means 346, and the conversion parameter acquisition means 347. The various kinds of processing also include, for example, the various judgments described in the flowcharts.
The processing unit 39 performs the processing of the parameterization means 30, the intent determination means 31, and so on, for example, in response to the conversational sentence reception means 21 receiving a conversational sentence. When conversational sentences paired with terminal identifiers are transmitted from one or more terminals, the processing unit 39 performs the processing of the intent determination means 31 and so on for each terminal identifier.
The parameterization means 30 parameterizes one or more entities included in the one or more conversational sentences received by the conversational sentence reception means 21. The parameterization means 30 may also parameterize the entities corresponding to the conversational sentence information determined by the conversational sentence information determination means 32.
Specifically, the parameterization means 30 parameterizes the entities, for example the independent words, included in a conversational sentence input as speech. For example, comparing the conversational sentence "Is there an image of lineup A?" with the conversational sentence "Do you have any images about lineup A?", the two sentences differ only in their particles; the content words are identical. Despite this, conventional search processing did not always recognize them as having the same meaning and sometimes treated them as conversational sentences with different meanings. The parameterization means 30 therefore parameterizes the independent words, that is, the entities, "lineup A" and "image" contained in these conversational sentences.
The intent determination means 31 determines the intent corresponding to the conversational sentence received by the conversational sentence reception means 21.
Specifically, the intent determination means 31 first obtains, for example, the text corresponding to the conversational sentence received by the conversational sentence reception means 21. As described above, the text is, for example, the result of speech recognition of the received conversational sentence, but may also be the received conversational sentence itself.
That is, when a conversational sentence is received as speech, the intent determination means 31 performs speech recognition on the conversational sentence and obtains the text. When a conversational sentence is received as text, the intent determination means 31 simply obtains that text.
Next, the intent determination means 31 obtains one or more independent words from the obtained text by, for example, performing morphological analysis and syntactic analysis on it. Since morphological analysis is known technology, detailed description is omitted.
The intent determination means 31 then determines an intent having a processing operation name that contains a word identical or similar to one of the obtained independent words.
Specifically, for example, a synonym dictionary is stored in the storage unit 19. A synonym dictionary is a dictionary of synonyms. In the synonym dictionary, for each processing operation name constituting the intents stored in the intent storage unit 12, the words contained in that processing operation name and one or more synonyms of each word are registered.
For example, when the conversational sentence reception means 21 receives the conversational sentence "Tell me the sales price information for cars from maker XX", the intent determination means 31 obtains one or more independent words such as "maker XX", "car", and "sales price information" from the sentence, searches the intent storage unit 12 using each independent word as a key, and judges whether there is an intent having a processing operation name that matches the independent word. The match is, for example, an exact match, but may also be a partial match. If there is an intent specifying a processing operation that contains a word matching the independent word, the intent determination means 31 determines that intent. In this example, against Table 1, the intent "image information search" contains the word "maker", which partially matches the independent word "maker XX", the word "car", which matches "car", and the word "information", which partially matches "sales price information", so this intent is determined.
If there is no intent having a processing operation name containing a word that matches the independent word, the intent determination means 31 obtains, for example, one of the one or more synonyms corresponding to that independent word from the synonym dictionary, searches the intent storage unit 12 using that synonym as a key, and judges whether there is an intent having a processing operation name containing a word that matches the synonym. If there is such an intent, the intent determination means 31 determines it. If not, the intent determination means 31 performs the same processing for the other synonyms to determine an intent. If no such intent exists for any synonym, the intent determination means 31 may output an indication that no intent could be determined.
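A minimal sketch of this intent determination with a synonym fallback; the intents, the synonym dictionary, and the partial-match rule on the words of the processing operation name are illustrative assumptions.
INTENTS = ["image search", "pamphlet search", "image information search"]
SYNONYMS = {"picture": ["image"], "catalog": ["pamphlet"]}

def partial_match(word: str, operation_name: str) -> bool:
    """True if any word of the processing operation name partially matches the given word."""
    return any(token in word or word in token for token in operation_name.split())

def determine_intent(independent_words: list[str]) -> str | None:
    for word in independent_words:
        # Try the independent word itself first, then its registered synonyms.
        for candidate in [word] + SYNONYMS.get(word, []):
            for intent in INTENTS:
                if partial_match(candidate, intent):
                    return intent
    return None  # no intent could be determined

print(determine_intent(["maker XX", "car", "sales price information"]))
# 'image information search' (via the partial match on 'information')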
The conversational sentence information determination means 32 searches the corpus storage unit 14 using the intent determined by the intent determination means 31 as a key, and determines, from the one or more pieces of conversational sentence information corresponding to that intent, the piece of conversational sentence information that most closely approximates the conversational sentence received by the conversational sentence reception means 21.
The conversational sentence information that most closely approximates the conversational sentence is, for example, the piece of conversational sentence information with the highest similarity to the conversational sentence. That is, the conversational sentence information determination means 32, for example, calculates the similarity between the received conversational sentence and each piece of conversational sentence information corresponding to the determined intent, and determines the one with the maximum similarity.
Alternatively, the conversational sentence information determination means 32 may, for example, search for a conversation template that matches a template in which the positions of the nouns in the received conversational sentence are turned into variables. That is, the corpus storage unit 14 stores templates in which one or more entity names are variables, and the conversational sentence information determination means 32 obtains the positions of the one or more entity names in the received conversational sentence and determines, as the conversational sentence information, the template corresponding to the positions of the obtained entity names. The position of an entity name in a conversational sentence is information indicating the ordinal position of that entity name within a template having one or more entity names.
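A minimal sketch of determining the conversational sentence information with the maximum similarity to the received conversational sentence; the word-overlap (Jaccard) similarity used here is an illustrative assumption, and any similarity measure could be substituted.
def similarity(sentence_a: str, sentence_b: str) -> float:
    """Word-overlap similarity between two sentences."""
    words_a, words_b = set(sentence_a.split()), set(sentence_b.split())
    return len(words_a & words_b) / len(words_a | words_b)

def most_similar(received: str, candidates: list[str]) -> str:
    """Return the candidate conversational sentence information with the maximum similarity."""
    return max(candidates, key=lambda candidate: similarity(received, candidate))

corpus = ["show me a pamphlet of a hot spring inn",
          "is there an image of lineup A",
          "tell me the sales price information of a car"]
print(most_similar("do you have an image of lineup A", corpus))
# 'is there an image of lineup A'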
The entity acquisition unit 33 acquires one or more entities that correspond to each of the one or more entities associated with the conversational sentence information determined by the conversational sentence information determination means 32, and that are words contained in the conversational sentence received by the conversational sentence reception means 21.
For example, for each of the one or more entities associated with the determined conversational sentence information, the entity acquisition unit 33 obtains the start position and end position of that entity from the corpus storage unit 14, and obtains from the received conversational sentence the word specified by that start position and end position.
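A minimal sketch of this position-based extraction, assuming the start and end positions are 1-indexed, inclusive character offsets, which is one of the representations allowed above; the sentence and offsets are illustrative.
def extract_entities(sentence: str, positions: list[tuple[int, int]]) -> list[str]:
    """Return the words specified by each (start position, end position) pair."""
    return [sentence[start - 1:end] for start, end in positions]

sentence = "lineup A image please"
# Illustrative entity information: "lineup A" occupies characters 1 to 8, "image" characters 10 to 14.
print(extract_entities(sentence, [(1, 8), (10, 14)]))  # ['lineup A', 'image']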
The parameter acquisition unit 34 acquires one or more parameters corresponding to each of the one or more entities acquired by the entity acquisition unit 33.
The acquired parameter is, for example, the acquired entity itself, but it may also be information obtained by converting the acquired entity. That is, for example, when a day word is included among the one or more acquired entities, the parameter acquisition unit 34 converts the day word into day information, which is the parameter.
The judgment means 341, which forms part of the parameter acquisition unit 34, judges whether a day word exists among the one or more entities acquired by the entity acquisition unit 33. Specifically, for example, one or more day words are stored in the storage unit 19, and the judgment means 341 determines, for each acquired entity, whether it matches any stored day word; if the determination result for at least one entity indicates a match, the judgment means 341 judges that a day word exists among the acquired entities.
When the judgment means 341 judges that a day word exists among the one or more acquired entities, the day information acquisition means 342 acquires the day conversion information corresponding to that day word from the day conversion information storage unit 18, and uses that day conversion information to acquire the day information, which is the parameter.
Specifically, for example, day words such as "last year" are stored in the storage unit 19, and when the conversational sentence "Show me brand B's pamphlet from last year" is received and the three entities "brand B", "last year", and "pamphlet" are acquired, the judgment means 341 judges that a day word exists among the three acquired entities because the entity "last year" matches the day word "last year". Current time information is then acquired, and day information (for example, "4/1 to 4/30") is obtained.
The day information acquisition means 342 acquires the day information acquisition information (for example, a program) corresponding to the day word "last year" from the day conversion information storage unit 18. Using that day information acquisition information, the day information acquisition means 342 acquires current time information (for example, "11:15 on May 10, 2020") from the built-in clock of the MPU (Micro Processing Unit), an NTP server, or the like, and obtains the year preceding the year contained in the current time information (for example, "2019" preceding "2020"). The day information acquisition means 342 then refers to the calendar information for that preceding year and acquires the day information from the first day to the last day of that year, "January 1, 2019 to December 31, 2019".
When the day word obtained from the conversational sentence is "this year", the day information acquisition means 342 acquires the day information acquisition information (for example, API information) corresponding to the day word "this year" from the day conversion information storage unit 18. Using that day information acquisition information, the day information acquisition means 342 acquires current time information from the built-in clock or the like, refers to the calendar information for the year contained in the current time information (for example, "2020"), and acquires the day information from the first day of that year up to the day contained in the current time information (for example, "January 1, 2020 to May 10, 2020").
When the acquired day word is "yesterday", the day information acquisition means 342 acquires the day information acquisition information (for example, a method) corresponding to the day word "yesterday" from the day conversion information storage unit 18. Using that day information acquisition information, the day information acquisition means 342 acquires current time information from the built-in clock or the like and acquires the day information of the day preceding the day contained in the current time information (for example, "5/9").
The entity name acquisition means 343 acquires, for each of the one or more entities acquired by the entity acquisition unit 33, the entity name corresponding to that entity from the entity storage unit 15.
The entity name corresponding to an entity is the entity name paired with a start position and end position that match or are similar to the position of that entity in the conversational sentence from which the entity was acquired. The entity name acquisition means 343 may acquire the entity name corresponding to each entity acquired by the entity acquisition unit 33 from the entity storage unit 15 using, for example, the entity information associated with that entity.
Specifically, for example, when the three entities "maker XX", "car", and "sales price information" are acquired from the received conversational sentence "Tell me the sales price information for cars from maker XX", the entity name acquisition means 343 uses, of the three pieces of entity information stored in the corpus storage unit 14 in association with the conversational sentence information "Tell me the sales price information for cars from maker XX", the first piece of entity information, which has the same start position and end position as "maker XX" in the received conversational sentence, to acquire the "maker entity" associated with "maker XX".
Likewise, the entity name acquisition means 343 uses, of the above three pieces of entity information, the second piece, which has the same start position and end position as "car" in the conversational sentence "Tell me the sales price information for cars from maker XX", to acquire the "car entity" associated with "car", and further uses the third piece, which has the same start position and end position as "sales price information" in that sentence, to acquire the "information entity" associated with "sales price information".
The API information acquisition means 35 acquires the API information corresponding to the intent determined by the intent determination means 31 from the API information storage unit 13.
The API information acquisition means 35 acquires, for example, API information having the processing operation name corresponding to the intent determined by the intent determination means 31 from the API information storage unit 13.
Specifically, when API information 1 having, for example, the processing operation name "image information search" and three or more pieces of parameter specification information such as "company code, Cope_code", "car code, Car_code", and "information code, Info_code" is stored in the API information storage unit 13, and the intent specified by the intent name "image information search" is obtained, the API information acquisition means 35 acquires API information 1, which has the processing operation name "image information search" possessed by that intent.
The query information construction unit 36 constructs query information using the one or more parameters acquired by the parameter acquisition unit 34 and the API information acquired by the API information acquisition means 35. Query information is information for retrieving information, and is usually executable information. Query information is, for example, a function or method with its arguments inserted, but it may also be a completed SQL statement or a combination of a URL and parameters.
The query information construction unit 36 constructs the query information, for example, by placing each parameter acquired by the parameter acquisition unit 34 at the location of the corresponding variable among the one or more variables in the API information acquired by the API information acquisition means 35.
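A minimal sketch of constructing query information in the URL-plus-parameters form mentioned above; the endpoint, parameter names, and values are hypothetical and only illustrate placing each parameter at its corresponding variable location.
API_INFO = {
    "processing_operation": "image information search",
    "endpoint": "https://example.invalid/api/image_info",  # hypothetical Web API endpoint
    "parameter_names": ["Cope_code", "Car_code", "Info_code"],
}

def build_query(api_info: dict, parameters: dict[str, str]) -> str:
    """Return query information as a URL with the acquired parameters inserted."""
    pairs = [f"{name}={parameters[name]}" for name in api_info["parameter_names"] if name in parameters]
    return api_info["endpoint"] + "?" + "&".join(pairs)

query = build_query(API_INFO, {"Cope_code": "2", "Car_code": "17", "Info_code": "price"})
print(query)  # https://example.invalid/api/image_info?Cope_code=2&Car_code=17&Info_code=price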
The search result acquisition means 37 executes the query information constructed by the query information construction unit 36 and acquires search results by searching the storage unit 19 (database) using the parameters obtained by the parameterization means 30. Alternatively, the search result acquisition means 37 may generate API information containing the parameters obtained by the parameterization means 30 and search the storage unit 19 (database) based on the generated API information. That is, the API information may be generated by writing new parameters, or by rewriting already written parameters with new ones, and the database may be searched based on the API information in which the parameters have been reflected. Information for queries such as API information and SQL, and the detailed operation of the search result acquisition means 37, are described in the specific examples and modifications.
The output unit 49 outputs various kinds of information, for example, images of the retrieved information media.
The output unit 49, for example, transmits information such as search results, which are the results of various processing performed by the processing unit 39 in response to the reception unit 29 receiving information such as a conversational sentence paired with a terminal identifier, to the terminal identified by that terminal identifier. Alternatively, for example, when the reception unit 29 accepts information such as a conversational sentence via an input device such as a touch panel or microphone, the output unit 49 may output information such as search results via an output device such as a display or speaker.
However, the output unit 49 may also print out various kinds of information with a printer, store them on a recording medium, pass them to another program, or transmit them to an external device; the manner of output is not particularly limited.
The search result output means 41 outputs the search results acquired via the search result acquisition means 37. For example, when the conversational sentence reception means 21 receives a conversational sentence from the user terminal 10, the search result output means 41 transmits the image acquired as the search result by the search result acquisition means 37 to that user terminal 10. Alternatively, for example, when the conversational sentence reception means 21 accepts a conversational sentence via an input device such as a microphone, the search result output means 41 may display the image acquired as the search result by the search result acquisition means 37 via an output device such as a display.
The storage unit 19, the table storage unit 11, the intent storage unit 12, the API information storage unit 13, the corpus storage unit 14, the entity storage unit 15, and the day conversion information storage unit 18 are preferably implemented on a non-volatile recording medium such as a hard disk or flash memory, but they can also be implemented on a volatile recording medium such as RAM.
The process by which information comes to be stored in the storage unit 19 and the like is not particularly limited. For example, information may come to be stored in the storage unit 19 and the like via a recording medium, information transmitted via a network, a communication line, or the like may come to be stored there, or information input via an input device may come to be stored there. The input device may be anything, for example, a keyboard, mouse, touch panel, or microphone.
The reception unit 29, the conversational sentence reception means 21, the voice reception means 211, and the voice recognition means 212 may or may not be regarded as including the input device. The reception unit 29 and the like can be realized by the driver software of the input device, or by the input device together with its driver software. The function of the reception unit 29 may also be implemented in the user terminal 10, in which case the conversational sentence information obtained at the user terminal 10 is sent to the image identification system 2 via the public communication network 50.
The processing unit 39, the intent determination means 31, the conversational sentence information determination means 32, the entity acquisition unit 33, the parameter acquisition unit 34, the API information acquisition means 35, the query information construction unit 36, the search result acquisition means 37, the judgment means 341, the day information acquisition means 342, the entity name acquisition means 343, the translation item name acquisition means 344, the table identifier acquisition means 345, the primary key identifier acquisition means 346, and the conversion parameter acquisition means 347 can usually be realized by a CPU (Central Processing Unit) or an MPU together with memory and the like. The processing procedures of the processing unit 39 and the like are usually realized by software, and the software is recorded on a recording medium such as ROM. However, the processing procedures may also be realized by hardware (dedicated circuits).
The output unit 49 and the search result output means 41 may or may not be regarded as including an output device such as a display or speaker. The output unit 49 and the like can be realized by the driver software of the output device, or by the output device together with its driver software.
The receiving function of the reception unit 29 and the like is usually realized by wireless or wired communication means (for example, a communication module such as a NIC (Network Interface Controller) or a modem), but may also be realized by broadcast receiving means (for example, a broadcast receiving module). The transmitting function of the output unit 49 and the like is usually realized by wireless or wired communication means, but may also be realized by broadcasting means (for example, a broadcasting module).
Next, the processing operation of the metadata extraction system 100 to which the present invention is applied will be described in detail with reference to the flowchart shown in FIG. 8.
Steps S11 to S17 show the processing operation in which the metadata extraction device 1 extracts metadata from the image of each information medium.
In step S11, the image acquisition unit 5 acquires an image of an information medium. The image of the information medium is captured via the user terminal 10 and received by the metadata extraction device 1 via the public communication network 50, or is captured directly via a camera (not shown) attached to the metadata extraction device 1.
 ステップS12では、特徴マップ生成部6により、ステップS11において取得された画像について特徴マップを生成する。この特徴マップの生成は、画素単位、又は例えば複数の画素の集合体であるブロック領域単位で、周知の画像解析に基づいて、必要に応じてディープラーニング技術を利用した、解析画像の特徴量を2次元画像上に反映させることで行う。 In step S12, the feature map generation unit 6 generates a feature map for the image acquired in step S11. This feature map is generated on a pixel-by-pixel basis, or, for example, on a block area basis, which is an aggregate of a plurality of pixels. This is done by reflecting it on a two-dimensional image.
 Next, the process proceeds to step S13, in which the comprehensive inference unit 7 extracts objects from the feature map. In step S13, the comprehensive inference model DB1 stored in the table storage unit 11 is used to infer the content of each object in the image from the feature map and to generate metadata.
 In this case, as shown in FIG. 3, the feature map, and hence the feature amounts constituting it, are input to the individual inference models DB11 to DB15 that constitute the comprehensive inference model DB1, on a per-pixel or per-block-region basis. To enable the content of the objects in the image to be discriminated, a learning model capable of discriminating each element, such as the content of an object ("car", "tree", "fence", etc.), its color, or a character string, is built in advance in each of the individual inference models DB11 to DB15. In step S13, these learning models are referred to and the content of each object is extracted. Each of the individual inference models DB11 to DB15 outputs results such as "car: 96.99%, tree: 2.01%" or "red: 50.01%, green: 37.7%"; an individual inference result is then output for one or more of these outputs on the basis of rules determined in advance by the system side or the operator side. This individual inference result becomes a feature label.
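 By way of non-limiting illustration, the sketch below shows how the outputs of the individual inference models could be reduced to feature labels; the 0.5 confidence threshold is an assumed example of the "rule determined in advance by the system side or the operator side".

    # Illustrative sketch of step S13 (individual inference results -> feature labels).
    from typing import Dict, List

    def to_feature_labels(model_outputs: Dict[str, Dict[str, float]],
                          threshold: float = 0.5) -> List[str]:
        labels = []
        for model_name, scores in model_outputs.items():
            # Keep the highest-scoring class of each individual inference model
            # if it clears the assumed threshold.
            label, score = max(scores.items(), key=lambda kv: kv[1])
            if score >= threshold:
                labels.append(label)
        return labels

    # Example mirroring the outputs quoted above.
    outputs = {
        "DB11_object": {"car": 0.9699, "tree": 0.0201},
        "DB12_color": {"red": 0.5001, "green": 0.377},
    }
    print(to_feature_labels(outputs))   # ['car', 'red']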
 Next, the process proceeds to step S14, in which the associating unit 8 refers to the associative model set TB2 and derives associative words from the feature labels output as the comprehensive inference result in step S13. As described above, the associative model set TB2 stores feature labels and associative words linked to each other. Therefore, by inputting a feature label, one or more associative words linked to it can easily be extracted. Through these associative words, the atmosphere, sensations, impressions, and emotions that the image evokes can be expanded upon, in addition to the content of the objects extractable from the image. For example, in the associative model set TB2, the feature label "hot spring" is linked to associative words such as "warm", "sulfur", "alkaline", "steam", and "recreation". Therefore, when "hot spring" is extracted as a feature label as the comprehensive inference result, the impression of the image can be expanded from the feature label to these related associative words ("warm", "sulfur", "alkaline", "steam", "recreation", and so on). In step S14, information published on the Internet may also be used as necessary when deriving the associative words. In that case, the feature label may be used as a search term in a search engine, and words that appear with high frequency in the search results may be taken in as associative words.
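 By way of non-limiting illustration, the associative model set TB2 can be pictured as a simple lookup from a feature label to its associative words; the sketch below uses a plain dictionary whose entries mirror the "hot spring" example above.

    # Illustrative sketch of step S14 (feature labels -> associative words).
    from typing import Dict, List

    ASSOCIATIVE_MODEL_SET_TB2: Dict[str, List[str]] = {
        "hot spring": ["warm", "sulfur", "alkaline", "steam", "recreation"],
        "car": ["drive", "road", "engine"],   # assumed entry for illustration
    }

    def derive_associative_words(feature_labels: List[str]) -> List[str]:
        words: List[str] = []
        for label in feature_labels:
            words.extend(ASSOCIATIVE_MODEL_SET_TB2.get(label, []))
        return words

    print(derive_associative_words(["hot spring"]))
    # ['warm', 'sulfur', 'alkaline', 'steam', 'recreation']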
 In step S15, metadata is generated for the image acquired in step S11. This metadata includes the associative words derived in step S14 in addition to the character strings of the feature labels obtained as the individual inference results described above. The metadata extraction device 1 may associate the metadata consisting of such feature labels and associative words with the image and store it in the metadata storage unit 27, or it may store the associated image in the image data storage unit 26. At this time, the image may be stored as an image-metadata associative label TB5 in which the image is linked with the metadata (step S16).
 FIG. 9 schematically shows the processing operations from the extraction of such metadata to its storage. For each newly acquired image A1, A2, ..., associative words are derived, on the basis of the associative model set, from the feature labels output as the comprehensive inference result in step S13. Then, an image-metadata associative label TB5 in which the feature labels and the associative words derived from them are linked to each image A1, A2, ... is generated and stored in the metadata storage unit 27. Because each image A1, A2, ... is thereby linked to metadata consisting of feature labels and associative words, when such metadata is later received as input, the images A1, A2, ... linked to it can conversely be identified.
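 By way of non-limiting illustration, one possible in-memory form of the image-metadata associative label TB5 and its reverse lookup is sketched below; the record and field names are assumptions, not part of this disclosure.

    # Illustrative sketch of steps S15-S16 (image-metadata associative label TB5).
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class ImageMetadataRecord:          # one row of TB5 (assumed layout)
        image_id: str                   # e.g. "A1"
        feature_labels: List[str] = field(default_factory=list)
        associative_words: List[str] = field(default_factory=list)

    tb5: List[ImageMetadataRecord] = [
        ImageMetadataRecord(
            image_id="A1",
            feature_labels=["hot spring"],
            associative_words=["warm", "sulfur", "alkaline", "steam", "recreation"],
        ),
    ]

    # Reverse lookup: given a metadata word, identify the linked images.
    def images_for(word: str) -> List[str]:
        return [r.image_id for r in tb5
                if word in r.feature_labels or word in r.associative_words]

    print(images_for("steam"))          # ['A1']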
 Next, the process proceeds to step S17, in which the entity table TB3 and the corpus table TB4 are created from the generated metadata (feature labels and associative words).
 The entity table TB3 is created from the feature labels and the associative words as a table in which entities and entity values are linked to each other, as described above. At this time, the corresponding images A1, A2, ... are linked to each combination of an entity and an entity value. As a result, when an image is read out later, this can be achieved via the entity, which improves the convenience of searching.
 When the corpus table TB4 is created, it may be generated by linking, in addition to the entities stored in the entity table TB3, the intents and conversational sentence information extracted by the image identification system 2 as described later.
 Next, a method of identifying an image corresponding to the content of conversational sentence information, on the basis of the entity table TB3, the corpus table TB4, and the image-metadata associative label TB5 created sequentially through steps S11 to S17 described above, will be described in detail with reference to FIG. 8.
 The image corresponding to the content of the conversational sentence information is identified by the image identification system 2 through steps S21 to S26.
 In step S21, the reception unit 29 recognizes a conversational sentence uttered by voice. Instead of being acquired from voice, the conversational sentence may also be recognized from a conversational sentence written in manually entered text data.
 Next, the process proceeds to step S22, in which the reception unit 29 converts the conversational sentence acquired through voice recognition into text data. A known conversion method may be used for the processing operation of converting the voice data into text data.
 Next, the process proceeds to step S23, in which morphological analysis and syntactic analysis are performed on the conversational sentence converted into text data.
 Next, the process proceeds to step S24, in which one or more independent words are acquired from the text data subjected to morphological analysis and syntactic analysis in step S23. The intent determination means 31 then determines an intent having an action name containing a word identical or similar to the acquired independent word(s). For example, when the conversational sentence is "Is there an image of lineup A?", the two independent words "lineup A" and "image" are acquired from the sentence by morphological analysis, the corpus table TB4 in the intent storage unit 12 is searched using each independent word as a key, and the intent that partially matches "lineup A" and has "image search" as its action name is determined.
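 By way of non-limiting illustration, intent determination in step S24 can be sketched as a partial-match search over the corpus table TB4; the table layout, field names, and intent identifiers below are assumptions.

    # Illustrative sketch of step S24 (independent words -> intent).
    from typing import List, Optional

    corpus_table_tb4 = [
        {"intent": "intent_image_search", "action": "image search",
         "phrases": ["image", "picture", "lineup A"]},
        {"intent": "intent_price_check", "action": "price check",
         "phrases": ["price", "cost"]},
    ]

    def determine_intent(independent_words: List[str]) -> Optional[str]:
        for entry in corpus_table_tb4:
            for word in independent_words:
                # Partial match between an acquired independent word and the
                # phrases registered for the intent.
                if any(word in phrase or phrase in word
                       for phrase in entry["phrases"]):
                    return entry["intent"]
        return None

    print(determine_intent(["lineup A", "image"]))   # intent_image_search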
 Also in step S24, the entities included in the conversational sentence are acquired. In step S24, the acquired entities may be further parameterized. For example, when the conversational sentence is "Is there an image of lineup A?", the two independent words "lineup A" and "image" are acquired from the sentence by morphological analysis, and of these, the entity is "lineup A".
 Next, the process proceeds to step S25, in which image identification processing is performed. This image identification processing is performed on the basis of the intent determined and the entities extracted in step S24.
 In the entity table TB3 stored in the entity storage unit 15 described above, the corresponding images are linked to the entities and entity values. Therefore, the entity extracted from the conversational sentence in step S24 is compared against the entities and entity values in the entity table TB3, and the image corresponding to the matched entity or entity value is identified.
 For example, as shown in FIG. 10, if "lineup A" is extracted as an entity, comparing it against the entities and entity values in the entity table TB3 identifies the image A1 linked to the entity value "lineup A". Likewise, if "stone-roasted sweet potato" ("ishiyakiimo") is extracted as an entity, comparing it against the entities and entity values in the entity table TB3 identifies the image A3 linked to the entity "stone-roasted sweet potato". The term referenced in the entity table TB3 in this way may thus be either an entity or an entity value. When identifying an image, the image-metadata associative label TB5 may also be referred to as an alternative to the entity table TB3. In the image-metadata associative label TB5, images are likewise stored linked to feature labels and associative words. Therefore, the feature labels and associative words corresponding to the entity extracted from the conversational sentence in step S24 can be looked up, and the image linked to those feature labels and associative words can be identified.
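 By way of non-limiting illustration, the comparison against the entity table TB3 in step S25 can be sketched as follows; the rows reproduce the "lineup A" / image A1 and "stone-roasted sweet potato" / image A3 examples, and the column layout is an assumption.

    # Illustrative sketch of step S25 (identifying images via the entity table TB3).
    from typing import List, Tuple

    # (entity, entity_value, image_id)
    entity_table_tb3: List[Tuple[str, str, str]] = [
        ("lineup", "lineup A", "A1"),
        ("lineup", "lineup B", "A2"),
        ("stone-roasted sweet potato", "stone-roasted sweet potato", "A3"),
    ]

    def identify_images(extracted_entity: str) -> List[str]:
        # The extracted term may match either the entity or the entity value.
        return [image_id for entity, value, image_id in entity_table_tb3
                if extracted_entity in (entity, value)]

    print(identify_images("lineup A"))                    # ['A1']
    print(identify_images("stone-roasted sweet potato"))  # ['A3']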
 The intent determined in step S24 may be further utilized when identifying the image in step S25. In the corpus table TB4 stored in the corpus storage unit 14, each intent is linked to conversational sentence information, entities, and entity values, and these entities and entity values are in turn linked to images in the entity table TB3. Therefore, by referring to the corpus table TB4, and through it the entity table TB3, on the basis of the extracted intent, more accurate image identification can be achieved.
 Furthermore, if the images linked to entities and entity values are organized, grouped, and stored per intent, the search can focus only on the group corresponding to the intent determined in step S24, so the image can be identified more quickly.
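 By way of non-limiting illustration, the per-intent grouping can be sketched as a mapping from intent to candidate images, so that only one group needs to be compared in step S25; the grouping key and contents below are assumptions.

    # Illustrative sketch of per-intent grouping of images.
    from collections import defaultdict
    from typing import Dict, List

    images_by_intent: Dict[str, List[str]] = defaultdict(list)
    images_by_intent["intent_image_search"].extend(["A1", "A2", "A3"])
    images_by_intent["intent_price_check"].extend(["A4"])

    def candidates_for(intent: str) -> List[str]:
        return images_by_intent.get(intent, [])

    print(candidates_for("intent_image_search"))   # ['A1', 'A2', 'A3']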
 In step S25, the image may be searched for via an API in the process of identifying it. In that case, the search result acquisition means 37 may generate API information including the extracted intent, entities, and the like, and search for the image in the storage unit 1 (database) on the basis of the generated API information. In other words, the image search may be performed on the basis of API information in which the parameters (intent and entities) are reflected.
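 By way of non-limiting illustration only, the sketch below shows one way API information reflecting the intent and entities could be assembled into a query; the endpoint URL, parameter names, and response format are all hypothetical and are not part of this disclosure.

    # Illustrative sketch of an API-based image search in step S25.
    import json
    from typing import List
    from urllib import parse, request

    def search_images(intent: str, entities: List[str]) -> List[str]:
        params = parse.urlencode({
            "intent": intent,                    # assumed parameter name
            "entities": ",".join(entities),      # assumed parameter name
        })
        url = "https://example.invalid/api/image-search?" + params  # placeholder endpoint
        with request.urlopen(url) as resp:       # would require a real endpoint
            return json.loads(resp.read())["image_ids"]   # assumed response field

    # Example call (hypothetical):
    # search_images("intent_image_search", ["lineup A"])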
 After the image is identified, the process proceeds to step S26, in which the identified image is displayed. When the image is to be displayed to the user of the user terminal 10, the image is transmitted from the image identification system 2 to the user terminal 10 via the public communication network 50 and displayed via the user terminal 10. The image may also be displayed directly from the image identification system 2, in which case it is displayed via the output unit 49.
 According to the present invention configured as described above, the work of associating metadata with the image of each information medium, which would otherwise be extremely laborious, can be performed automatically. Photographs can be converted into metadata without relying on human visual discrimination and definition, which reduces the burden of labor.
 Furthermore, according to the present invention, even though imaged information media come in a wide variety of types, metadata that makes subsequent searches of these images more convenient can be generated and associated with the images. In particular, when a colloquial conversational sentence such as "Is there an image of lineup A?" is acquired by voice, an appropriate information medium image corresponding to the received sentence can be extracted with high accuracy, and metadata that enables this can be generated and associated.
 Moreover, since this metadata also includes associative words, it contains not only the keywords of the feature labels themselves but also the various words that can be associated with them. Therefore, even when such an associative word is included in a conversational sentence, the image can still be identified from it at the time of image identification.
 The present invention is not limited to the embodiment described above. As shown in FIG. 11, the comprehensive inference model DB1 shown in FIG. 3 may further include an individual inference model DB16 in addition to the individual inference models DB11 to DB15.
 The individual inference model DB16 is a database for inferring the type of the information medium; for example, the individual inference model DB16 infers the type from the shape of the information medium. The individual inference model DB16 determines whether the information medium is, for example, a pamphlet, a catalog, financial statements, an attendance record, or an X-ray photograph. The comprehensive inference result concerning the type of the information medium is likewise converted into a feature label and, in the same way, into an entity and entity value, which further improves the convenience of subsequent searches.
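 By way of non-limiting illustration, the comprehensive inference model DB1 of FIG. 11 can be pictured as an ensemble to which a medium-type model DB16 is simply added; the callables and class names below are stand-ins, not part of this disclosure.

    # Illustrative sketch of extending DB1 with a medium-type model DB16.
    from typing import Callable, Dict

    FeatureMap = object   # stands in for the feature map produced in step S12

    comprehensive_inference_model_db1: Dict[str, Callable[[FeatureMap], Dict[str, float]]] = {
        "DB11_object":      lambda fm: {"car": 0.97, "tree": 0.02},
        "DB12_color":       lambda fm: {"red": 0.50, "green": 0.38},
        # ... DB13 to DB15 ...
        "DB16_medium_type": lambda fm: {"catalog": 0.91, "pamphlet": 0.07},
    }

    def run_all(feature_map: FeatureMap) -> Dict[str, Dict[str, float]]:
        return {name: model(feature_map)
                for name, model in comprehensive_inference_model_db1.items()}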
1 Metadata extraction device
2 Image identification system
3 Execution unit
4 Auxiliary storage unit
5 Image acquisition unit
6 Feature map generation unit
7 Comprehensive inference unit
8 Associating unit
9 Extraction unit
10 User terminal
11 Table storage unit
12 Intent storage unit
13 Information storage unit
14 Corpus storage unit
15 Entity storage unit
18 Day conversion information storage unit
19 Storage unit
20 Central control unit
21 Conversational sentence reception means
26 Image data storage unit
27 Metadata storage unit
29 Reception unit
30 Parameterization means
30 Feature map
31 Intent determination means
32 Conversational sentence information determination means
33 Entity acquisition unit
34 Parameter acquisition unit
35 Information acquisition means
36 Inquiry information configuration unit
37 Search result acquisition means
39 Processing unit
41 Search result output means
49 Output unit
50 Public communication network
100 Metadata extraction system
211 Voice reception means
212 Voice recognition means
341 Determination means
342 Day information acquisition means
343 Entity name acquisition means
344 Translation item name acquisition means
345 Table identifier acquisition means
346 Primary key identifier acquisition means
347 Conversion parameter acquisition means
 

Claims (4)

  1.  A metadata extraction program for extracting metadata from information contained in an image of an information medium, the program causing a computer to execute:
     a feature map generation step of generating a feature map in which features are extracted from the image of the information medium;
     an inference step of referring to one or more individual inference models in which feature maps and feature labels for respective elements are associated with each other, and extracting, as metadata, the feature label for each element from the feature map generated in the feature map generation step,
     wherein, in the inference step, an associative model set in which the feature labels associated in the individual inference models and associative words conceivable from those feature labels are linked to each other is referred to, and associative words are derived from the extracted feature labels;
     an entity table generation step of generating an entity table in which entities consisting of the feature labels extracted and the associative words derived in the inference step are linked to the image on a one-to-one or one-to-many basis;
     a corpus table generation step of generating a corpus table in which, for each intent for specifying a processing operation, conversational sentence information and entity information on one or more of the entities corresponding to the conversational sentence information are linked;
     a conversational sentence reception step of receiving a conversational sentence;
     an intent determination step of determining an intent for specifying a processing operation that corresponds to the conversational sentence received in the conversational sentence reception step;
     an entity extraction step of extracting one or more entities contained in the one or more conversational sentences received in the conversational sentence reception step, by referring to the corpus table generated in the corpus table generation step on the basis of the intent determined in the intent determination step; and
     an image identification step of referring to the entity table generated in the entity table generation step and identifying an image linked to the one or more entities extracted in the entity extraction step.
  2.  An image identification program causing a computer to execute:
     a conversational sentence reception step of receiving a conversational sentence;
     an intent determination step of determining an intent for specifying a processing operation that corresponds to the conversational sentence received in the conversational sentence reception step;
     an entity extraction step of extracting one or more entities contained in the one or more conversational sentences received in the conversational sentence reception step, by referring to a corpus table acquired in advance on the basis of the intent determined in the intent determination step; and
     an image identification step of referring to an entity table acquired in advance and identifying an image linked to the one or more entities extracted in the entity extraction step,
     wherein the image identification step refers to an entity table in which entities are linked to the image on a one-to-one or one-to-many basis, the entities consisting of the feature labels for respective elements extracted, with reference to one or more individual inference models in which feature maps and feature labels for respective elements are associated with each other, from a feature map of features extracted from the image of an information medium, and of associative words derived from the extracted feature labels with reference to an associative model set in which the feature labels associated in the individual inference models and associative words conceivable from those feature labels are linked to each other, and
     the entity extraction step refers to a corpus table in which, for each intent, conversational sentence information and entity information on one or more of the entities corresponding to the conversational sentence information are linked.
  3.  A metadata extraction system for extracting metadata from information contained in an image of an information medium, comprising:
     feature map generation means for generating a feature map in which features are extracted from the image of the information medium;
     inference means for referring to one or more individual inference models in which feature maps and feature labels for respective elements are associated with each other, and extracting, as metadata, the feature label for each element from the feature map generated by the feature map generation means,
     wherein the inference means refers to an associative model set in which the feature labels associated in the individual inference models and associative words conceivable from those feature labels are linked to each other, and derives associative words from the extracted feature labels;
     entity table generation means for generating an entity table in which entities consisting of the feature labels extracted and the associative words derived by the inference means are linked to the image on a one-to-one or one-to-many basis;
     corpus table generation means for generating a corpus table in which, for each intent for specifying a processing operation, conversational sentence information and entity information on one or more of the entities corresponding to the conversational sentence information are linked;
     conversational sentence reception means for receiving a conversational sentence;
     intent determination means for determining an intent for specifying a processing operation that corresponds to the conversational sentence received by the conversational sentence reception means;
     entity extraction means for extracting one or more entities contained in the one or more conversational sentences received by the conversational sentence reception means, by referring to the corpus table generated by the corpus table generation means on the basis of the intent determined by the intent determination means; and
     image identification means for referring to the entity table generated by the entity table generation means and identifying an image linked to the one or more entities extracted by the entity extraction means.
  4.  An image identification system comprising:
     conversational sentence reception means for receiving a conversational sentence;
     intent determination means for determining an intent for specifying a processing operation that corresponds to the conversational sentence received by the conversational sentence reception means;
     entity extraction means for extracting one or more entities contained in the one or more conversational sentences received by the conversational sentence reception means, by referring to a corpus table acquired in advance on the basis of the intent determined by the intent determination means; and
     image identification means for referring to an entity table acquired in advance and identifying an image linked to the one or more entities extracted by the entity extraction means,
     wherein the image identification means refers to an entity table in which entities are linked to the image on a one-to-one or one-to-many basis, the entities consisting of the feature labels for respective elements extracted, with reference to one or more individual inference models in which feature maps and feature labels for respective elements are associated with each other, from a feature map of features extracted from the image of an information medium, and of associative words derived from the extracted feature labels with reference to an associative model set in which the feature labels associated in the individual inference models and associative words conceivable from those feature labels are linked to each other, and
     the entity extraction means refers to a corpus table in which, for each intent, conversational sentence information and entity information on one or more of the entities corresponding to the conversational sentence information are linked.
PCT/JP2021/036456 2020-12-15 2021-10-01 Metadata extraction program WO2022130734A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020-207695 2020-12-15
JP2020207695A JP6902764B1 (en) 2020-12-15 2020-12-15 Metadata extraction program

Publications (1)

Publication Number Publication Date
WO2022130734A1 true WO2022130734A1 (en) 2022-06-23

Family

ID=76753240

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/036456 WO2022130734A1 (en) 2020-12-15 2021-10-01 Metadata extraction program

Country Status (2)

Country Link
JP (1) JP6902764B1 (en)
WO (1) WO2022130734A1 (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000353173A (en) * 1999-06-11 2000-12-19 Hitachi Ltd Method and device for sorting image with document, and recording medium
JP2010068434A (en) * 2008-09-12 2010-03-25 Toshiba Corp Metadata editing device and method for generating metadata

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Program Bot Framework", 29 October 2018, NIKKEI BP, ISBN: 978-4-8222-5372-1, article JOE MAYO, FUMITAKA OSAWA, MIKI SHIMIZU: "Table of Contents; Program Bot Framework", pages: 1 - 18, XP009537837 *

Also Published As

Publication number Publication date
JP2022094677A (en) 2022-06-27
JP6902764B1 (en) 2021-07-14

Similar Documents

Publication Publication Date Title
US10621988B2 (en) System and method for speech to text translation using cores of a natural liquid architecture system
US7565139B2 (en) Image-based search engine for mobile phones with camera
US20210089571A1 (en) Machine learning image search
US8577882B2 (en) Method and system for searching multilingual documents
US20150169525A1 (en) Augmented reality image annotation
EP2402867A1 (en) A computer-implemented method, a computer program product and a computer system for image processing
US20120215533A1 (en) Method of and System for Error Correction in Multiple Input Modality Search Engines
US20090144056A1 (en) Method and computer program product for generating recognition error correction information
JP2011070412A (en) Image retrieval device and image retrieval method
WO2022134701A1 (en) Video processing method and apparatus
KR20100114082A (en) Search based on document associations
WO2024046189A1 (en) Text generation method and apparatus
CN112069326A (en) Knowledge graph construction method and device, electronic equipment and storage medium
US20150146040A1 (en) Imaging device
CN112860642A (en) Court trial data processing method, server and terminal
US9165186B1 (en) Providing additional information for text in an image
WO2022130734A1 (en) Metadata extraction program
JP2005202939A (en) Method of creating xml file
JP6954549B1 (en) Automatic generators and programs for entities, intents and corpora
CN112236768A (en) Search text generation system and search text generation method
JP2022181319A (en) Video search apparatus, video search system, and program
JP7207543B2 (en) Information recommendation device, information recommendation system, information recommendation method, and information recommendation program
US10579738B2 (en) System and method for generating a multi-lingual and multi-intent capable semantic parser based on automatically generated operators and user-designated utterances relating to the operators
JP6107003B2 (en) Dictionary updating apparatus, speech recognition system, dictionary updating method, speech recognition method, and computer program
KR101350978B1 (en) System for managing personal relationship using application and method thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21906099

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21906099

Country of ref document: EP

Kind code of ref document: A1