WO2013054839A1 - Knowledge information processing server system provided with image recognition system - Google Patents

Knowledge information processing server system provided with image recognition system

Info

Publication number
WO2013054839A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
image
network
image recognition
information processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2012/076303
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
久夛良木 健
隆 薄
靖彦 横手
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CYBER AI ENTERTAINMENT Inc
Original Assignee
CYBER AI ENTERTAINMENT Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CYBER AI ENTERTAINMENT Inc
Priority to US14/351,484 (US20140289323A1)
Priority to EP12840365.6A (EP2767907A4)
Publication of WO2013054839A1
Anticipated expiration
Legal status: Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/40 Business processes related to social networking or social networking services
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/432 Query formulation
    • G06F16/434 Query formulation using image data, e.g. images, photos, pictures taken by a user
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/5866 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/40 Business processes related to social networking or social networking services
    • G06Q10/42 Determination of affinities or common interests between users
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/40 Business processes related to social networking or social networking services
    • G06Q10/44 Identification of trends within social networks, e.g. identification of trending topics
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/40 Business processes related to social networking or social networking services
    • G06Q10/48 Business processes related to social networking or social networking services using social graphs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/07 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail characterised by the inclusion of specific contents
    • H04L51/08 Annexed information, e.g. attachments
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L13/10 Prosody rules derived from text; Stress or intonation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

Definitions

  • the present invention performs image recognition on an image signal reflecting the user's subjective field of view, which is obtained from a camera incorporated in a headset system worn on the user's head and sent over a network via the user's network terminal.
  • objects such as a specific object, general object, person, photograph, or scene that the user has been interested in
  • these can be extracted from the camera image through two-way voice communication between the server system and the user, and the extraction process and the image recognition result for the objects are sent from the server system side,
  • via the user's network terminal, to the earphone built into the headset system, notifying the user as voice information.
  • by enabling the user to leave voice tags such as messages, tweets, and questions on the various targets the user is interested in, various users, including the user, in different times and places
  • the user receives various messages and tweets related to the subject accumulated in the server system by voice in synchronization with the focus on the subject. It enables the user to return a further voice response to each message or tweet, thereby invoking a wide range of social communication related to the common focus of various users.
  • by continuously collecting, analyzing, and accumulating on the server system the wide range of social communication evoked in this way, which originates from the visual interest of many users,
  • the various interests of a wide range of users, together with various keywords, can be acquired as dynamic interest graphs, on which highly customized services, highly accurate recommendations, or dynamic advertisements and announcements can be based
  • the present invention relates to a knowledge information processing server system provided with the image recognition system, which can be linked to effective information providing services of this kind.
  • This information providing apparatus comprises: access history storage means that stores access frequency information, indicating how often a user accesses each content, in association with user identification information identifying that user;
  • inter-user similarity calculating means that calculates, from the access frequency information stored in the access history storage means, an inter-user similarity representing how similar the users' tendencies in accessing each content are;
  • content score calculating means that calculates a content score, which represents the usefulness of a content for a user, from the access frequency information of each user weighted by the inter-user similarity between that user and each other user;
  • index storage means that stores the content score of each content calculated by the content score calculating means in association with the user identification information;
  • query input means that accepts a query, including user identification information, transmitted from a communication terminal device;
  • provision information generating means that obtains the content identification information of contents matching the accepted query and generates provision information from it, referring to the content score stored in the index storage means in association with the user identification information included in the query; and
  • provision information output means that outputs the generated provision information to the communication terminal device.
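The scoring chain described above (access frequency, inter-user similarity, similarity-weighted content score, score-ordered provision) can be sketched as follows. This is a minimal illustration, not the cited apparatus: the choice of cosine similarity, the `ACCESS` table, and all function names are assumptions, since the text only says that similarity is computed from access frequency information.

```python
from math import sqrt

# Hypothetical access-frequency table: user -> {content_id: access count}.
ACCESS = {
    "alice": {"c1": 5, "c2": 1},
    "bob": {"c1": 4, "c2": 1, "c3": 3},
    "carol": {"c2": 6, "c4": 2},
}

def user_similarity(a, b, access=ACCESS):
    """Cosine similarity between two users' access-frequency vectors (assumed metric)."""
    va, vb = access[a], access[b]
    dot = sum(va[c] * vb[c] for c in va.keys() & vb.keys())
    na = sqrt(sum(v * v for v in va.values()))
    nb = sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def content_scores(user, access=ACCESS):
    """Score each content as the similarity-weighted sum of other users' access counts."""
    scores = {}
    for other, counts in access.items():
        if other == user:
            continue
        weight = user_similarity(user, other, access)
        for content, n in counts.items():
            scores[content] = scores.get(content, 0.0) + weight * n
    return scores

def provide(user, matching_contents, access=ACCESS):
    """Order the contents that match a query by the querying user's content score."""
    scores = content_scores(user, access)
    return sorted(matching_contents, key=lambda c: scores.get(c, 0.0), reverse=True)
```

Here `provide("alice", ["c4", "c3"])` ranks `c3` ahead of `c4`, because the user most similar to `alice` accessed `c3`.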
  • General object recognition is a technique in which a computer recognizes an object included in an image obtained by capturing a real-world scene with a general name.
  • all attempts were made by building rules and models by hand.
  • an approach based on statistical machine learning using computers attracted attention, which triggered the general object recognition boom.
  • a keyword for an image can be automatically assigned to the target image, and the image can be classified and searched according to its semantic content.
  • the goal is to realize all human image recognition functions with a computer (Non-Patent Document 1).
  • General object recognition technology has advanced rapidly with the approach from image database and the introduction of statistical probability methods.
  • a method that learns the correspondence between keywords and objects from data in which keywords have been manually assigned to images, and then performs object recognition (Non-Patent Document 2)
  • a method based on local features (Non-Patent Document 3)
  • the SIFT method (Non-Patent Document 4)
  • Video Google (Non-Patent Document 5)
  • a technique called “Bag-of-Keypoints” or “Bag-of-Features” was announced.
  • a target image is treated as a set of representative local pattern image pieces called visual words, and the appearance frequency is expressed by a multidimensional histogram.
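The visual-word histogram described above can be sketched in a few lines. This is an illustrative toy, assuming a vocabulary of visual words has already been built (for example by clustering local descriptors); the 2-D descriptors, `VOCAB`, and the function names are hypothetical, and real local features such as SIFT descriptors have 128 dimensions.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the visual word closest to a descriptor (squared Euclidean distance)."""
    def dist2(word):
        return sum((d - w) ** 2 for d, w in zip(descriptor, word))
    return min(range(len(vocabulary)), key=lambda i: dist2(vocabulary[i]))

def bag_of_features(descriptors, vocabulary):
    """Normalized histogram of visual-word occurrences for one image."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        hist[nearest_word(d, vocabulary)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * len(vocabulary)

# Toy 2-D data for illustration only.
VOCAB = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
IMAGE = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.8, 5.1)]
HIST = bag_of_features(IMAGE, VOCAB)
```

Two images can then be compared, or fed to a classifier, through their histograms alone, regardless of where in the image each local pattern occurred.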
  • (Non-Patent Document 6)
  • an image taken by a camera-equipped network terminal is sent as a query over the network to an image recognition system built on the server side, which uses the huge image database stored there.
  • the image recognition system side recognizes the main objects included in the uploaded image by comparing and collating those images with the image feature database group describing the characteristics of each object that has been learned in advance.
  • a service for promptly presenting the recognition result to the network terminal side has already started.
  • among image recognition techniques, detection of a specific person's face has been rapidly applied and developed as one method for identifying individuals. To accurately extract the face of a specific person from a large number of face images, prior learning on a large number of face images is necessary.
  • the amount of knowledge database that must be prepared becomes extremely large, so that it is necessary to introduce a somewhat large-scale image recognition system.
  • when detecting a general “average face”, as used for autofocus in electronic cameras, or identifying a limited number of human faces, the system is now small enough to be implemented easily in a small housing such as an electronic camera.
  • in map providing services on the Internet that have started in recent years, it has become possible to view street-level photographs (street views) at key points on the map.
  • the need has also emerged to filter the license plate of a car that was accidentally captured, the face of a pedestrian, or the state of a private house glimpsed across the road, so that they cannot be discerned beyond a certain level, before displaying them again (Non-Patent Document 7).
  • augmented reality (AR)
  • using a three-dimensional positioning system based on position information obtainable from GPS, wireless base stations, and the like, together with a network mobile terminal that integrates a camera, a display device, and so on, and based on the position calculated by that positioning system,
  • the real-world image captured by the camera is superimposed with annotations accumulated as digital information on the server, which appear over the real world as air tags (Airtag) floating in cyberspace (Non-Patent Document 8).
  • social networking sites (SNS)
  • communication between users is organically promoted by a user search function, a message transmission / reception function, and a community function such as a bulletin board.
  • SNS users actively participate in bulletin boards where users with similar hobbies and preferences gather, exchange personal information such as documents, images, and voices, and introduce their friends to other acquaintances, etc. By doing so, we can further deepen the mutual connection between people and broaden communication on the network organically and broadly.
  • a comment-added moving image distribution system that enables shared communication using the moving image as a medium among a plurality of users by scrolling and displaying these comment groups on the moving image surface (Patent Document 2).
  • the system receives comment information from the comment distribution server and starts playback of the shared video, and reads from that comment information the comments corresponding to the current playback time of the video being played.
  • each comment group can be displayed together with the video during the playback time associated with it.
  • the comment information can also be displayed separately as a list; when specific comment data is selected from the list, the video is played from the playback time corresponding to the comment addition time of the selected comment data, and the comment data that was read is redisplayed on the display unit. A comment input operation by the user is also accepted, and the playback time at which the comment was input is transmitted to the comment distribution server as the comment addition time, together with the comment content.
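The comment handling described above amounts to an index of comments keyed by video playback time. The following sketch shows one way to model it; the `Comment` and `CommentTrack` classes and their methods are hypothetical and are not taken from Patent Document 2.

```python
import bisect
from dataclasses import dataclass

@dataclass
class Comment:
    playback_time: float   # comment addition time, in seconds from the start of the video
    text: str
    user: str

class CommentTrack:
    """Comments for one video, kept sorted by playback time."""

    def __init__(self):
        self._times = []
        self._comments = []

    def add(self, comment):
        """Store a comment at the playback time at which it was entered."""
        i = bisect.bisect_right(self._times, comment.playback_time)
        self._times.insert(i, comment.playback_time)
        self._comments.insert(i, comment)

    def due(self, start, end):
        """Comments whose playback time falls in the half-open interval [start, end)."""
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_left(self._times, end)
        return self._comments[lo:hi]

    def seek_time(self, index):
        """Playback time to jump to when the comment at `index` is picked from the list."""
        return self._comments[index].playback_time
```

A player would call `due(previous_time, current_time)` on every tick to fetch the comments to overlay, and `seek_time` when the user selects a comment from the list view.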
  • SNS also has a movement to place more emphasis on the real-time nature of communication by greatly limiting the size of information packets that can be exchanged on the network.
  • These short tweets by users, and address data such as URLs related to them, are sent out on the Internet in real time and over a wide area. The user's experience of the moment can be shared with a wide range of users, not only as a text tweet but as a single piece of information that includes image and audio data.
  • a service for invoking real-time communication on a global scale has already started (Non-Patent Document 9).
  • the technique disclosed in Patent Document 3 comprises a voice document conversion device that generates document information from voice input, and a display device that receives the output document information and displays it on a screen.
  • the voice document conversion device has a speech recognition unit that recognizes input speech; a conversion table for converting input speech into text containing kanji;
  • a document forming unit that receives and arranges the speech recognized by the speech recognition unit, looks it up in the conversion table to convert it into sentences, and edits them into a document in a predetermined format; a document memory that stores the edited document; and
  • a transmission/reception unit that transmits the stored document information and exchanges other information and signals with the display device.
  • the display device has a transmission/reception unit that exchanges information and signals with the transmission/reception unit of the voice document conversion device, a display information memory that stores the received document information as display information, and a display panel that displays the stored display information.
  • a speech synthesis system that reads a sentence consisting of character information on a computer fluently in a specified language is one of the most advanced areas in recent years.
  • the speech synthesis system is also called a speech synthesizer and includes a text-to-speech system that converts text into speech, a system that converts phonetic symbols into speech, and the like.
  • Historically, development of computer-based speech synthesis systems has progressed since the end of the 1960s; the speech produced by early speech synthesizers was often impersonal and mechanical, and was readily perceived as computer-generated.
  • Speech synthesis is possible.
  • the speech synthesis system built on the server side can not only use an enormous dictionary, but also its voice algorithm itself can incorporate a large number of digital filters so that it can produce complex sounds close to humans.
  • the applicable range has expanded further in recent years.
  • Speech synthesis technology can be broadly divided into formant synthesis and concatenative synthesis.
  • in formant synthesis, parameters such as frequency and timbre are adjusted on a computer, without using human speech, to generate an artificial synthesized waveform. The result is often heard as an artificial sound.
  • concatenative synthesis is basically a method of recording human speech and synthesizing speech close to the real voice by smoothly connecting phoneme fragments and the like.
  • the speech recorded for a certain period of time is divided into “sounds”, “syllables”, “morphemes”, “words”, “spoken words”, “sentences”, etc., and indexed to create a searchable speech library group.
  • a speech library is extracted with optimal phonemes and syllables as appropriate, and finally converted into a fluent series of speech close to human speech with appropriate accents.
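The library lookup step of concatenative synthesis described above can be illustrated with a greedy longest-match selection over an indexed unit library. This is only a sketch of unit selection on text keys: the `LIBRARY` contents, clip identifiers, and `select_units` are hypothetical, and the smoothing of joins and accent adjustment that a real synthesizer performs are omitted.

```python
# Hypothetical unit library: text of a recorded unit -> identifier of its audio clip.
LIBRARY = {
    "good morning": "clip_017",
    "good": "clip_003",
    "morning": "clip_004",
    "everyone": "clip_021",
}

def select_units(words, library=LIBRARY):
    """Greedy longest-match selection of recorded units covering `words`."""
    units, i = [], 0
    while i < len(words):
        for j in range(len(words), i, -1):   # try the longest span first
            key = " ".join(words[i:j])
            if key in library:
                units.append(library[key])
                i = j
                break
        else:
            # No recorded unit starts at this word.
            raise KeyError(f"no recorded unit for {words[i]!r}")
    return units
```

Preferring the longest recorded span reduces the number of joins, which is one reason libraries index units at several granularities (syllables, words, phrases, sentences).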
  • the technique disclosed in Patent Document 4 is a recording voice recording unit in addition to a recording voice storage unit, an input text analysis unit, a recording voice selection unit, a connection boundary calculation unit, a rule synthesis unit, and a connection synthesis unit.
  • the target is an animal or a person
  • inserting a camera-equipped mobile terminal between the target and oneself creates a kind of visual wall, and because the user first tries to check the search results on the mobile terminal, communication with the target and the people around it tends to be interrupted, even if only temporarily.
  • because this series of search processes takes a considerable amount of time, even if the user suddenly becomes interested in an object, person, animal, or scene seen while out, the series of operations often cannot be completed on the spot, and the user has to take the photograph home and search again on a PC.
  • In a service called augmented reality, which has recently been put into practical use, one method of linking the real space in which we exist with the cyberspace built in a computer network uses positioning information obtained from GPS and the like
  • together with information on the direction in which the camera is facing.
  • the use of only the position information often makes it difficult to deal with the real world situation that changes every moment, such as the movement of the target object itself or the fact that the target does not exist at the time of observation.
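The position-plus-direction method described above can be sketched as a field-of-view test: given the camera's position and heading, decide whether a geotagged annotation should be overlaid. The function names and the 60-degree default field of view are assumptions. Note that the sketch shares the limitation stated above, since it knows only where a tag was registered, not whether the target is still there.

```python
from math import atan2, cos, degrees, radians, sin

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees clockwise from north."""
    p1, p2 = radians(lat1), radians(lat2)
    dlon = radians(lon2 - lon1)
    y = sin(dlon) * cos(p2)
    x = cos(p1) * sin(p2) - sin(p1) * cos(p2) * cos(dlon)
    return degrees(atan2(y, x)) % 360.0

def in_view(camera_lat, camera_lon, heading_deg, tag_lat, tag_lon, fov_deg=60.0):
    """True if the tag's bearing lies within the camera's horizontal field of view."""
    b = bearing_deg(camera_lat, camera_lon, tag_lat, tag_lon)
    diff = (b - heading_deg + 180.0) % 360.0 - 180.0   # signed angle in [-180, 180)
    return abs(diff) <= fov_deg / 2.0
```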
  • Unlike buildings and city landmarks, which are basically tied to fixed location information, objects that can move or be transported, such as cars, people, and animals, or concepts such as "sunset",
  • are difficult to associate in any essential sense when the system has no image recognition function.
  • the target stream video includes a live video distribution by a general user in addition to a press conference, a presentation, a national assembly relay, an event, a sport, and the like.
  • with video sharing services, it is possible to share a “place” related to an event in progress in real time via a network. However, it takes time and patience to follow endless live stream video distribution.
  • microblogs (real-time message exchange services)
  • for tweets about the object or situation that the user is interested in at that moment, or about the objects of interest of other users nearby or within the user's field of view, such services cannot be said to give the user sufficiently effective awareness.
  • a network communication system uses a headset system, a multifunctional input/output device that can be connected, by wire or wirelessly, to a network terminal connectable to the Internet, and that integrates one or more microphones, one or more earphones, and one or more image pickup devices (cameras).
  • Image and sound signals reflecting the user's subjective field of view and viewpoint, obtained from the headset system, can be uploaded via the network terminal to the knowledge information processing server system provided with the image recognition system on the Internet, and the specific object, general object, person, photograph, or scene that the user has focused on in the image can be specified through cooperative operation with the voice recognition system.
  • Through cooperative operation with the speech synthesis system, the server system side can report the image recognition result and the recognition process over the Internet, via the user's network terminal, as audio information to the earphones incorporated in the user's headset system, and as voice and image information to the user's network terminal. Through the voice recognition system, the messages or tweets that the user speaks in his or her own voice about a target that has become recognizable in the image
  • are analyzed, classified, and accumulated on the server system side. Sharing these messages and tweets over the network among a wide range of users, including those who have seen the same target, can evoke wide-ranging network communication originating from the visual curiosity of many users.
  • the server system statistically observes, accumulates, and analyzes communication among this wide range of users, so that the dynamic interests and curiosity specific to a user, specific to a particular group of users,
  • or common to all users, along with their transitions, can be acquired as a dynamic interest graph that connects nodes for the wide-ranging “user” group, the extractable “keyword” group, and the various “targets”.
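A dynamic interest graph of this kind, with “user”, “keyword”, and “target” nodes, can be modeled as a weighted graph that is updated each time a user speaks about a target. The sketch below is an assumption-laden illustration: the `InterestGraph` class, the unit edge increments, and the exponential `decay` used to follow changing interests are all hypothetical and are not specified in the text.

```python
from collections import defaultdict

class InterestGraph:
    """Undirected weighted graph whose nodes are (kind, name) pairs."""

    def __init__(self):
        self.weight = defaultdict(float)   # frozenset({node_a, node_b}) -> edge weight

    def observe(self, user, target, keywords, amount=1.0):
        """Record that `user` spoke about `target` using `keywords`."""
        u, t = ("user", user), ("target", target)
        self._bump(u, t, amount)
        for k in keywords:
            kw = ("keyword", k)
            self._bump(u, kw, amount)
            self._bump(t, kw, amount)

    def decay(self, factor=0.9):
        """Age every edge so that the graph tracks shifting interests (assumed mechanism)."""
        for edge in self.weight:
            self.weight[edge] *= factor

    def neighbors(self, node):
        """Nodes linked to `node`, strongest link first."""
        out = []
        for edge, w in self.weight.items():
            if node in edge:
                (other,) = edge - {node}
                out.append((other, w))
        return sorted(out, key=lambda pair: pair[1], reverse=True)

    def _bump(self, a, b, amount):
        self.weight[frozenset((a, b))] += amount
```

Querying `neighbors(("target", name))` then yields the keywords and users most strongly associated with a target at that moment.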
  • the server system side accurately extracts and recognizes the target by cooperative operation with the voice recognition system, and the image recognition result As a reconfirmation for the user from the server system side, the user explicitly makes a sound to the server system side.
  • the server system side extracts a new object or event group that co-occurs on the target based on the camera video reflecting the user's subjective field of view, and expresses the target more accurately As a co-occurrence event that can be performed, they are formed into a series of sentences, and by the cooperative operation with the speech synthesis system, it is possible to ask the user to confirm again by voice.
  • the present invention provides an image signal reflecting a user's subjective field of view obtained from a camera incorporated in a headset system that can be worn on the user's head via the network via the network terminal of the user.
  • objects such as a specific object, a general object, a person, a photograph, or a scene that the user has been interested in
  • objects can be extracted through two-way voice communication between the server system and the user, enabling object extraction and recognition processing that reflects the user's “subjectivity”, which has the effect of improving the image recognition rate itself.
  • By incorporating a two-way process in which the user targets (points at) an object by voice and the server side reconfirms it by voice, the image recognition system can be trained continuously by machine learning.
  • a dynamic interest graph having various keywords and various objects as constituent node groups can be acquired.
  • the collection frequency can be further increased. This makes it possible to more effectively incorporate human “knowledge” into a continuous learning process by a computer system.
  • the present invention uploads voice messages and tweets left by the user into the server system via the network for the target of the user who can be recognized by the knowledge information processing system including the image recognition system.
  • the server system side via the network, via the user's network terminal
  • the messages and tweets can be sent interactively by voice communication with the user.
  • the contents of the messages and tweets left by the user for various objects can be analyzed and classified in real time on the server system side. Based on the description of the interest graph held in the server system, the main topic included in a message or tweet is extracted, along with other highly related topics that have the extracted topic as their central node. By making these shareable with other users and groups of users over the network, it becomes possible to continuously induce network communication that originates from the various objects and events seen by a wide range of users.
  • not only the message or tweet issued from the user side but also various interests, curiosity, or questions emanating from the server system side can be raised to the user or a group of users.
  • a specific user shows a certain level of interest in a specific target beyond the range that can be assumed from the relationship between the target nodes described in the interest graph, or conversely, only a certain level of interest
  • the relevant question or comment from the server system side is given to the user or a specific
  • the network communication system includes a headset system 200, a network terminal 220, a knowledge information processing server system 300, a biometric authentication system 310, a speech recognition system 320, and a speech synthesis system 330.
  • One or more headset systems exist, and one or more headset systems are connected to one network terminal via a network 251.
  • One or more network terminals exist and are connected to the Internet 250.
  • the knowledge information processing server system is connected to the biometric authentication system 310, the speech recognition system 320, and the speech synthesis system 330 through networks 252, 253, and 254, respectively.
  • the biological information processing system may be connected to the Internet 250.
  • the network in this embodiment may be a dedicated line, a public line including the Internet, or a virtual dedicated line constructed using VPN technology on the public line.
  • the network is defined as described above.
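The component layout described above can be written down as a small link table, which makes the path from a headset to any back-end system explicit. The reference numerals come from the description; the table layout, the node labels, and the `path` helper are illustrative assumptions, including the simplification that the server system is reached from the network terminal over the Internet 250.

```python
from collections import deque

HEADSET = "headset system 200"
TERMINAL = "network terminal 220"
SERVER = "knowledge information processing server system 300"

# (endpoint, endpoint, connecting network); layout is hypothetical.
LINKS = [
    (HEADSET, TERMINAL, "network 251"),
    (TERMINAL, SERVER, "Internet 250"),
    (SERVER, "biometric authentication system 310", "network 252"),
    (SERVER, "speech recognition system 320", "network 253"),
    (SERVER, "speech synthesis system 330", "network 254"),
]

def neighbors(node):
    """Components directly linked to `node`."""
    for a, b, _network in LINKS:
        if a == node:
            yield b
        elif b == node:
            yield a

def path(src, dst):
    """Shortest chain of components between two nodes (breadth-first search)."""
    queue, seen = deque([[src]]), {src}
    while queue:
        route = queue.popleft()
        if route[-1] == dst:
            return route
        for nxt in neighbors(route[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(route + [nxt])
    return None
```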
  • FIG. 2A shows a configuration example of a headset system 200 according to an embodiment of the present invention.
  • the headset system is an interface device that can use the network communication system 100 when worn by a user, as shown in FIG.
  • headset systems 200a to 200c are connected to network terminal 220a via connections 251a to 251c
  • headset systems 200d to 200e are connected to network terminal 220b via connections 251d to 251e.
  • the headset system 200 indicates any one of the headset systems 200a to 200f.
  • the headset systems 200a to 200f need not all be the same model. Any similar device having an equivalent function or a minimum feasible function may be used.
  • the headset system 200 includes the following element groups, but is not limited thereto, and some of them may be selected and mounted.
  • There are one or more earphones 202 which are monaural or stereo, and notify the user of various voice information including messages and tweets of other users, responses by voice from the server system, and the like.
  • One or more biometric authentication sensors 204 exist; as one example, they obtain vein information (from the eardrum or outer ear), which is one kind of useful biometric identification information of the user, and, in cooperation with the biometric authentication system 310, the user, the headset system, and the knowledge information processing server system 300 are authenticated and associated.
  • One or more biometric information sensors 205 exist, and acquire various biometric information (vital signs) that can be detected such as a user's body temperature, heart rate, blood pressure, brain wave, respiration, eye movement, vocalization, and body movement.
  • the depth sensor 206 detects the movement of a living body of a certain size or larger, including a human, approaching a user wearing the headset system.
  • the image output device 207 displays various notification information from the knowledge information processing server system 300.
  • the position information sensor 208 detects the position (latitude / longitude, altitude, direction) of the user wearing the headset system.
  • the position information sensor may be equipped with a six-axis motion sensor or the like, so that the movement direction, direction, rotation, and the like are additionally detected.
  • the environment sensor 209 detects brightness, color temperature, noise, sound pressure level, temperature and humidity, etc. around the headset system.
  • The gaze detection sensor 210 directly detects the user's gaze direction by irradiating a safe light beam from a part of the headset system toward the user's pupil or retina and measuring the reflected light.
  • the wireless communication device 211 performs communication with the network terminal 220 and communication with the knowledge information processing server system 300.
  • The power supply unit 212 is a battery or the like that supplies power to the entire headset system. However, when the headset system can be connected to the network terminal by wire, power may be supplied from the outside.
  • FIG. 2C shows a configuration example of the network terminal 220 in one embodiment of the present invention.
  • The network terminals 220a to 220f are client terminal devices widely used by users, including PCs, personal digital assistants (PDAs), tablets, Internet-capable mobile phones, smartphones, and the like. The figure shows them connected to the Internet.
  • the term “network terminal 220” refers to any one of the network terminals 220a to 220f connected to the Internet.
  • the network terminals 220a to 220f do not need to be the same model. Any terminal device having an equivalent function or a minimum feasible function may be used.
  • the network terminal 220 includes the following element groups, but is not limited thereto, and some of them may be selected and mounted.
  • the operation unit 221 is a user interface unit of the network terminal 220 together with the display unit 222.
  • the network communication unit 223 is in charge of communication with the Internet and communication with one or more headset systems.
  • The network communication unit may use IMT-2000, IEEE 802.11, Bluetooth, IEEE 802.3, or a proprietary wired or wireless standard, or a mixed form via a router.
  • The recognition engine 224 is an image recognition program optimized for the network terminal. It is specialized in image recognition processing for a limited set of objects, is derived from the image recognition processing function of the image recognition system 301, which is a main component of the knowledge information processing server system 300, and is downloaded from the knowledge information processing server system side and executed.
  • the network terminal side also has a part of the image detection / recognition function within a certain range, thereby reducing the processing load on the image recognition system side on the server side and the load on the network line.
  • preliminary preprocessing corresponding to steps 30-20 to 30-37 in FIG. 3A described later can be executed.
  • the synchronization management unit 225 performs synchronization processing with the server side when the line is temporarily disconnected due to a network failure or the like and the line is restored.
  • The CPU 226 is a central processing unit.
  • the storage unit 227 is a main memory device, and is a primary and secondary storage device including a flash memory and the like.
  • the power supply unit 228 is a power supply such as a battery for supplying power to the entire network terminal.
  • These network terminals play a buffering role for the network. For example, information that is not important to the user, if uploaded to the network side, is noise for the knowledge information processing server system 300 in the sense that it has little connection with the user, and it is also unnecessary overhead for the network line. Therefore, by performing a certain degree of screening on the network terminal side wherever possible, effective network bandwidth can be secured for the user and the response speed of highly local processing can be improved.
  • FIG. 3A is used to explain the flow of the target image extraction process 30-01 by the user's voice when focusing on the target that the user is interested in as an embodiment of the present invention.
  • a specific object, a general object, a person, a photograph, or a scene is collectively referred to as “target”.
  • the target image extraction process starts with a voice input trigger by the user in step 30-02.
  • As the voice input trigger, a specific word or a series of natural-language words may be used, the user's utterance may be detected from a change in sound pressure level, or a GUI operation on the network terminal 220 may be used.
  • A series of target image extraction and image recognition processes is executed in the order of voice recognition processing, image feature extraction processing, target extraction processing, and image recognition processing. Specifically, while the system waits for a voice input command (30-04), the user's utterance is recognized; the voice recognition process extracts a word string from the words uttered by the user; image features are extracted based on the word string; and image extraction is performed based on the image feature group that could be extracted. If there are multiple targets, or if it is difficult to extract features from the target itself, the user is asked for a further image feature group, so that the server side can recognize more reliably the target the user is focusing on.
  • As for the method of pointing at the target by the user's voice, rather than the user pointing while selecting each image feature individually as exemplified in steps 30-06 to 30-15, it is assumed to be more common for the user to point with a single series of words that includes several image features. In this case, the target extraction processes for the respective image features run in parallel, and there is a high possibility that a plurality of image feature elements expressing the target will be obtained. The more features that can be extracted, the higher the pointing accuracy for the target.
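  • The sound-pressure-based trigger mentioned above can be sketched as follows. This is only a minimal illustration, not the method of the embodiment: the function names `rms_db` and `detect_trigger`, the frame representation (lists of samples normalized to the range -1.0 to 1.0), and the 12 dB rise threshold are all assumptions introduced for the example.

```python
import math


def rms_db(samples):
    """Return the RMS level of one audio frame in dB relative to full scale (1.0)."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def detect_trigger(frames, rise_db=12.0):
    """Return the index of the first frame whose level rises by at least
    rise_db over the previous frame, or None if no such jump occurs.
    The threshold value is a hypothetical choice."""
    previous = None
    for index, frame in enumerate(frames):
        level = rms_db(frame)
        if previous is not None and level - previous >= rise_db:
            return index
        previous = level
    return None


quiet = [0.001] * 100   # roughly -60 dB
loud = [0.1] * 100      # roughly -20 dB
trigger_index = detect_trigger([quiet, quiet, loud])
```

  • A real system would likely smooth the level over several frames and combine it with keyword detection; the sketch only shows the level-change idea.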
  • the image recognition processing 30-16 by the image recognition system is started with the image feature group that can be extracted as a clue.
  • Image recognition is performed by the general object recognition system 106, the specific object recognition system 110, and the scene recognition system 108. In FIG. 3A these are drawn as a sequential flow, but the recognition processes can run in parallel with one another, and each of general object recognition, specific object recognition, and scene recognition can be further parallelized internally.
  • the processing time related to the recognition speed of the image recognition processing can be greatly shortened. As a result of the above, it becomes possible to notify the user of various recognition results related to the image-recognized target as an image recognition result related to the target by voice.
  • However, the question remains whether the system side has correctly extracted the target that the user actually paid attention to.
  • The knowledge information processing server system side equipped with the image recognition system examines the surroundings of the target based on the camera video, extracts objects and events that co-occur with the target (30-38), and adds these new feature elements, which the user did not explicitly point to, to the reconfirmation elements (30-39).
  • By asking the user for reconfirmation by voice (30-40), it is possible to reconfirm that the user's target and the target extracted by the server system side are the same.
  • the above-described series of processes is basically a process related to the same object, and the user can always shift his / her interest to another object in his / her action. Therefore, a larger outer process loop including the steps in FIG. 3A is also included.
  • The image recognition processing loop may be started when the user puts on the headset system, by a voice trigger similar to step 30-02, or by operating the network terminal, but is not necessarily limited to these.
  • the stop of the processing loop may be performed when the user removes the headset, or may be triggered by voice, or may be stopped by operating the network terminal, as in the means for starting the processing loop. However, it is not necessarily limited to them.
  • the target recognized as a result of the user's attention may be configured to be able to answer an inquiry at a later date by adding the spatiotemporal information and recording it in a graph database 365 described later.
  • the target image extraction process shown in FIG. 3A is an important process in the present invention, and each step will be described below.
  • A voice input trigger (30-02) is generated by the user, and uploading of the camera image (30-03) is started. The voice recognition processing 30-05 then extracts a word string from the user's target detection command, and if the word string matches any feature in the condition groups 30-07 to 30-15, control passes to the corresponding image feature extraction process.
  • the word string is “target name” (30-06)
  • the annotation is assumed to reflect a certain recognition judgment of the user.
  • Specific object recognition execution (110) is performed. If there is a discrepancy between the comparison result and the annotation, or if there is doubt, the user is alerted that the user may have misrecognized the target.
  • execution (106) of general object recognition related to the general noun is performed, and the target is extracted from the image feature.
  • a scene recognition execution (108) process related to the scene is performed, and a target region is extracted from the image feature.
  • It may also be specified as a scene including a plurality of features. For example, a designation such as “a yellow (color) taxi (general object) running (state) on the left side (position) of the road (general object), with the number 1234 (specific object)”.
  • These target designations may be a series of words, or may be designated individually.
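  • The routing from uttered words to the feature extraction processes can be pictured with a small lookup. The step numbers 30-20 to 30-28 are taken from this description; the keyword lexicon and the function name `plan_extraction` are hypothetical and stand in for the real output of the voice recognition processing.

```python
# Feature extraction steps as numbered in FIG. 3A.
FEATURE_STEPS = {
    "color": "30-20",
    "shape": "30-21",
    "size": "30-22",
    "luminance": "30-23",
    "depth": "30-24",
    "region": "30-25",
    "co-occurrence": "30-26",
    "motion": "30-27",
    "state": "30-28",
}

# Hypothetical lexicon mapping spoken words to feature categories.
LEXICON = {
    "yellow": "color",
    "round": "shape",
    "larger": "size",
    "brighter": "luminance",
    "far": "depth",
    "left": "region",
    "moving": "motion",
    "running": "state",
}


def plan_extraction(words):
    """Return the ordered, de-duplicated list of (category, step) pairs
    triggered by a word string; unknown words are ignored."""
    plan = []
    for word in words:
        category = LEXICON.get(word.lower())
        if category is None:
            continue
        entry = (category, FEATURE_STEPS[category])
        if entry not in plan:
            plan.append(entry)
    return plan


plan = plan_extraction(["Yellow", "taxi", "running", "left"])
```

  • Because each entry is independent, the selected processes could be run one by one or all at once, matching the statement that designations may be given individually or as a series of words.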
  • a new image feature can be added and the objects can be narrowed down through a reconfirmation process by the image recognition system.
  • The image extraction result is reconfirmed by putting a voice question to the user, for example, “is it?” (30-40). If the target has been extracted as the user intended, the user utters a word or phrase to that effect, Step 30-50 “End of camera image upload” is executed, and the target image extraction process ends (30-51). If, on the other hand, the result differs from the user's intention, the process returns to Step 30-04 “Waiting for voice command input” so that a further image feature group can be input.
  • the process is interrupted (QUIT) and the target image extraction process is terminated.
  • the color extraction process 30-20 is performed.
  • a method of extracting a range for each color in the RGB three primary colors may be used, or they may be extracted in the YUV color space. Moreover, it is not limited to these specific color space expressions.
  • the object is separated and extracted (30-29), and segmentation (cutout area) information is obtained.
  • the target image recognition process (30-16) is performed using the segmentation information as a clue.
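  • As a rough illustration of color-based extraction followed by separation, the sketch below masks pixels inside an RGB range and derives a bounding box as simple segmentation information. The function names, the nested-list image representation, and the threshold values for “yellow” are assumptions made for the example; the YUV conversion uses the standard BT.601 analog coefficients, which the description itself does not specify.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to YUV using BT.601 analog coefficients."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v


def color_mask(image, lower, upper):
    """Return a boolean mask that is True where every channel of the pixel
    lies inside the inclusive range [lower, upper]."""
    return [
        [all(lo <= ch <= hi for ch, lo, hi in zip(pixel, lower, upper)) for pixel in row]
        for row in image
    ]


def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) of the True pixels, or None."""
    hits = [(x, y) for y, row in enumerate(mask) for x, hit in enumerate(row) if hit]
    if not hits:
        return None
    xs = [x for x, _ in hits]
    ys = [y for _, y in hits]
    return (min(xs), min(ys), max(xs), max(ys))


image = [
    [(250, 220, 10), (10, 10, 10), (0, 0, 255)],
    [(240, 210, 30), (5, 5, 5), (20, 20, 20)],
]
# Hypothetical RGB range for "yellow".
mask = color_mask(image, lower=(200, 180, 0), upper=(255, 255, 80))
box = bounding_box(mask)
```

  • The same mask function works on YUV triples if the image is converted first, which corresponds to the alternative color space mentioned above.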
  • the shape feature extraction 30-21 is performed.
  • the outline and main shape features are extracted while performing edge tracking on the target, and then the shape template / matching process is performed.
  • other methods may be used.
  • the object is separated (30-30), and segmentation information is obtained.
  • the target image recognition process (30-16) is performed using the segmentation information as a clue.
  • co-occurrence objects and co-occurrence events are extracted (30-38), and descriptions about all the feature groups that can be extracted are generated (30-39). Ask for reconfirmation (30-40). If the result is YES, the uploading of the camera image is terminated (30-50), and the target image extraction process by voice is terminated (30-51).
  • the object size detection process 30-22 is performed.
  • In the object size detection process, the size of the target, which has been separated by feature extraction processes other than size, is compared relative to other nearby objects through interactive voice communication with the user, for example with an instruction such as “larger than the one next to it on the left”. This is because, when an object is seen in isolation, there is no specific index against which its size can be judged. However, other methods may be used.
  • the object is separated (30-31) to obtain segmentation information.
  • the target image recognition process (30-16) is performed using the segmentation information as a clue. Thereafter, using the image recognition processing result, co-occurrence objects and co-occurrence events are extracted (30-38), and descriptions about all the feature groups that can be extracted are generated (30-39). Ask for reconfirmation (30-40). If the result is YES, the uploading of the camera image is terminated (30-50), and the target image extraction process by voice is terminated (30-51).
  • the luminance detection process 30-23 is performed.
  • the luminance of the specific area is obtained from the RGB three primary colors or from the YUV color space, but other methods may be used.
  • extraction of relative luminance compared with the surroundings of the target is executed by interactive voice communication with the user. For example, an instruction such as “shining brighter than the surroundings”.
  • the object is separated (30-32), and segmentation information is obtained.
  • the target image recognition process (30-16) is performed using the segmentation information as a clue.
  • co-occurrence objects and co-occurrence events are extracted (30-38), and descriptions about all the feature groups that can be extracted are generated (30-39).
  • Ask for reconfirmation (30-40). If the result is YES, the uploading of the camera image is terminated (30-50), and the target image extraction process by voice is terminated (30-51).
  • the depth detection process 30-24 is performed.
  • The depth may be measured directly using the depth sensor 206 provided in the user's headset system 200, or may be calculated from parallax information obtained from two or more camera images. Other methods may also be used.
  • the object is separated (30-33), and segmentation information is obtained.
  • the target image recognition process (30-16) is performed using the segmentation information as a clue.
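  • For the parallax-based alternative, the usual pinhole stereo relation gives depth as focal length times baseline divided by disparity. The sketch below assumes two rectified, horizontally displaced cameras; the function name and the numeric values are illustrative only and do not come from the description.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Return depth in metres for a point seen by two rectified cameras.

    focal_length_px: focal length expressed in pixels
    baseline_m:      distance between the two camera centres in metres
    disparity_px:    horizontal shift of the point between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical camera: 700 px focal length, 6 cm baseline, 21 px disparity.
depth_m = depth_from_disparity(700.0, 0.06, 21.0)
```

  • Larger disparities correspond to nearer objects, so thresholding disparity is one simple way to separate a near target from its background.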
  • the target region detection 30-25 is performed.
  • The entire camera image reflecting the user's main field of view may be divided in advance into a mesh at equal intervals, and the target may be narrowed down from an area designation such as “upper right corner” given as an interactive instruction by the user, or it may be specified by the place where the target exists, such as “on the desk”. Designation of another position or region is also possible. After the position or region where the target exists is detected, the target is separated (30-34), and segmentation information is obtained.
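  • The mesh-based area designation can be sketched as follows. The 3 x 3 mesh, the small vocabulary (“upper”, “lower”, “left”, “right”), and the function names are assumptions for illustration; the description only states that the image is divided into a mesh at equal intervals.

```python
def region_cells(phrase, rows=3, cols=3):
    """Map a spoken area designation to a list of (row, col) mesh cells.
    Words that are not recognized leave that axis unrestricted."""
    words = phrase.lower().split()
    row_choice = list(range(rows))
    col_choice = list(range(cols))
    if "upper" in words:
        row_choice = [0]
    elif "lower" in words:
        row_choice = [rows - 1]
    if "left" in words:
        col_choice = [0]
    elif "right" in words:
        col_choice = [cols - 1]
    return [(r, c) for r in row_choice for c in col_choice]


def cell_bounds(cell, width, height, rows=3, cols=3):
    """Return the pixel rectangle (x0, y0, x1, y1) of one mesh cell,
    with x1 and y1 exclusive."""
    r, c = cell
    return (c * width // cols, r * height // rows,
            (c + 1) * width // cols, (r + 1) * height // rows)


cells = region_cells("upper right corner")
bounds = cell_bounds(cells[0], width=640, height=480)
```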
  • the target image recognition process (30-16) is performed using the segmentation information as a clue. Thereafter, using the image recognition processing result, other co-occurrence objects and co-occurrence events are extracted (30-38), and a description including the co-occurrence feature group that can be extracted is generated (30-39). The user is requested to confirm again with the description (30-40). If the result is YES, the uploading of the camera image is terminated (30-50), and the target image extraction process by voice is terminated (30-51).
  • the co-occurrence relationship detection 30-26 related to the target is performed.
  • the segmentation information related to the corresponding feature extracted by each process (106, 108, 110, 30-20 to 30-28) shown in FIG. 3A is used to deal with the segmentation information.
  • the target is extracted by examining the co-occurrence relationship with each feature. For example, an instruction such as “It is reflected together with” is used, but other methods may be used.
  • the target is separated based on the positional relationship between the target and the other object (30-35), and segmentation information related to the target is obtained.
  • the target image recognition process (30-16) is performed using the segmentation information as a clue.
  • other co-occurrence objects and co-occurrence events are extracted (30-38), and a description including the co-occurrence feature group that can be extracted is generated (30-39).
  • the motion detection process 30-27 is performed.
  • In the motion detection processing, a plurality of images that are consecutive on the time axis are referenced, each image is divided into a plurality of mesh regions, and the regions are compared with one another. In this way, apart from the movement of the whole image caused by movement of the camera itself, a region that moves relatively on its own is found, difference extraction (30-36) is performed on that region, and segmentation information is obtained for the region that moves relative to its surroundings.
  • other methods may be used.
  • target image recognition processing (30-16) is performed.
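  • One simple way to picture the mesh comparison is shown below. It is a stand-in technique, not the method of the embodiment: each frame is a grayscale nested list, the change common to the whole frame (attributed here to camera movement or lighting) is approximated by the median per-cell change, and cells that change by more than a threshold beyond that baseline are reported as moving. All names and the threshold are hypothetical.

```python
from statistics import median


def cell_means(frame, rows, cols):
    """Return a rows x cols grid holding the mean gray value of each mesh cell."""
    height, width = len(frame), len(frame[0])
    grid = []
    for r in range(rows):
        grid_row = []
        for c in range(cols):
            values = [
                frame[y][x]
                for y in range(r * height // rows, (r + 1) * height // rows)
                for x in range(c * width // cols, (c + 1) * width // cols)
            ]
            grid_row.append(sum(values) / len(values))
        grid.append(grid_row)
    return grid


def moving_cells(previous, current, rows=2, cols=2, threshold=10.0):
    """Return the mesh cells whose change exceeds the frame-wide baseline."""
    before = cell_means(previous, rows, cols)
    after = cell_means(current, rows, cols)
    change = {
        (r, c): abs(after[r][c] - before[r][c])
        for r in range(rows)
        for c in range(cols)
    }
    baseline = median(change.values())  # change shared by most cells
    return sorted(cell for cell, delta in change.items() if delta - baseline > threshold)


previous = [[100] * 4 for _ in range(4)]
current = [[102] * 4 for _ in range(4)]
for y in range(2):
    for x in range(2):
        current[y][x] = 160  # only the top-left cell changes strongly
moved = moving_cells(previous, current)
```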
  • the state detection process 30-28 is performed.
  • The state of an object, for example a motion state (stationary, moving, vibrating, floating, rising, descending, flying, rotating, migrating, approaching, moving away, etc.) or an action state (running, jumping, crouching, sitting, sleeping, lying down, eating, drinking, observable emotions, etc.), is estimated and extracted (30-37) from a plurality of continuous images to obtain segmentation information.
  • target image recognition processing (30-16) is performed.
  • the user can stop the target image extraction process by the user's utterance in the reconfirmation (30-40) step shown in FIG. If the cancel command is recognized in the voice recognition process 30-05, the process proceeds to step 30-50 to end the camera image upload and the voice target image extraction process is ended (30-51).
  • If processing takes longer than a certain time, the progress of the processing and related information can be conveyed to the user by voice in order to keep the user's interest. For example, “We are still inquiring the server about the recognition process we are focusing on. Currently, people are paying attention to the same target. Please wait a little longer.”
  • a progress message such as “The progress is halfway” can be returned to the user by voice.
  • FIG. 3A will be described from the flow of data using FIG. 3B.
  • the input is an image 35-01 and an utterance 35-02.
  • In the recognition/extraction processing control 35-03, one or more of steps 30-06 to 30-15 in FIG. 3A are executed in response to the input of the utterance 35-02, and step 35-16 in FIG. 3A is executed for the image 35-01.
  • At least one of general object recognition processing by the general object recognition system 106, specific object recognition processing by the specific object recognition system 110, and scene recognition processing by the scene recognition system 108 is executed.
  • Each functional block of the image recognition systems 106, 108, and 110 can be further parallelized for each execution unit, and is distributed to one or more processes by the image recognition process dispatch 35-04 and executed in parallel.
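  • The dispatch of recognition work to parallel execution units might look like the following sketch, which uses Python's standard `concurrent.futures` thread pool. The three recognizer functions are stubs that merely stand in for systems 106, 108, and 110; their names, their return values, and the `dispatch` function are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def general_object_recognition(image):
    """Stub standing in for the general object recognition system 106."""
    return "general:" + image["hint"]


def scene_recognition(image):
    """Stub standing in for the scene recognition system 108."""
    return "scene:" + image["place"]


def specific_object_recognition(image):
    """Stub standing in for the specific object recognition system 110."""
    return "specific:" + image["number"]


RECOGNIZERS = {
    106: general_object_recognition,
    108: scene_recognition,
    110: specific_object_recognition,
}


def dispatch(image, recognizers=RECOGNIZERS):
    """Submit every recognizer at once and collect the results by system number."""
    with ThreadPoolExecutor(max_workers=len(recognizers)) as pool:
        futures = {system: pool.submit(func, image) for system, func in recognizers.items()}
        return {system: future.result() for system, future in futures.items()}


results = dispatch({"hint": "taxi", "place": "street", "number": "1234"})
```

  • Threads are used here only to keep the sketch short; compute-heavy recognizers would more plausibly run in separate processes or on separate servers.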
  • When steps 30-07 to 30-15 in FIG. 3A are executed for the input of the utterance 35-02, the feature extraction processing 30-20 to 30-28 and the separation extraction processing 30-29 to 30-37 are executed.
  • In the recognition/extraction processing control 35-03, order control is performed when the user's utterance includes a word that affects the processing order. For example, in the case of “above ~”, the “~” must first be recognized in the image, and “above” is processed next.
  • The recognition/extraction processing control 35-03 accesses a graph database 365 described later and extracts a representative node 35-06 (if the node does not exist in the database, a new representative node is generated).
  • the image 35-01 is processed in accordance with the utterance 35-02, and the graph structure 35-07 as a result relating to the simultaneously executed recognition / extraction processing groups is accumulated in the graph database 365.
  • a series of data flows by the recognition / extraction processing control 35-03 for the input image 35-01 continues as long as the utterance 35-02 is valid for the input image.
  • With reference to FIG. 4A, the target pointing operation by the user's voice in one embodiment of the present invention will be described. This is an application of the procedure described in FIG. 3A.
  • The location of FIG. 4A (A) is around Times Square, Manhattan, New York. Assume that a user at this place, or a user who has seen this picture, uttered the utterance 41 “A yellow taxi on the road on the left side”. The speech recognition system 320 extracts a plurality of character strings or word strings from the utterance 41. Five words can be extracted from the utterance: “a”, “yellow”, “taxi”, “left side”, and “on the road”.
  • From these, the “target name”, “target color information”, “target position”, and “region where the target exists” in the target image extraction flow described above are identified, as is the fact that a single target is being pointed at. With these clues, detection and extraction of the object having the image feature group is started, and the image recognition system side can reply to the user by voice that the target may be the taxi in the dotted circle (50).
  • If reconfirmation relies only on the feature elements explicitly indicated by the user, the reconfirmation content is not always reliable.
  • Each of these detectable word strings indicates “unique name”, “general noun”, “scene”, “color”, “position”, “region”, “location”, etc., and image detection / extraction processing corresponding to them is performed. Executed. The result is delivered to the knowledge information processing server system 300 together with the spatiotemporal information and the image information. Note that the image shown in FIG. 4A is an example of the present invention and is not limited thereto.
  • FIG. 4B (A) is a snapshot of a portion of the graph structure (described below) acquired for an image reflecting the user's main field of view described in FIG. 4A.
  • the relationship between the image recognition process and the graph structure will be described.
  • the node (60) is a node representative of FIG. 4A, and is linked to the node (61) that records the image data of FIG. 4A. Hereinafter, information is expressed using a node-node link.
  • the node (60) is also linked to the node (62) representing the place and the node (63) representing the time, thereby holding information on the shooting location and time. Further, the node (60) is linked to the node (64) and the node (65).
  • the node (64) is a node representing the object of the dotted circle (50) in FIG. 4A, and the feature amount T1 (65), the feature amount T2 (66), the color attribute (67), and the cutout are expressed by the utterance 41.
  • the node (65) is a node representing the object of the dotted circle (51) in FIG. 4A, and holds the same information as the node (64). Note that the node (60), that is, FIG. 4A is linked to the node (77) as the subjective visual image of the user 1.
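  • The node and link relationships described for FIG. 4B (A) can be held in a very small graph structure. The sketch below is an illustration only: the class `Graph`, the builder function, and the choice to store links as directed from the node (60) are assumptions, and only the node numbers and their roles are taken from the description.

```python
from collections import defaultdict


class Graph:
    """Minimal directed graph: labelled nodes plus a set of links per node."""

    def __init__(self):
        self.labels = {}
        self.links = defaultdict(set)

    def add_node(self, node_id, label):
        self.labels[node_id] = label

    def link(self, source, target):
        self.links[source].add(target)

    def referrers(self, node_id):
        """Return the sorted ids of all nodes that link to node_id."""
        return sorted(s for s, targets in self.links.items() if node_id in targets)


def build_fig_4b_a():
    """Build the part of FIG. 4B (A) that is spelled out in the text."""
    graph = Graph()
    for node_id, label in [
        (60, "representative node of FIG. 4A"),
        (61, "image data of FIG. 4A"),
        (62, "place"),
        (63, "time"),
        (64, "object in dotted circle (50)"),
        (65, "object in dotted circle (51)"),
        (77, "user 1"),
    ]:
        graph.add_node(node_id, label)
    for target in (61, 62, 63, 64, 65, 77):
        graph.link(60, target)
    return graph


fig = build_fig_4b_a()
```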
  • FIG. 4B (B) shows the information held by the node (81), which represents the subjective view of the user 2 represented by the node (80).
  • the node (82) is a target representative node corresponding to the dotted circle (51) in FIG.
  • feature amounts C1 (84) and C2 (85) are held as information.
  • B1 (70) and B2 (71) that are feature quantities linked to the node (65) and C1 (84) and C2 (85) that are feature quantities linked to the node (82) are general object recognition systems.
  • When learning determines that they are the same target (that is, that they belong to the same category), or when they can statistically form a new centroid, the representative feature amount D (91) is calculated and attached.
  • the learning result is recorded in the Visual Word dictionary 110-10.
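  • If the representative feature amount is taken to be a centroid, which is one reading of the passage above and not something the description fixes, it can be computed as the component-wise mean of the feature vectors B1, B2, C1, and C2. The vectors below are made-up two-dimensional values used only to show the arithmetic.

```python
def centroid(vectors):
    """Return the component-wise mean of equally sized feature vectors."""
    if not vectors:
        raise ValueError("at least one vector is required")
    dimension = len(vectors[0])
    if any(len(v) != dimension for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dimension)]


# Hypothetical feature amounts standing in for B1, B2, C1 and C2.
b1, b2 = [1.0, 2.0], [3.0, 4.0]
c1, c2 = [5.0, 6.0], [7.0, 8.0]
representative_d = centroid([b1, b2, c1, c2])
```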
  • A partial graph is generated by linking the node (90) representing the object with the subnode groups (91 to 93 and 75 to 76), and the node (60) replaces its link to the node (65) with a link to the node (90).
  • Likewise, the node (81) replaces its link to the node (82) with a link to the node (90).
  • the feature group extracted in the feature extraction process corresponding to steps 30-20 to 30-28 shown in FIG. 3A can be expressed as a graph structure having the user's speech, segmentation information, and the feature as nodes.
  • the graph structure holds the feature nodes relating to the colors.
  • the graph structure is compared with the subgraph when a representative node related to the object already exists.
  • the graph structure is a partial graph of the representative node (64).
  • Such integration of the graph structure may be recorded. Thereby, in this example, since the relationship between the user's utterance and the color feature can be recorded, the probability of the color feature corresponding to “yellow” is increased.
  • the database group (107, 109, 111, 110-10) related to image recognition described later and the graph database 365 described later grow (acquire new data) by the procedure described above.
  • the case of a general object has been described. However, even for a specific object, a person, a photograph, or a scene, information related to the object is similarly stored in the database group.
  • the procedure can be used, for example, when selecting a target target of the user from a plurality of target candidates that can be extracted in steps 30-38 and 30-39 of the procedure in FIG. 3A.
  • Step (S10) extracts a representative node corresponding to the co-occurring object / event as a result of step 30-38 from the graph database 365 (S11).
  • This step is performed by accessing the graph database in Step 30-16 and Steps 30-20 to 30-28 shown in FIG. 3A.
  • the target nodes (64) and (65) can be extracted from the link of the node 60 in FIG. 4A and the two color nodes (67) and (72).
  • step (S11) one or more representative nodes can be extracted.
  • step (S13) one representative node is stored in variable i.
  • the number of nodes referring to the representative node of the variable i is stored in the variable n_ref [i] (S14).
  • In the example of FIG. 4B (C), the links from nodes referring to the node (90) are the links in the dotted circle (94), so the value is “3”.
  • the total number of nodes in the subgraph of node i is substituted for n_all [i] (S15). In the node (90) of FIG. 4B (C), “5” is substituted.
  • In step (S16), it is judged whether n_ref[i] is greater than a specified value. If YES, 1 is substituted for n_fea[i] (S17), and if NO, 0 is substituted (S18).
  • In step (S19), within the subgraph of node i, the number of nodes corresponding to the features spoken by the user in the procedure shown in FIG. 3A is divided by n_all[i], and the result is added to n_fea[i].
  • The two-term set {n_all[i], n_fea[i]} is set as the selection priority for the node i.
  • the graph structure reflecting the learning result by the image recognition process is used as the calculation reference, and the learning result can be reflected in the selection priority.
  • A node related to the feature is added to the representative node, so the selection priority calculated in the above steps changes.
  • the calculation of the selection priority is not limited to this method.
  • the link weight may be considered.
  • In the above calculation, the node (74) and the node (75) are counted with the same weight as the other nodes.
  • However, because the node (74) and the node (75) are strongly related to each other, they may be counted as one node. In this way, the relationship between nodes may be considered.
  • For selection, the nodes are arranged in descending order of the first term of the selection priority, and among them a node whose second term is “1” or more is selected.
  • The second term is calculated from the comparison with the specified value in step (S16), that is, from the number of times the representative node is referenced. For example, when the specified value in step (S16) is set to “2”, a representative node to which two or more users are linked (that is, one that has already been a target of some user's attention) is selected.
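  • Steps (S14) to (S19) and the selection rule can be put together in a short sketch. The function names and the input format (a dictionary mapping each representative node to its reference count, subgraph size, and number of spoken-feature nodes) are assumptions; the values for the node (90), three referring nodes and five subgraph nodes, follow the example of FIG. 4B (C), while the values for the other candidate are invented.

```python
def selection_priority(n_ref, n_all, n_spoken, specified_value):
    """Return the pair (n_all, n_fea) for one representative node.

    n_ref:    number of nodes referring to the representative node (S14)
    n_all:    total number of nodes in its subgraph (S15)
    n_spoken: nodes in the subgraph matching features spoken by the user
    """
    n_fea = 1.0 if n_ref > specified_value else 0.0  # S16 to S18
    n_fea += n_spoken / n_all                        # S19
    return (n_all, n_fea)


def select_target(candidates, specified_value=2):
    """Rank candidates by the first term in descending order and return the
    first node whose second term is 1 or more, plus all priorities."""
    priorities = {
        node: selection_priority(*stats, specified_value)
        for node, stats in candidates.items()
    }
    ranked = sorted(priorities, key=lambda node: priorities[node][0], reverse=True)
    for node in ranked:
        if priorities[node][1] >= 1.0:
            return node, priorities
    return None, priorities


# node id -> (n_ref, n_all, n_spoken); the entry for 64 is hypothetical.
candidates = {90: (3, 5, 1), 64: (1, 6, 2)}
chosen, priorities = select_target(candidates)
```

  • In this run the node (64) has the larger subgraph but is referenced only once, so its second term stays below 1 and the node (90) is chosen.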
  • the selection priorities expressed by the binomial sets may be normalized and compared as a two-dimensional vector.
  • The selection priority may also be calculated in consideration of the distance between a feature quantity node in the subgraph related to the representative node and the representative feature quantity of the corresponding class, which is the node (91) in the example of FIG. 4B (C) (for example, a feature quantity in the Visual Word dictionary 110-10).
  • the upload of the camera image may be terminated (30-50) on the assumption that the object has been recognized by the user.
  • the present invention includes an image recognition system 301, a biometric authentication unit 302, an interest graph unit 303, a voice processing unit 304, a situation recognition unit 305, a message storage unit 306, a reproduction processing unit 307, and a user management unit 308.
  • the present invention is not limited thereto, and some of them may be selected and configured.
  • the voice processing unit 304 converts the user's utterance picked up by the headset system 200 worn by the user into an utterance word string using the voice recognition system 320.
  • An output from the reproduction processing unit 307, which will be described later, is conveyed to the user as voice through the headset system using the voice synthesis system 330.
  • image recognition processing such as general object recognition, specific object recognition, and scene recognition is performed on the image from the headset system 200.
  • the image recognition system 301 includes a general object recognition system 106, a scene recognition system 108, a specific object recognition system 110, an image category database 107, a scene component database 109, and a mother database (hereinafter abbreviated as MDB) 111.
  • The general object recognition system 106 includes a general object recognition unit 106-01, a category detection unit 106-02, a category learning unit 106-03, and a new category registration unit 106-04.
  • The scene recognition system 108 includes units 108-01 to 108-04, among them a feature extraction unit 108-02, a weight learning unit 108-03, and a scene recognition unit 108-04.
  • The specific object recognition system 110 includes a specific object recognition unit 110-01, an MDB search unit 110-02, an MDB learning unit 110-03, and a new MDB registration unit 110-04.
  • The image category database 107 is composed of a category classification database 107-01 and unspecified category data 107-02.
  • The scene component database 109 is composed of a scene element database 109-01 and a metadata dictionary 109-02.
  • The MDB 111 consists of detailed design data 111-01, incidental information data 111-02, feature data 111-03, and unspecified object data 111-04.
  • Although the functional blocks of the image recognition system 301 are not necessarily limited to these, these representative functions are briefly described below.
  • General object recognition system 106 recognizes an object included in an image with a general name or category.
  • the categories here are hierarchical: objects recognized as the same general object can be classified and recognized in further subdivided categories (among chairs, a chair with four legs versus a chair with no legs at all) or in broader categories (chairs, desks, and chests are all broadly classified in the “furniture” category).
  • category recognition is this “classification”, that is, the problem of classifying objects into known classes; categories are also called classes.
  • the general object recognition unit 106-01 extracts local feature amounts from the feature points of the object in the input image and compares them with the descriptions of predetermined feature amounts obtained by learning in advance to judge whether they are similar, thereby determining whether the object is a known general object.
  • the category detection unit 106-02 specifies or estimates which category (class) an object recognized as a general object belongs to by collating it with the category classification database 107-01. When, as a result, an additional feature amount that should be added to or should modify the database entry for the specified category is found, the category learning unit 106-03 re-learns and updates the description of the general object in the category classification database 107-01. Also, when the feature quantity of an object held as unspecified category data 107-02 is determined to be very similar to the feature quantity of another unspecified object detected separately, the two are treated as belonging to a newly discovered category: the new category registration unit 106-04 newly registers the feature amount in the category classification database 107-01, and a new general name is assigned to the object.
  • the scene recognition system 108 detects characteristic image components that dominate the whole or a part of the input image using a plurality of feature extraction systems having different properties, and cross-references them in a multi-dimensional space with the scene element database 109-01 described in the scene component database 109. Statistical processing yields the pattern in which each input element group is detected in a specific scene, and the system recognizes whether the region that dominates all or part of the image is that specific scene.
  • in addition, collating the metadata group attached to the input image with the image components described in the metadata dictionary 109-02 registered in advance in the scene component database 109 makes it possible to further improve the accuracy of scene detection.
  • the area extraction unit 108-01 divides the entire image into a plurality of areas as necessary, and enables scene discrimination for each area. For example, from a surveillance camera installed on the wall or roof of a building in an urban space, it is possible to overlook a plurality of scenes such as intersections and entrances of many stores.
  • the feature extraction unit 108-02 passes the recognition results obtained from the various available image feature amounts, such as local feature amounts of the plurality of feature points detected in the designated image region, color information, and object shapes, to the weight learning unit 108-03 in the subsequent stage, which obtains the probability that each element co-occurs in a specific scene. The result is input to the scene recognition unit 108-04, which performs the final scene discrimination for the input image.
  • the specific object recognition system 110 sequentially compares the characteristics of the object detected from the input image with the characteristics of the specific object group stored in the MDB 111 in advance, and finally performs identification processing (Identification) of the object.
  • the total number of specific objects existing on the earth is enormous, and it is not practical to collate with all these specific objects. Therefore, as will be described later, it is necessary to narrow down the category and search range of an object within a certain range in advance of the specific object recognition system.
  • the specific object recognition unit 110-01 compares the local feature amounts at the detected image feature points with the feature parameter group in the MDB 111 obtained by learning, and determines by statistical processing which specific object the object corresponds to.
  • the MDB 111 holds detailed data regarding the specific object that is available at that time.
  • Basic information necessary for reconstructing and manufacturing an object, such as finishing, is held in the MDB 111.
  • the incidental information data 111-02 holds various information related to the object such as the name of the object, the manufacturer, the part number, the date and time, the material, the composition, and the processing information.
  • the feature amount data 111-03 holds information related to feature points and feature amounts of individual objects generated based on the design information.
  • the unspecified object data 111-04 is temporarily stored in the MDB 111 for future analysis as data of an unknown object that does not belong to any specific object at that time.
  • the MDB search unit 110-02 provides a function of searching for detailed data corresponding to the specific object, and the MDB learning unit 110-03 describes the object in the MDB 111 through an adaptive and dynamic learning process. Add or modify content.
  • the new MDB registration unit 110-04 performs the process of newly registering the object as a new specific object.
  • FIG. 6B shows an example of the system configuration and functional blocks of the general object recognition unit 106-01 according to an embodiment of the present invention.
  • the functional blocks of the general object recognition unit 106-01 are not necessarily limited to these; a general object recognition method in which Bag-of-Features (hereinafter abbreviated as BoF) is applied as a representative feature extraction method is briefly described below.
  • the general object recognition unit 106-01 includes a learning unit 106-10, a comparison unit 106-11, a vector quantization histogram unit (learning) 110-11, a vector quantization histogram unit (comparison) 110-14, and a vector quantization histogram identification unit 110-15.
  • the learning unit 106-10 includes a local feature amount extraction unit (learning) 110-07, a vector quantization unit (learning) 110-08, a Visual Word creation unit 110-09, and a Visual Word dictionary (CodeBook) 110-10.
  • BoF extracts image feature points that appear in an image, expresses the entire object as a collection of local feature values (Visual Words) without using their relative positional relationships, and compares them with the Visual Word dictionary (CodeBook) 110-10 obtained by learning to determine which object the structure of the local feature values is closest to.
  • the multidimensional feature vectors obtained by the local feature quantity extraction unit (learning) 110-07 constituting the learning unit 106-10 are clustered by the subsequent vector quantization unit (learning) 110-08 into a fixed number of feature vector groups, and the Visual Word creation unit 110-09 generates a Visual Word for each group based on its centroid vector.
  • known clustering methods include the k-means method and the mean-shift method.
  • the generated Visual Words are stored in the Visual Word dictionary (CodeBook) 110-10. The local feature values extracted from the input image are collated against it, and the vector quantization unit (comparison) 110-13 performs vector quantization for each Visual Word. Thereafter, the vector quantization histogram unit (comparison) 110-14 generates a histogram over all Visual Words.
  • the total number of bins (the number of dimensions) of the histogram is usually in the thousands to tens of thousands. Depending on the input image, many bins have no feature matches at all while others have pronounced matches, so a normalization process is applied across all bins so that the sum of all bin values of the histogram becomes 1.
  • the obtained vector quantization histogram is input to the vector quantization histogram identification unit 110-15 at the subsequent stage, where, as an example, a support vector machine (hereinafter referred to as SVM), which is a representative discriminator, recognizes the class to which the object belongs, that is, what kind of general object it is.
  • the recognition result here can also be used in the learning process for the Visual Word dictionary, as can information obtained from other methods (use of metadata and collective intelligence).
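  • the BoF steps described above (clustering into Visual Words, vector quantization, and a histogram normalized to sum to 1) can be sketched as follows. This is a minimal illustration in plain Python, not the implementation of the embodiment: the function names `kmeans`, `nearest`, and `bof_histogram`, the toy two-dimensional vectors, and the fixed random seed are all assumptions, and the SVM identification stage is omitted.

```python
import math
import random


def nearest(v, centroids):
    """Vector quantization: index of the centroid (Visual Word) closest to v."""
    return min(range(len(centroids)), key=lambda i: math.dist(v, centroids[i]))


def kmeans(vectors, k, iterations=20, seed=0):
    """Cluster local feature vectors; the k centroids act as the CodeBook."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(v, centroids)].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[i] = [sum(c) / len(members) for c in zip(*members)]
    return centroids


def bof_histogram(features, codebook):
    """Count Visual Word occurrences and normalize so that the bins sum to 1."""
    hist = [0.0] * len(codebook)
    for f in features:
        hist[nearest(f, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy data: hypothetical 2-D "local features" instead of real descriptors.
training = [[0.0, 0.2], [0.3, 0.0], [5.0, 5.4], [5.2, 5.0]]
codebook = kmeans(training, k=2)
histogram = bof_histogram([[0.1, 0.1], [5.0, 5.0], [4.9, 5.2]], codebook)
```

  • in a real system the histogram would have thousands of bins and would be passed to a trained discriminator; here it only shows how normalization makes images with different numbers of feature points comparable.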
  • FIG. 6C shows a schematic block diagram of the entire general object recognition system 106 including the general object recognition unit 106-01 according to an embodiment of the present invention.
  • General objects belong to various categories, which form a multiple hierarchical structure. For example, humans belong to the higher category “mammals”, and mammals belong to the higher category “animals”. Humans can also be recognized in other categories, such as hair color, eye color, or adult versus child.
  • the existence of the category classification database 107-01 is indispensable for making these recognition judgments. This is a collection of “knowledge” of civilization, and future learning and discovery will add new “knowledge” to it and continue to evolve.
  • the classes identified by the general object recognition unit 106-01 are classified into the category classification database 107-01 as various multidimensional and hierarchical structures.
  • the general object recognized in the continuous learning is collated with the category classification database 107-01, and the category detection unit 106-02 recognizes the belonging category. Thereafter, the recognition result is delivered to the category learning unit 106-03, and consistency with the description in the category classification database 107-01 is checked.
  • An object that has been recognized as a general object sometimes contains a plurality of recognition results.
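  • the hierarchical category structure described above can be sketched as a parent table that is walked upward. The table contents reuse the examples in the text (“human”, “mammal”, “animal”, “furniture”); the names `PARENT`, `lineage`, and `common_category` are assumptions made for illustration, and the actual category classification database 107-01 is multidimensional rather than a single tree.

```python
# Hypothetical single-parent hierarchy; the real database may hold several
# hierarchies (for example hair color or age group) for the same object.
PARENT = {
    "human": "mammal",
    "mammal": "animal",
    "chair": "furniture",
    "desk": "furniture",
    "chest": "furniture",
}


def lineage(category):
    """Return the category followed by all of its higher categories."""
    chain = [category]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def common_category(a, b):
    """Return the nearest category shared by a and b, or None."""
    seen = set(lineage(a))
    for candidate in lineage(b):
        if candidate in seen:
            return candidate
    return None
```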
  • FIG. 6D is a block diagram showing a typical example of the scene recognition system 108 according to the present invention for recognizing and determining a scene included in an input image according to an embodiment of the present invention.
  • a plurality of objects can generally be recognized from a learning image and an input image. For example, if objects such as “trees”, “grass”, and “animals” can be recognized together with areas representing “sky”, “sun”, “ground”, etc., whether the scene is a “zoo” or “grass” can be inferred from the overall landscape and from co-occurrence relationships with the other objects found.
  • the scene recognition system 108 includes an area extraction unit 108-01, a feature extraction unit 108-02, a strong classifier (weight learning unit) 108-03, a scene recognition unit 108-04, and a scene component database 109.
  • the feature extraction unit 108-02 includes a local feature amount extraction unit 108-05, a color information extraction unit 108-06, an object shape extraction unit 108-07, a context extraction unit 108-08, and weak classifiers 108-09 to 108-12.
  • the scene recognition unit 108-04 includes a scene classification unit 108-13, a scene learning unit 108-14, and a new scene registration unit 108-15.
  • the scene component database 109 includes a scene element database 109-01 and a metadata dictionary 109-02.
  • the region extracting unit 108-01 performs region extraction related to the target image in order to effectively extract the characteristics of the target object without being affected by the background or other objects.
  • a graph-based region segmentation method (Efficient Graph-Based Image Segmentation) or the like is known.
  • the extracted object images are input to the local feature amount extraction unit 108-05, the color information extraction unit 108-06, the object shape extraction unit 108-07, and the context extraction unit 108-08, respectively.
  • the obtained feature quantities are subjected to discrimination processing in the weak classifiers 108-09 to 108-12, and are integratedly modeled as a multidimensional feature quantity group.
  • These modeled feature quantity groups are input to a strong classifier 108-03 having a weighted learning function, and a recognition determination result for a final object image is obtained.
  • a typical example of the weak classifier is SVM, and an example of the strong classifier is AdaBoost.
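  • the combination of several weak classifiers into one weighted strong classifier can be sketched with discrete AdaBoost. This is a simplified illustration under stated assumptions: the weak classifiers here are trivial stand-in functions returning +1 or -1 (not SVMs), and the names `adaboost` and `predict` and the toy samples are hypothetical.

```python
import math


def adaboost(samples, labels, weak_classifiers, rounds=3):
    """Discrete AdaBoost over a fixed pool of weak classifiers.

    Returns a list of (alpha, classifier) pairs; labels are +1 or -1.
    """
    n = len(samples)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        def weighted_error(h):
            return sum(w for w, x, y in zip(weights, samples, labels) if h(x) != y)

        best = min(weak_classifiers, key=weighted_error)
        err = weighted_error(best)
        if err >= 0.5:
            break  # nothing in the pool is better than chance
        err = max(err, 1e-10)  # avoid division by zero for a perfect classifier
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, best))
        # Raise the weight of misclassified samples, then renormalize.
        weights = [w * math.exp(-alpha * y * best(x))
                   for w, x, y in zip(weights, samples, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of the selected weak classifiers."""
    score = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if score >= 0 else -1


# Each sample is a tuple of votes from three hypothetical feature channels
# (for example local features, color, shape); the label is the majority vote.
samples = [(1, 1, -1), (1, -1, 1), (-1, 1, 1), (-1, -1, 1), (-1, 1, -1), (1, -1, -1)]
labels = [1, 1, 1, -1, -1, -1]
weak = [lambda x: x[0], lambda x: x[1], lambda x: x[2]]
strong = adaboost(samples, labels, weak, rounds=3)
```

  • each weak classifier alone is wrong on a third of these samples, while the weighted vote classifies all six correctly, which is the effect the weight learning stage is meant to provide.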
  • an input image often includes a plurality of objects and a plurality of categories that are their superordinate concepts, and a human can imagine a specific scene or situation (context) from them at a glance.
  • on the other hand, when only a single object or a single category is presented, it is difficult to determine what scene the input image represents.
  • the surrounding situation where these objects exist, their mutual positional relationship, and the co-occurrence relationship of each object or category have an important meaning for discrimination of the scene.
  • the object group and category group recognized as described in the previous section are collated against the appearance probabilities of the component group for each scene described in the scene element database 109-01, and the subsequent scene recognition unit 108-04 determines, using a statistical method, what scene the input image represents.
  • Metadata attached to images can be a useful information source.
  • however, metadata attached by humans may reflect a mistaken assumption or contain an obvious error, or may describe the image only indirectly, as a metaphor, so it does not always represent the objects or the scene in the image correctly.
  • in such cases as well, it is desirable that a comprehensive judgment be made with reference to co-occurrence events related to the target that can be extracted from the knowledge information processing server system equipped with the image recognition system, and that the final object or category recognition process then be performed.
  • a plurality of scenes can be obtained from one image. For example, it may be “Summer Sea” and “Beach”. In that case, a plurality of scene names are assigned to the image.
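  • the statistical scene determination from per-scene appearance probabilities can be sketched as a log-probability score over the recognized components, returning every scene whose score is close to the best one so that an image can receive several scene names. The probability values, the `UNSEEN` floor, the `margin` parameter, and the function names are all assumptions for illustration; the source does not specify the statistical method.

```python
import math

# Hypothetical appearance probabilities P(component | scene).
SCENE_ELEMENTS = {
    "zoo": {"animal": 0.9, "tree": 0.6, "fence": 0.7},
    "beach": {"sea": 0.9, "sand": 0.8, "sun": 0.6, "animal": 0.1},
    "summer sea": {"sea": 0.9, "sun": 0.8, "sand": 0.5},
}
UNSEEN = 0.01  # assumed floor for components not listed for a scene


def scene_scores(recognized):
    """Sum of log appearance probabilities of the recognized components."""
    return {
        scene: sum(math.log(probs.get(c, UNSEEN)) for c in recognized)
        for scene, probs in SCENE_ELEMENTS.items()
    }


def likely_scenes(recognized, margin=1.0):
    """All scenes whose score is within `margin` of the best score."""
    scores = scene_scores(recognized)
    best = max(scores.values())
    return sorted(s for s, v in scores.items() if best - v <= margin)
```

  • with these assumed numbers, `likely_scenes(["sea", "sun"])` yields both “beach” and “summer sea”, while `likely_scenes(["animal", "tree", "fence"])` yields only “zoo”.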
  • FIG. 6E shows a configuration example and function blocks of the entire system of the specific object recognition system 110 according to the embodiment of the present invention.
  • the specific object recognition system 110 includes a general object recognition system 106, a scene recognition system 108, an MDB 111, a specific object recognition unit 110-01, an MDB search unit 110-02, an MDB learning unit 110-03, and a new MDB registration unit 110-04.
  • the specific object recognition unit 110-01 includes a two-dimensional mapping unit 110-05, an individual image cutout unit 110-06, a local feature amount extraction unit (learning) 110-07, a vector quantization unit (learning) 110-08, a Visual Word creation unit 110-09, a Visual Word dictionary (CodeBook) 110-10, a vector quantization histogram unit (learning) 110-11, a local feature quantity extraction unit (comparison) 110-12, a vector quantization unit (comparison) 110-13, a vector quantization histogram unit (comparison) 110-14, a vector quantization histogram identification unit 110-15, a shape feature extraction unit 110-16, a shape comparison unit 110-17, a color information extraction unit 110-18, and a color comparison unit 110-19.
  • once the class (category) to which the target object belongs has been recognized by the general object recognition system 106, the process can move on to narrowing down whether the object can be further recognized as a specific object. If the class is not specified to some extent, a search among an effectively unlimited number of specific objects is forced, which is not practical in terms of time and cost.
  • the feature quantities obtained from the specific object recognition system 110 can be used for further narrowing down, and further pinpointing is possible when unique identification information (a product name, a specific trademark, a logo, etc.) can be recognized on a part of the object, or when useful metadata or the like is attached in advance.
  • the MDB search unit 110-02 sequentially extracts detailed data and design data for the candidate objects from the MDB 111, and the matching process with the input image is executed based on them. Even when an object is not an industrial product, or when detailed design data does not exist, the object can be recognized as a specific object to a certain degree by matching each detectable image feature in detail, provided that a photograph or the like is available. However, the input image and the comparison image rarely look exactly the same, and the same object may be recognized as different objects.
  • the two-dimensional mapping unit 110-05 converts the three-dimensional data in the MDB 111 into a two-dimensional image according to the appearance of the input image.
  • Visualization makes it possible to perform highly accurate feature matching processing.
  • if the rendering process to the two-dimensional image in the two-dimensional mapping unit 110-05 were performed for all directions from all viewpoints, the calculation time and the calculation cost would increase unnecessarily, so narrowing processing according to how the object appears in the input image is necessary.
  • various feature quantities obtained from highly accurate data using the MDB 111 can be obtained in advance in the learning process.
  • the local feature amount of the object is detected by the local feature amount extraction unit 110-07, and each local feature amount is converted into a plurality of similar feature groups by the vector quantization unit (learning) 110-08.
  • the Visual Word creation unit 110-09 converts it into a multi-dimensional feature amount set and registers them in the Visual Word dictionary 110-10. These are continuously performed for a large number of learning images until sufficient recognition accuracy is obtained.
  • when the learning image is, for example, a photograph, insufficient image resolution, the influence of noise, the influence of occlusion, and the influence of objects other than the target are unavoidable. If learning is based on the MDB 111, however, the features of the target image can be extracted in an ideal, noise-free state from high-precision data, so it becomes possible to configure a recognition system with significantly improved extraction and separation accuracy compared with conventional methods.
  • the local feature point and the feature amount are calculated by the local feature amount extraction unit (comparison) 110-12, and learning is performed in advance.
  • the vector quantization histogram identification unit 110-15 identifies and determines whether the object is the same as or similar to the learned object, or not.
  • SVM (Support Vector Machine) is known as such a discriminator, and AdaBoost, which enables weighting of the discrimination judgment through learning, is also widely used as an effective discriminator.
  • in addition to the local feature amounts, it is also useful to use the shape features of the object for the purpose of further improving the detection accuracy.
  • the object cut out from the input image is input to the shape comparison unit 110-17 via the shape feature quantity extraction unit 110-16, and identification is performed using the shape features of each part of the object.
  • the identification result is fed back to the MDB search unit 110-02, so that the narrowing down process for the MDB 111 becomes possible.
  • as shape feature quantity extraction means, HoG (Histograms of Oriented Gradients) and the like are known.
  • the shape feature is also useful for the purpose of significantly reducing rendering processing from multiple viewpoint directions for obtaining a two-dimensional map using the MDB 111.
  • the color characteristics and texture (surface treatment) of the object are useful for the purpose of improving the image recognition accuracy.
  • the extracted input image is input to the color information extraction unit 110-18, the color comparison unit 110-19 extracts the color information or texture of the object, and the result is fed back to the MDB search unit 110-02, so that further narrowing processing can be performed in the MDB 111. Through this series of processes, the specific object recognition process can be performed more effectively.
  • the processing procedure 340 of the biometric authentication unit 302 will be described with reference to FIG.
  • when the user wears the headset system 200 (341), the following biometric authentication process starts.
  • an encrypted communication path is established with the biometric authentication system 302, for example using SSL (Secure Sockets Layer) or TLS (Transport Layer Security).
  • biometric information 345 is acquired from the biometric sensor 204 provided in the headset system.
  • as the biometric authentication information, vein pattern information in the outer ear or eardrum of the user wearing the headset system can be used; such information may be selected and combined, and the biometric authentication information is not limited to these.
  • the biometric authentication information is sent to the biometric authentication system as a template.
  • Step 355 in FIG. 7 describes the processing on the biometric authentication system side.
  • the template is registered in the knowledge information processing server system 300 as a user.
  • a signature and encryption function f(x, y) is generated from the template, and in step 358 the function is returned to the headset system.
  • x in f (x, y) is data to be signature-encrypted
  • y is biometric authentication information used for signature encryption.
  • in decision 345, it is confirmed whether or not the function has been obtained. If YES, the function is used for communication between the headset system and the knowledge information processing server system (346). If the determination 345 is NO, a further determination (349) is made on whether to treat the failure as an authentication error. If YES, an authentication error is notified to the user (350). If the determination 349 is NO, the process is repeated from step 344. Thereafter, after waiting for a specified time in step 347, the loop (343) is repeated. When the user removes the headset system, or when an authentication error occurs, the encrypted communication path with the biometric authentication system is disconnected (348).
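  • the role of f(x, y), where x is the data to be signed and y is the biometric authentication information, can be illustrated with a keyed hash. The source does not state how the function is built, so this sketch is purely an assumption: it derives a key from y with SHA-256 and signs x with HMAC-SHA256, it covers only the signature part (no encryption), and it assumes y is a stable byte string, which real biometric readings are not without additional processing.

```python
import hashlib
import hmac


def f(x: bytes, y: bytes) -> bytes:
    """Sign data x with a key derived from biometric information y (assumed scheme)."""
    key = hashlib.sha256(y).digest()
    return hmac.new(key, x, hashlib.sha256).digest()


def verify(x: bytes, y: bytes, tag: bytes) -> bool:
    """Check a signature in constant time."""
    return hmac.compare_digest(f(x, y), tag)


# Hypothetical values standing in for a registered template and a message.
template = b"example-vein-pattern-template"
tag = f(b"utterance payload", template)
```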
  • FIG. 8A shows a configuration example of the interest graph unit 303 in one embodiment of the present invention.
  • access to the graph database 365 is described as direct access to the graph database 365 and the user database 366.
  • an interest graph applied to a user who is using the system is applied.
  • the graph storage unit 360 can selectively read only the necessary portions of the graph structure data stored in the graph database 365, together with the necessary information about the user described in the user database 366, into its own high-speed memory and cache them internally.
  • the graph calculation unit 361 extracts a partial graph from the graph storage unit 360 or calculates an interest graph related to the user.
  • the relevance calculation unit 362 performs n-th order (n > 1) node extraction, filtering processing, generation and removal of links between nodes, and the like, based on the relevance between nodes.
  • the statistical information processing unit 363 processes nodes and link data in the graph database as statistical information, and finds a new relationship. For example, when a certain subgraph is close in information distance to another subgraph and similar subgraphs can be classified in the same cluster, it can be determined that the new subgraph has a high probability of being included in the cluster.
  • the user database 366 is a database that holds information about the user, and is used by the biometric authentication unit 302.
  • a graph structure centered on a node corresponding to the user in the user database is handled as the interest graph of the user.
  • FIG. 8B (A) shows a basic access method for the graph database (365).
  • the value (371) is obtained from the key (370) by the locate operation (372).
  • the key (370) is derived by applying a hash function to the value (373). For example, when the SHA-1 algorithm is used as the hash function, the key (370) is 160 bits long.
  • for the locate operation (372), a distributed hash table method can be used.
  • the relationship between a key and a value is expressed as (key, {value}) and is used as the storage unit in the graph database.
  • the node n1 (375) is (n1, {node n1}) and the node n2 (376) is (n2, {node n2}).
  • n1 and n2 are the keys of the node n1 (375) and the node n2 (376), respectively, and the hash calculation is performed on the node entity n1 (375) and the node entity n2 (376) to obtain the respective keys.
  • the link l1 (377) is expressed as (l1, {n1, n2}) in the same way as a node, and {n1, n2} is hashed to obtain the key l1 (377).
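  • the (key, {value}) storage unit and the derivation of node and link keys by hashing can be sketched as follows. SHA-1 is taken from the text; everything else is an assumption for illustration: a local dictionary stands in for the distributed hash table, node entities are plain strings, and the link value is serialized by joining the two node keys.

```python
import hashlib

store = {}  # local stand-in for the distributed hash table


def make_key(data: str) -> str:
    """SHA-1 digest as 40 hex characters (160 bits)."""
    return hashlib.sha1(data.encode("utf-8")).hexdigest()


def put_node(entity: str) -> str:
    """Store (n, {node entity}) and return the key n."""
    key = make_key(entity)
    store[key] = entity
    return key


def put_link(n1: str, n2: str) -> str:
    """Store (l, {n1, n2}) and return the key l, hashed from the node keys."""
    key = make_key(n1 + "," + n2)
    store[key] = (n1, n2)
    return key


def locate(key: str):
    """Locate operation: obtain the value from the key."""
    return store[key]


n1 = put_node("node n1")
n2 = put_node("node n2")
l1 = put_link(n1, n2)
```

  • because keys are derived from content, storing the same node entity twice yields the same key, which is what lets a distributed hash table place and find entries without a central index.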
  • FIG. 8B (D) is an example of the components of the graph database.
  • the node management unit (380) manages the nodes
  • the link management unit 381 manages the links
  • the data management unit 382 manages data related to the node so as to be recorded in the data storage unit 386.
  • the history management unit 410 in FIG. 9A manages the usage history in the network communication system 100 for each user. For example, it can leave a record of the user's focus on an object as a footprint. It can also record how far the same message or tweet has been played so that it is not played repeatedly. Alternatively, when the playback of a message or tweet is stopped halfway, the position where the playback stopped is recorded for subsequent continued playback.
  • FIG. 9B shows a part of the graph structure recorded in the graph database 365 as an example.
  • the user (417) node, the target (415) node, and the message and tweet (416) node are each connected by a link.
  • the reproduction of the message or tweet related to the target (415) of the user (417) is resumed from the reproduction position recorded as the node (418).
  • the usage history in the present embodiment is not limited to these methods, and other methods that can be expected to have the same effect may be used.
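  • recording where playback stopped and resuming from that position can be sketched with a small lookup keyed by user, target, and message. The class name `PlaybackHistory`, its methods, and the use of seconds as the position unit are assumptions; in the embodiment the position is held as a node in the graph database rather than in a dictionary.

```python
class PlaybackHistory:
    """Remembers, per user and target, where playback of a message stopped."""

    def __init__(self):
        self._positions = {}

    def record_stop(self, user, target, message, position_sec):
        """Store the position at which playback was interrupted."""
        self._positions[(user, target, message)] = position_sec

    def resume_position(self, user, target, message):
        """Return the stored position, or 0.0 if the message was never played."""
        return self._positions.get((user, target, message), 0.0)


history = PlaybackHistory()
history.record_stop("user417", "target415", "message416", 12.5)
```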
  • the message selection unit 411 is managed for each user, and selects a suitable message and tweet when a plurality of messages and tweet are recorded in the target focused by the user. For example, it may be played back in order of recorded time. It is also possible to selectively select and play back topics of interest of the user from the interest graph for the user. Also, messages and tweets that explicitly specify the user may be played preferentially. Note that the procedure for selecting messages and tweets in the present embodiment is not limited to these.
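  • one way to combine the selection criteria mentioned above (messages that explicitly specify the user, topics in the user's interest graph, and recorded time) is a composite sort key. The priority order among the three criteria, the `Message` fields, and the function name `select_messages` are assumptions; the text lists the criteria only as alternatives.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    text: str
    recorded_at: float              # seconds since an arbitrary epoch
    topic: str
    addressed_to: Optional[str] = None


def select_messages(messages, user, interests):
    """Order messages: addressed to the user, then of interest, then oldest first."""
    def sort_key(m):
        explicit = 0 if m.addressed_to == user else 1
        interesting = 0 if m.topic in interests else 1
        return (explicit, interesting, m.recorded_at)

    return sorted(messages, key=sort_key)


inbox = [
    Message("old note", 1.0, "weather"),
    Message("about wine", 3.0, "wine"),
    Message("for you", 5.0, "weather", addressed_to="alice"),
]
ordered = select_messages(inbox, "alice", {"wine"})
```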
  • the current interest 412 is managed and stored for each user as a node group representing the current interest of the user in the interest graph unit 303.
  • the message selection unit searches the graph structure starting from the node group corresponding to the user's current interest 412, selects the node group in which the user has a high degree of interest at that time, and uses it as the input elements of the conversation engine 430 described later, where it is converted into a series of sentences and reproduced.
  • the object and degree of interest of the user are obtained from, for example, a graph structure. In FIG. 17, the user (1001) node has links to the node (1005) and the node (1002); from these links, the user is assumed to be interested in “wine” and “car”. To decide whether the user is more interested in “wine” or in “car”, the graph structure connected from the node “wine” may be compared with the graph structure connected from the node “car” and the one with more nodes judged to be of higher interest; alternatively, the interest may be judged higher from the number of times of attention in the attention history related to the node, or the user may specify the strength of his or her own interest. The present invention is not limited to these.
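  • the first criterion above, judging interest by how many nodes are connected beyond each topic node, can be sketched as a breadth-first count. The adjacency dictionary, the node names other than “wine” and “car”, and the function names are hypothetical.

```python
from collections import deque


def reachable_count(graph, start, exclude=()):
    """Number of nodes reachable from start, not counting start or excluded nodes."""
    seen = {start, *exclude}
    queue = deque([start])
    count = 0
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                count += 1
                queue.append(nxt)
    return count


def rank_interests(graph, user):
    """Topics linked from the user, the one with the larger subgraph first."""
    return sorted(
        graph.get(user, ()),
        key=lambda topic: reachable_count(graph, topic, exclude=(user,)),
        reverse=True,
    )


# Hypothetical interest graph as an adjacency dictionary.
interest_graph = {
    "user": ["car", "wine"],
    "wine": ["red wine", "winery"],
    "red wine": ["vintage"],
    "car": ["sedan"],
}
```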
  • the message storage unit 306 according to an embodiment of the present invention will be described with reference to FIG.
  • Messages and tweets 391 uttered by the user and / or images 421 taken by the headset system 200 are recorded in the message database 420 by the message storage unit.
  • the message node generation unit 422 acquires the message and information to be tweeted from the interest graph unit 303, and generates a message node.
  • the message management unit 423 associates the message or tweet with the message node, and records the message or tweet in the graph database 365.
  • the image 421 taken by the headset system may be recorded in the graph database 365 in the same manner. Note that the same service on the network may be used via the network for recording the message or tweet.
  • the playback processing unit 307 according to an embodiment of the present invention will be described with reference to FIG.
  • the user's utterance including the user's message and tweet 391 is recognized by the voice recognition system 320 and converted into a single or a plurality of word strings.
  • the word string is given a situation identifier, such as whether the user is currently paying attention to some target or is indicating spatiotemporal information, and is sent to the conversation engine 430, which is a component of the playback processing unit 306.
  • the identifier as the output of the situation recognition unit 304 is not limited to each situation described above, and may be configured by a method that does not use the identifier.
  • the playback processing unit 307 is composed of the conversation engine 430, the target processing unit 431, the command processing unit 432, and the user message playback unit 433; these may be selected from, or new functions may be added, and the present invention is not limited to this configuration.
  • the target processing unit is executed when an identifier indicating that the target is being focused is given from the situation recognition unit, and performs a series of processes illustrated in FIG. 3A.
  • the user message reproduction unit reproduces a message or tweet left in the target and / or an associated image.
  • the user management unit 308 according to an embodiment of the present invention will be described with reference to FIG.
  • the user management unit manages an ACL (access control list) of authorized users in a graph structure.
  • FIG. 12A shows a state in which one user (451) node has a link with a permission (450) node. This gives the user permission for the node linked to the permission node. If the node is a message or tweet, you can play them.
  • FIG. 12B is an example in which permission is given to a specific user group.
  • it shows the permission (452) node collectively giving permission to the user 1 (454) node, the user 2 (455) node, and the user 3 (456) node linked to the user group (453) node.
  • FIG. 12C is an example in which a permission (457) node is given to all (458) nodes in a lump.
  • in another example, a specific user (460) node is given a permission (459) node only for a specific time or time zone (461) node and a specific place / region (462) node.
  • the ACL in this embodiment may have a configuration other than that shown in FIG.
  • a configuration may be adopted in which a non-permitted node is introduced to explicitly indicate a user who is not permitted.
  • the permission node may be further refined to introduce a reproduction permission node and a recording permission node, so that the form of permission may be changed depending on whether the message or tweet is reproduced or recorded.
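  • The permission patterns described for FIG. 12 can be sketched as follows. This is a minimal illustration only, not the patented implementation: the `Permission` class, its field names, and `is_permitted` are hypothetical, and the time window and region are simplified to a clock range and a plain string.

```python
from dataclasses import dataclass, field
from datetime import time
from typing import Optional, Set, Tuple


@dataclass
class Permission:
    """Hypothetical permission node linked to users, groups, or everyone."""
    users: Set[str] = field(default_factory=set)       # individual users (FIG. 12A style)
    groups: Set[str] = field(default_factory=set)      # user groups (FIG. 12B style)
    everyone: bool = False                              # all users (FIG. 12C style)
    time_window: Optional[Tuple[time, time]] = None    # optional time / time zone limit
    region: Optional[str] = None                       # optional place / region limit


def is_permitted(perm: Permission, user: str, user_groups: Set[str],
                 now: time, place: str) -> bool:
    """Return True if the user may play the node guarded by this permission."""
    allowed = perm.everyone or user in perm.users or bool(perm.groups & user_groups)
    if not allowed:
        return False
    if perm.time_window is not None:
        start, end = perm.time_window
        if not (start <= now <= end):
            return False
    if perm.region is not None and perm.region != place:
        return False
    return True
```

  • A non-permitted node, or separate reproduction and recording permissions, could be modeled by adding further fields to the same hypothetical structure.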
  • FIG. 13A is used to explain an example of a use case scenario centered on a user who uses the network communication system 100 according to an embodiment of the present invention.
  • the shootable range of the camera provided in the headset system 200 worn by the user is referred to as the visual field 503, and the direction in which the user is mainly looking is referred to as the user's subjective visual field (subjective vision 502).
  • the user carries the network terminal 220. The user's speech (506 or 507) is picked up by the microphone 201 incorporated in the headset system and uploaded to the knowledge information processing server system 300 side together with the video captured by the camera 203, which is incorporated in the headset system and reflects the user's subjective view. The knowledge information processing server system side can return audio information, video / text information, etc. to the earphone 202 incorporated in the headset system or to the network terminal 220.
  • in FIG. 13A, it is assumed that the user 500 is looking at the object group 505, and the user 501 is looking at the scene 504.
  • the object group 505 is photographed in the visual field 503 of the user's camera according to the procedure shown in FIG. 3A, and the image is uploaded to the knowledge information processing server system 300 side.
  • the image recognition system 301 extracts a specific object and / or a general object that can be recognized therefrom.
  • the user 500 performs a pointing operation on the target of interest by voice, for example by saying "upper right" or "wine",
  • thereby notifying the image recognition system that the user is currently paying attention to the object 508.
  • the knowledge information processing server system side can then issue a reconfirmation inquiry that includes a co-occurring event not explicitly indicated by the user, such as "Wine in the ice bucket?",
  • and deliver it by voice to the user's headset system 200. If the content of the reconfirmation differs from the user's intention, the user may, for example, utter "different" and issue an additional target selection instruction to the server system side by voice, requesting re-detection. Alternatively, the user may directly specify or correct the target of interest using the GUI on the network terminal.
  • the user 501 is viewing the scene 504 and uploads a camera image reflecting the user's subjective visual field 503 to the knowledge information processing server system side provided with the image recognition engine,
  • so that the image recognition system incorporated on the server system side (FIG. 2) infers that the target scene 504 is probably a "mountain landscape".
  • the user 501 utters a message or tweet about the scene by voice, for example "nostalgic satoyama", and the message or tweet is sent to the server system side via the user's headset system 200 and recorded together with the camera video.
  • the user 501's tweet "nostalgic satoyama" can then be sent from the server system side to the user via the network as audio information.
  • in this way, user communication related to a shared experience can be evoked for common and impressive scenes that anyone can picture, such as "sunset", even if the actual scenery and its location differ.
  • a message or tweet left by the user 500 or the user 501 for a specific target can be left selectively for a specific user only, a specific user group, or all users, according to conditions set in advance by the user through a voice instruction or a direct operation on the network terminal 220.
  • likewise, a message or tweet left by the user 500 or the user 501 for a specific target can be left selectively for a specific time or time zone, and / or a specific place or region, and / or a specific user, a specific user group, or all users, according to conditions set in advance by the user through a voice instruction or a direct operation on the network terminal 220.
  • FIG. 13B is used to explain an example of network communication induced by visual curiosity for a common object, derived from the use case scenario.
  • as an example of network communication induced by visual curiosity, a situation will be described in which a plurality of users are viewing "sakura" (cherry blossoms) in different situations in different time-spaces.
  • user 1 (550), who happened to see cherry blossoms (560), tweets "beautiful cherry blossoms", and user 2 (551), in another time-space, tweets "Cherry blossoms are in full bloom" (561).
  • the user 4 (553), who sees petals flowing on the water surface at a distant place, murmurs, "Are those cherry blossom petals?".
  • the target (601) node is linked to a user (600) node, a keyword (602) node, a target image feature (603) node, a time / time zone (604) node, a location / region (605) node, and a message and tweet (607) node.
  • an ACL (606) node is linked to the target (601) node.
  • An ACL (608) node, a time / time zone (609) node, and a location / region (610) node are linked to the message and tweet (607) node. That is, FIG. 14 shows the target of the user, its time / time zone, place / region, extracted in the process of step 30-01 shown in FIG.
  • the graph structure shown in FIG. 14 may also be configured so that information beyond the described time / time zone, location / region, and ACL can be recorded by adding or deleting nodes.
  • the general object recognition system 106 detects (901) the category to which the target belongs.
  • a category node is searched from the graph database 365 (902), and it is confirmed whether the category exists in the graph database 365 (903). If it does not exist, a new category node is added and recorded in the graph database (904).
  • a specific object is detected by the specific object recognition system 110 (905), and it is confirmed whether it already exists on the graph database (907). If it does not exist, a new specific object node is added (908) and recorded on the graph database (909).
  • the scene recognition system 108 detects a scene (910), searches for a scene node in the graph database 365 (911), and checks whether the scene exists in the graph database (912). If not, a node related to the scene is generated and added to the graph database (913). When this series of processes is completed, the time stamp information obtained by the above processing is additionally recorded in the graph database on the category node, the specific object node, or the scene node (914), and the process ends.
  • the new node group generation for registration in the graph database 365 described in FIG. 15 may be performed during the reconfirmation process by the user illustrated in FIG. 3A.
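  • The find-or-create flow described for FIG. 15 can be sketched as below. This is a toy in-memory stand-in, not the actual graph database 365: the `GraphDB` class, its methods, and `register_detection` are hypothetical names introduced only for illustration, and the recognition results are passed in as plain strings.

```python
from typing import Dict, List, Optional, Tuple

NodeKey = Tuple[str, str]  # (node kind, node name)


class GraphDB:
    """Toy in-memory stand-in for a graph database (hypothetical API)."""

    def __init__(self) -> None:
        self.nodes: Dict[NodeKey, List[float]] = {}  # node -> recorded time stamps

    def find_or_create(self, kind: str, name: str) -> Tuple[NodeKey, bool]:
        """Search for a node and add it only if it does not exist yet."""
        key = (kind, name)
        created = key not in self.nodes
        if created:
            self.nodes[key] = []
        return key, created

    def add_timestamp(self, key: NodeKey, ts: float) -> None:
        self.nodes[key].append(ts)


def register_detection(db: GraphDB, category: Optional[str],
                       specific_object: Optional[str],
                       scene: Optional[str], ts: float) -> List[NodeKey]:
    """Register whichever of category / specific object / scene was detected."""
    touched = []
    for kind, name in (("category", category),
                       ("object", specific_object),
                       ("scene", scene)):
        if name is None:          # nothing detected for this recognizer
            continue
        key, _ = db.find_or_create(kind, name)
        db.add_timestamp(key, ts)  # time stamp recorded on the node
        touched.append(key)
    return touched
```

  • Because existing nodes are reused, repeated sightings of the same category accumulate time stamps on one node rather than creating duplicates.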
  • the word string extracted by the voice recognition system can be associated with various features extracted on the knowledge information processing server system side provided with the image recognition system.
  • the server system side asks the user by voice to confirm, "Is it a red bus?" If the user answers "Taxi," the server system performs additional image feature extraction processing, eventually recognizes the taxi 50, and issues a reconfirmation to the user by voice to the effect that a yellow taxi on the left has been detected, to which the user responds "yes".
  • as a result, all the detected feature groups related to the taxi 50, together with the node groups related to the words "taxi" and "yellow" confirmed by the user, can be registered in the graph database 365 as related node groups for the view (scene).
  • the time stamp linked to the category node, the specific object node, or the scene node shown in FIG. 15 can be associated with the user.
  • the attention history of the user can be configured as a partial graph of the acquired interest graph.
  • the user can inquire of the knowledge information processing server system 300 provided with the image recognition system, by voice or through the GUI on the network terminal 220, about the user's state of interest in a specific space-time focused on the target and about the situation of the other node groups associated with it.
  • the server system side can notify the user, by voice, text, photo, or graphic information, of the various states related to the object of interest in a specific space-time that can be derived from the acquired subgraph of the interest graph.
  • in the attention history, the spatiotemporal information, the user information, and the target image information are stored as a graph structure in the graph database 365. Accordingly, the attention history can be configured so that the graph structure can be directly referenced and analyzed.
  • the graph structure (1000) is an interest graph of the user (1001) node at a certain time.
  • the user is interested in the vehicle type A (1003) node and the vehicle type B (1004) node as specific objects, and they belong to the category “car” (1002) node.
  • the user is also interested in three target (specific objects 1006 to 1008) nodes, which belong to the wine (1005) node.
  • it is assumed here that the user pays attention to the target vehicle type X (1011) node.
  • the server system generates a link (1040) connecting the graph structure (1010) including the target vehicle type X (1011) node to the car (1002) node.
  • the server system can propose an enclosure (1020) to the user.
  • FIG. 17 shows a snapshot example of the graph structure centering on the user (1001) node in a state where the growth of the interest graph shown in FIG. 16 is further advanced.
  • the figure represents the following state.
  • the user (1001) node is interested in a specific scene (1030) node in addition to the car (1002) node and the wine (1005) node.
  • under the car (1002) node, the user is particularly interested in the car type A (1003), car type B (1004), and car type X (1011) nodes as specific objects, and under the wine (1005) node, the user is interested in five kinds of wine (1006, 1007, 1008, 1021, and 1022) nodes.
  • the specific scene (1030) node is a scene represented by the image (1031) node, shot at the specific location (1034) node at the specific time (1033) node, and playback is allowed only for the users listed in the ACL (1032) node.
  • the vehicle type X (1011) node is represented by an image (1012) node, on which various user message and tweet (1013) nodes are left, and only the users listed in the ACL (1036) node are allowed to play them back.
  • the engine specification and color are described as nodes.
  • similar attributes are described for five types of wine (1006, 1007, 1008, 1021, and 1022) nodes. Note that some of these nodes may be directly linked from another user 2 (1036).
  • FIG. 18A is used to explain means for recording or reproducing a user's message or tweet as voice in one embodiment of the present invention.
  • the user specifies (1101) the target by the procedure shown in FIG.
  • a recipient who can receive the message or tweet is specified (ACL) and bound to the variable A.
  • whether to record or reproduce is selected (1105), and in the case of recording processing, the recording procedure of the message or tweet is executed (1106).
  • a necessary node group is generated from the four variables (O, T, P, A) and recorded in the graph database 365 (1107).
  • if the selection (1105) is a reproduction process,
  • the corresponding node group is extracted from the graph database 365 using the four variables (O, T, P, A) (1108), the procedure for reproducing the messages or tweets remaining on those nodes is executed (1109), and the series of processes ends.
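  • The record / reproduce branch of FIG. 18A, keyed by the four variables (O, T, P, A), can be sketched as follows. This is an illustrative simplification: `MessageNode`, `record`, and `play` are hypothetical names, a plain list stands in for the graph database 365, and on reproduction the ACL is treated as the set of users allowed to listen, which is an assumption about how A is applied.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class MessageNode:
    """Hypothetical node built from the four variables plus the message body."""
    target: str                 # O: target of interest
    time_zone: str              # T: time / time zone
    place: str                  # P: place / region
    acl: FrozenSet[str]         # A: users allowed to receive the message
    body: str


def record(db: List[MessageNode], o: str, t: str, p: str,
           a: Iterable[str], body: str) -> None:
    """Recording branch: generate a node from (O, T, P, A) and store it."""
    db.append(MessageNode(o, t, p, frozenset(a), body))


def play(db: List[MessageNode], o: str, t: str, p: str, listener: str) -> List[str]:
    """Reproduction branch: extract matching nodes the listener may play."""
    return [m.body for m in db
            if m.target == o and m.time_zone == t and m.place == p
            and listener in m.acl]
```

  • Exact string matching on time and place is used here only to keep the sketch short; a real system would need range and region matching.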
  • FIG. 18B explains step 1102 at the time of reproduction in FIG. 18A in detail.
  • the user selects whether the time / time zone is designated by voice or the time / time zone is designated directly by the GUI on the network terminal 220 (1111).
  • the user utters the time / time zone (1112), and the voice recognition system 320 performs recognition processing (1113). Whether the result is a time / time zone is confirmed (1114), and if the result is correct, the designated time / time zone data is stored in the variable T (1116). If not, the process returns to the utterance of the time / time zone (1112).
  • to interrupt the process (QUIT),
  • the user ends it by utterance.
  • if the time / time zone is designated through the GUI of the network terminal (1115),
  • the input time / time zone is directly stored in the variable T (1116), and the series of processes ends.
  • FIG. 18C explains in detail the step 1103 during reproduction in FIG. 18A.
  • the user selects whether to specify the location / area by voice or to directly specify the location / area using the GUI on the network terminal 220.
  • the user utters a place / region (1122), and the voice recognition system 320 performs voice recognition processing (1123). Whether the result of the utterance is a place / region is confirmed (1124). If the result is correct, it is converted into latitude / longitude data (1127) and stored in the variable P (1128). If not, the process returns to the utterance of the place / region (1122).
  • to interrupt the process (QUIT), the user ends it by utterance.
  • if a map is displayed on the GUI of the network terminal (1125) and the place / region is specified directly on the screen of the network terminal (1126),
  • the latitude / longitude data is stored in the variable P, and the series of processes ends (1128).
  • with reference to FIG. 19, a procedure will be described for narrowing playback when a recipient receives messages from among a plurality of messages and tweets left for a specific target, by making it possible to specify the time or time zone in which they were left, and / or the place or region where they were left, and / or the name of the user who left them.
  • the user who is the reception target focuses on the target according to the procedure described in FIG. 3A, and each corresponding node group is selected in advance (1140).
  • the time / time zone and place / region to be reproduced for the target are specified by the procedure shown in FIG. 18B and FIG. 18C (1201).
  • the user whose messages or tweets are to be reproduced is designated (1202).
  • the ACL is checked (1203), and data is extracted from the node corresponding to the message or tweet that meets the specified condition and / or the node corresponding to the video (1204).
  • the next processing is repeatedly applied to all the nodes (1205).
  • when the notification includes information on the user who left the message or tweet, the message information related to the node and that user information are obtained from the graph database 365 and, using the reproduction processing unit 306 shown in FIG.,
  • notified by voice and / or text to the headset system 200 worn by the reception target user and / or the network terminal 220 associated with the reception target user (1208).
  • if the notification content is voice,
  • it is played from the earphone built into the headset system.
  • if the notification content is text, photo, and / or figure,
  • the information other than voice is displayed on the network terminal in synchronization with the message or tweet (1209).
  • otherwise, the playback processing unit 306 is used to notify
  • the headset system 200 worn by the reception target user and / or the network terminal 220 associated with that user, as audio and / or image information that does not include the information on the user who left the message or tweet (1207).
  • the series of processing is repeated for all the extracted nodes and then ends.
  • the processing is repeatedly performed for all the nodes extracted in the loop (1205), but other means may be used.
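  • The narrowing procedure of FIG. 19 can be sketched as a filter over stored messages. This is a minimal illustration under stated assumptions: messages are plain dictionaries with hypothetical keys (`sender`, `time_zone`, `place`, `acl`, `body`), and `narrow_messages` and `render` are names introduced here, not part of the described system.

```python
from typing import Dict, List, Optional


def narrow_messages(messages: List[Dict], listener: str,
                    time_zone: Optional[str] = None,
                    place: Optional[str] = None,
                    sender: Optional[str] = None) -> List[Dict]:
    """Keep messages the listener may play that match every given filter."""
    selected = []
    for msg in messages:
        if listener not in msg["acl"]:            # ACL check
            continue
        if time_zone is not None and msg["time_zone"] != time_zone:
            continue
        if place is not None and msg["place"] != place:
            continue
        if sender is not None and msg["sender"] != sender:
            continue
        selected.append(msg)
    return selected


def render(msg: Dict, with_sender: bool) -> str:
    """Format one notification with or without the sender's information."""
    return f'{msg["sender"]}: {msg["body"]}' if with_sender else msg["body"]
```

  • Leaving a filter as `None` means that dimension is not narrowed, so the same function covers playback by time, by place, by sender, or by any combination.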
  • an appropriate message or tweet may be selected for the user to be received using the situation recognition unit 305, and only the message or tweet and / or the accompanying video information may be reproduced.
  • in the designation of the time / time zone and location / region (1201), an example has been shown of designating a specific past time / time zone and location / region in order to go back in space-time and receive messages and tweets recorded in the past and the image information on which they are based. Conversely, a future time / time zone and location / region may be designated. In that case, the message or tweet, and the video information on which it is based, can be delivered to the designated future space-time like a "time capsule".
  • the target of interest may be displayed on the network terminal in synchronization with the reproduction of the message or tweet.
  • when a message or tweet has been left on a target outside the recipient user's subjective field of view, the knowledge information processing server system provided with the image recognition system may instruct the recipient user by voice information
  • to turn the head toward the target or to move in the direction in which the target exists, and, when the user then captures the target within the subjective field of view, play back the messages and tweets left on that target. Other means that obtain a similar effect may also be used.
  • the history management unit 410 which is a component of the situation recognition unit, records the playback position at that time in the corresponding node.
  • next, an embodiment will be described in which, when the user points directly at the target of interest or touches it with a finger, independently of any voice instruction, the image recognition system side analyzes in real time the image information obtained from the camera incorporated in the user's headset system and identifies the target of interest.
  • FIG. 20A shows an example of the user's subjective view (1300).
  • wine (1301), ice bucket (1304), and two other objects 1302, 1303 are detected.
  • the user's finger (1310) directly indicates the wine.
  • the user can also directly touch the wine of interest (1301).
  • instead of pointing with a finger, the user may point using a stick-like tool at hand, or may directly shine a light beam such as a laser pointer onto the object.
  • FIG. 20B illustrates a target pointing procedure using fingers (1310).
  • the screen of FIG. 20A is an image from a camera reflecting the subjective visual field of the user.
  • the user's hand (1311) including the finger (1310) is detected from the screen.
  • the camera image is image-analyzed by the image recognition system, the main orientation (1312) is obtained from the shape characteristics of the finger (1310) and the hand (1311) detected therefrom, and the direction indicated by the finger (1310) is extracted.
  • the detection of the orientation (1312) may be executed locally by the image recognition engine 224 incorporated on the network terminal 220 side.
  • when the orientation is detected (1322), there is a high possibility that an object pointed to by the user exists on the vector line.
  • an object existing on the vector line is detected from the image of FIG. 20A by a cooperative operation with the image recognition system 301 (1323), and image recognition processing of the target object is executed (1324).
  • the image detection and recognition processing can be performed on the recognition engine 224 which is one component on the user's network terminal 220 side, and the load on the network side can be greatly reduced. Further, high-speed tracking processing with less latency (time delay) can be performed even for a quick pointing operation by the user.
  • the final image recognition result is confirmed by making an inquiry to the knowledge information processing server system 300 provided with the image recognition system via the network, and the user is notified of the name of the recognition target (1325). If the image recognition result for the pointed object matches the user's intention, the pointing process is terminated (1325). If the result differs from the user's intention, an additional instruction request is issued (1327), and the process returns to step (1322) to continue the pointing operation. Similarly, if the user does not explicitly confirm the pointed target of interest, either the detection result is assumed not to be as intended and the above processing is repeated, or the silence is regarded as consent.
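  • The search for an object on the vector line of the pointing direction can be sketched geometrically. The following is a rough 2D illustration in image coordinates, not the described recognition pipeline: the function name, the `max_offset` tolerance in pixels, and the representation of detected objects by their center points are all assumptions.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]


def object_on_pointing_line(fingertip: Point, direction: Point,
                            objects: Dict[str, Point],
                            max_offset: float = 30.0) -> Optional[str]:
    """Return the nearest object lying close to the ray from the fingertip."""
    fx, fy = fingertip
    dx, dy = direction
    norm = math.hypot(dx, dy)
    if norm == 0:
        return None                          # no usable orientation
    dx, dy = dx / norm, dy / norm
    best = None
    for name, (cx, cy) in objects.items():
        vx, vy = cx - fx, cy - fy
        along = vx * dx + vy * dy            # distance along the ray
        if along <= 0:
            continue                         # object is behind the fingertip
        offset = abs(vx * dy - vy * dx)      # perpendicular distance to the ray
        if offset <= max_offset and (best is None or along < best[0]):
            best = (along, name)
    return best[1] if best else None
```

  • Choosing the nearest candidate along the ray is one possible tie-break; the reconfirmation dialogue described above would then correct a wrong guess.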
  • interactive communication can be performed between the knowledge information processing server system 300 including the image recognition system and the user.
  • for example, when the server system asks the user to confirm "Is the target 1302?", the user can in turn ask a further question such as "Yes, but what is this?"
  • a procedure will be described in which the position information sensor 208 provided in the headset system 200 is used to detect the movement state of the headset system as needed, thereby detecting the possibility that the user wearing the headset system has started paying attention to a certain object.
  • FIG. 21 shows a state transition regarding the operation of the headset system 200.
  • the operation start (1400) state is a state in which the headset system starts to move from a stationary state. The movement of the headset system includes, in addition to translation of the headset system itself (up / down, left / right, front / rear), motion in which the position of the headset system remains unchanged and only its direction changes as the user swings the head (looking left and right, looking up and down).
  • Stop (1403) is a state where the headset system is stationary.
  • the short-time stationary (1404) state is a state where the headset system is temporarily stationary.
  • the long-time stationary (1405) state is a state where the headset system is stationary for a while.
  • when the headset system comes to rest, it transitions (1410) to the stopped (1403) state.
  • when the stopped (1403) state continues for a predetermined time or longer, the state transitions (1411) to the short-time stationary (1404) state.
  • when the short-time stationary (1404) state continues for a certain period of time and the headset system then remains stationary for a long time, the state transitions (1413) to the long-time stationary (1405) state.
  • when the headset system starts moving again from the short-time stationary (1404) state or the long-time stationary (1405) state, the state transitions again to the operation start (1400) state (1412 or 1414).
  • for example, when the headset system is in the short-time stationary (1404) state, it can be determined that the user may have started to focus on an object in front of the eyes, the knowledge information processing server system provided with the image recognition system can be notified,
  • and the camera incorporated in the headset system can be automatically put into the shooting start state to prepare for the subsequent series of processing.
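  • The state transitions of FIG. 21 can be sketched as a small state machine. This is a minimal sketch: the threshold values `SHORT_AFTER` and `LONG_AFTER` are assumed numbers (the source only says "a predetermined time"), the transition from the stopped state back to moving is an assumption, and the function names are hypothetical.

```python
from enum import Enum


class State(Enum):
    MOVING = 1400            # operation start
    STOPPED = 1403
    SHORT_STATIONARY = 1404
    LONG_STATIONARY = 1405


SHORT_AFTER = 0.5   # assumed seconds of stillness before short-time stationary
LONG_AFTER = 3.0    # assumed seconds of stillness before long-time stationary


def next_state(state: State, is_moving: bool, still_seconds: float) -> State:
    """Advance one step given the current motion reading from the sensor."""
    if is_moving:
        return State.MOVING                       # transitions 1412 / 1414
    if state is State.MOVING:
        return State.STOPPED                      # transition 1410
    if state is State.STOPPED and still_seconds >= SHORT_AFTER:
        return State.SHORT_STATIONARY             # transition 1411
    if state is State.SHORT_STATIONARY and still_seconds >= LONG_AFTER:
        return State.LONG_STATIONARY              # transition 1413
    return state


def should_start_camera(previous: State, current: State) -> bool:
    """Start shooting when the short-time stationary state is newly entered."""
    return current is State.SHORT_STATIONARY and previous is not State.SHORT_STATIONARY
```

  • Triggering on entry to the short-time stationary state, rather than on every sample spent in it, avoids restarting the camera repeatedly while the user keeps looking at the same object.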
  • reactions of the user wearing the headset system, such as tilting the head (question), shaking the head left and right (denial), and nodding up and down (consent), can also be detected from data obtained from the position information sensor 208 provided in the headset system.
  • the head gestures frequently used by users may vary with local customs and with each user's habits. Therefore, the server system side needs to learn and acquire the gestures specific to each user or region, and to retain and reflect them as attributes.
  • FIG. 22 shows an example of photo extraction in one embodiment of the present invention.
  • the photographic image is a closed area bounded by a rectangle that has been affine-transformed according to the viewpoint position,
  • the sizes of the objects detected inside that area are on a scale significantly different from the sizes of the objects outside it,
  • the feature points extracted from a general object or specific object inside the area, which should originally be three-dimensional, do not shift relative to one another as the user's viewpoint moves, but move together with the closed area,
  • and, where available, information on the distance to the object obtained from a camera that can directly detect image depth, or depth information obtained from binocular parallax across multiple camera images, can also be used.
  • when such conditions are satisfied, the closed area is likely to be flat printed matter or a photograph. Scenery seen through a window can satisfy the same conditions, but whether the area is a window or a flat image may be estimated to some extent from the surrounding situation.
  • in that case, the photograph itself is regarded as one specific object, and the knowledge information processing server system 300 provided with the image recognition system is queried to search for similar photographs. If an identical or similar photographic image is found, it becomes possible to connect the user with other users who are viewing, have viewed, or are likely to view the same or similar photographic image in different time-spaces.
  • FIGS. 23A and 23B will be used to describe a conversation with a target of interest in one embodiment of the present invention.
  • the camera captures the user's image of interest (1600).
  • the target image is recognized from the camera image reflecting the user's subjective field of view by the cooperative operation with the image recognition system 301 on the network through the target extraction process shown in FIG. 3A (1602).
  • a graph structure related to the target of interest is extracted from the graph database 365, and a node group related to messages and tweets remaining in the target of interest is extracted (1603).
  • the ACL specifying the recipient of the message or tweet is confirmed (1604).
  • the message or tweet associated with the target node group can be delivered to the user's headset system 200 or network terminal 220
  • as voice, image, figure, illustration, or character information (1605).
  • a mechanism is provided for the user to respond to the message or tweet by speaking further toward the target of interest (1606).
  • the utterance content is recognized by a cooperative operation with the voice recognition system 320 (1607) and converted into an utterance character string.
  • the character string is sent to the conversation engine 430, and, based on the interest graph related to the user, the conversation engine 430 on the knowledge information processing server system 300 side selects an optimal topic as appropriate (1608), which can be delivered as voice information to the user's headset system 200 via the voice synthesis system 330. The user can thereby carry on continuous voice communication with the server system.
  • the knowledge information processing server system 300 retrieves a response to the question from the detailed information in the MDB 111 or from the related node group for the target of interest, and notifies the user by voice information.
  • continuous topics can be extracted from the server system side by tracing the related nodes related to the topic on the basis of the user's interest graph, and can be provided in a timely manner.
  • the continuous conversation is repeated by returning to step 1606 as long as the utterance by the user continues, and continues until the utterance of the user disappears (1609), and then ends.
  • the interactive conversation between the wide range of users and the knowledge information processing server system 300 described above can also play an important role as a learning path of the interest graph unit 303 itself.
  • if the user frequently prompts conversation about a specific target or topic, the user can be assumed to have a very strong interest in it, and weights can be added to the direct or indirect links between the nodes related to that interest and the node related to the user.
  • conversely, if the user refuses to continue a conversation about a specific target or topic, the user may have lost interest in it, and the weights for the direct or indirect links between the nodes related to it and the node related to the user can be reduced.
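  • The strengthening and weakening of link weights described above can be sketched as a clamped update. This is an illustration only: the source does not specify an update rule, so the additive step size, the bounds, and the function name `update_link_weight` are assumptions.

```python
def update_link_weight(weight: float, continued: bool,
                       step: float = 0.1,
                       lower: float = 0.0, upper: float = 1.0) -> float:
    """Raise the user-to-topic link weight when the conversation continues,
    lower it when the user declines, and keep it within [lower, upper]."""
    updated = weight + step if continued else weight - step
    return max(lower, min(upper, updated))
```

  • For instance, calling `update_link_weight(w, continued=True)` after each turn in which the user keeps talking about a topic would gradually raise that topic's weight up to the upper bound.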
  • this embodiment may be configured such that a two-way conversation between the user and the knowledge information processing server system 300 is started from an intermediate step.
  • FIG. 23B shows a configuration example of the conversation engine 430 in an embodiment of the present invention.
  • the input to the conversation engine is a graph structure 1640 centered on the target node and an utterance character string 1641 from the speech recognition system 320.
  • the former takes out information related to the object through related node extraction 1651 and sends it to keyword extraction 1650.
  • a plurality of keyword groups are extracted with reference to the ontology dictionary 1652 based on the utterance character string and the information.
  • topic extraction 1653 selects one from the plurality of keyword groups.
  • topic history management is performed so as not to repeat the same conversation.
  • the keyword extraction may be configured such that a new keyword group that is frequently referred to by other users or is highly interested in the user is preferentially extracted.
  • a reaction sentence converted into a natural colloquial form is created 1642 while referring to the conversation pattern dictionary 1655 in the reaction sentence generation 1654 and delivered to the subsequent speech synthesis system 330.
  • the conversation pattern dictionary 1655 describes the rules for sentences recalled from the keyword group. For example, it describes typical conversation rules such as replying "I'm fine, thank you. And you?" to a user utterance of "Hello!", replacing "I" in a user utterance with "you" in the reply, or replying "Would you like to talk about it?" to "I like it." Response rules may include variables, in which case the variable is assigned from the user's utterance.
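  • Response rules with variables can be sketched with regular expressions. This is a toy stand-in for the conversation pattern dictionary 1655, not its actual format: the `RULES` table, the `thing` variable, the default reply, and the `respond` function are assumptions made for illustration.

```python
import re

# Each rule pairs a pattern over the user's utterance with a reply template.
# Named groups act as variables filled in from the utterance.
RULES = [
    (re.compile(r"^hello\b", re.IGNORECASE),
     "I'm fine, thank you. And you?"),
    (re.compile(r"^i like (?P<thing>.+?)[.!]?$", re.IGNORECASE),
     "Would you like to talk about {thing}?"),
]


def respond(utterance: str, default: str = "Tell me more.") -> str:
    """Return the reply of the first matching rule, or a default reply."""
    text = utterance.strip()
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(**match.groupdict())
    return default
```

  • Rules are tried in order, so more specific patterns should be listed before more general ones.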
  • the knowledge information processing server system 300 side selects a keyword group according to the user's interest from the contents described in the interest graph unit 303 stored in the server system
  • and generates an appropriate response sentence based on the interest graph, so that the user stays strongly motivated to continue the conversation and at the same time can feel as though conversing with the target itself.
  • the graph database 365 records node groups corresponding to a specific user (including the user himself or herself), a specific user group, or all users, and these are linked with the nodes that record the messages and tweets left for them to form a graph structure.
  • this embodiment may be configured so that the statistical information processing unit 363 extracts a keyword group related to the message or tweet, and the situation recognition unit 305 selectively notifies the user of related voice, image, figure, illustration, or character information.
  • with reference to FIG. 24, a cooperative operation between headset systems when two or more headset systems 200 are connected to one network terminal 220 will be described as an embodiment of the present invention.
  • four users each wear the headset system 200, and the direction in which each user is looking is shown.
  • a marker or the like for position calibration is displayed on the shared network terminal (1701 to 1704) and is constantly monitored by the camera incorporated in each user's headset system, which makes it possible to grasp the users' mutual positional relationships and movements.
  • alternatively, an image pattern modulated along the time axis may be displayed on the display device of the shared network terminal, captured by the camera provided in each user's headset system, and demodulated to obtain the same positional relationship.
  • the network terminal can automatically perform calibration of the field of view and line of sight of each camera, calibration between each user's headset system and the shared network terminal, and tracking processing, so that it always knows each user's position. The network terminal side can thereby recognize which user an input GUI operation on the shared network terminal came from. On the shared display device of the shared network terminal, it is then possible to display a group of sub-screens, each aligned to a user in consideration of that user's position.
  • a procedure will be described in which, for a target that has remained unknown, the user posts a question about the target on the network, other users send new information and answers about the unknown target via the network, and the server system selects and learns the necessary information from among them.
  • the procedure 1800 starts from a voice input trigger 1801 by the user.
  • the voice input trigger may be the utterance of a specific word by the user, a sudden change in the sound pressure level picked up by the microphone, or an operation of the GUI of the network terminal unit 220. Moreover, it is not restricted to these.
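One of the triggers mentioned above, a sudden change in the sound pressure level picked up by the microphone, could be detected along the following lines. This is a minimal sketch under stated assumptions: audio arrives as frames of normalized samples, and the 20 dB jump threshold, `rms_db`, and `find_trigger` are hypothetical choices made for illustration, not values or names from the patent.

```python
import math
from typing import Optional, Sequence


def rms_db(frame: Sequence[float]) -> float:
    """Level of one audio frame in dB relative to full scale (samples in -1..1)."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20.0 * math.log10(max(rms, 1e-9))  # floor avoids log10(0)


def find_trigger(frames: Sequence[Sequence[float]],
                 jump_db: float = 20.0) -> Optional[int]:
    """Index of the first frame whose level rises by at least jump_db
    over the previous frame, or None if there is no such jump."""
    previous = None
    for index, frame in enumerate(frames):
        level = rms_db(frame)
        if previous is not None and level - previous >= jump_db:
            return index
        previous = level
    return None
```

Comparing against only the previous frame keeps the sketch short; a practical detector would more likely compare against a running average of the background level.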
  • uploading of the camera image is started (1802), and a voice command is waited (1803).
  • voice recognition processing 1804
  • when the user utters by voice a command group for extracting the target of interest,
  • the utterances are subjected to voice recognition processing (1804), and it is determined whether, for example, the pointing processing of the target of interest by voice using the means shown in FIG. has been completed (1805).
  • a camera image related to the inquiry target and a question or comment by the user's voice are issued as a set on the network (1809).
  • if new information or an answer is provided from Wiki, it is collected (1810), and its content is verified by the user, by a large number of users, and/or by the knowledge information processing server system 300 side (1811).
  • in the verification process, the validity of the received answer is determined. If the verification is passed, the target is newly registered (1812). In the new registration, node groups corresponding to the question, comment, information, and answer are generated, associated as node groups related to the target, and recorded in the graph database 365. If the verification fails, a hold process 1822 is performed. In the hold process, the fact that the inquiry process to Wiki in step 1808 or 1818 is incomplete is recorded, and the information / answer collection process from Wiki in step 1810 continues in the background until an answer that passes the verification is collected.
  • the process proceeds to the target image recognition process (1813).
  • the specific object recognition is performed by the specific object recognition system 110 in the present embodiment.
  • the general object recognition is performed by the general object recognition system 106, and the scene recognition by the scene recognition system 108. These image recognition processes need not be executed serially as in the present example; each may be executed individually or in parallel, and the recognition units within each may also be executed in parallel. Alternatively, they may be combined after each of them has been optimized.
  • a voice reconfirmation message is issued to the user (1820), and if the user confirms that the result is correct, the upload of the camera image is terminated (1821) and the series of target image recognition processing is terminated (1823).
  • the target remains unconfirmed (1817), and an inquiry to Wiki on the network is started (1818).
  • the contents and validity of the new information and answer group received from Wiki are verified (1811). If the verification is passed, the target is registered (1812). In the registration, a node group corresponding to the question / comment and information / answer is generated and recorded in the graph database 365 in association with the node group related to the target.
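The verify, register, or hold branch of steps 1810 to 1812 and 1822 can be sketched as follows. The source does not say how validity is decided, so the approval-vote criterion in `verify` is purely an assumption, and `Graph`, `Answer`, and `process_inquiry` are hypothetical stand-ins for the graph database 365 and the server-side logic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Graph:
    """Toy stand-in for the graph database 365."""
    nodes: Dict[str, str] = field(default_factory=dict)        # node id -> content
    links: Set[Tuple[str, str]] = field(default_factory=set)   # (from id, to id)

    def add(self, node_id: str, content: str,
            linked_to: Optional[str] = None) -> None:
        self.nodes[node_id] = content
        if linked_to is not None:
            self.links.add((linked_to, node_id))


@dataclass
class Answer:
    text: str
    approvals: int
    rejections: int


def verify(answer: Answer, min_votes: int = 3, min_ratio: float = 0.8) -> bool:
    """Assumed validity check (1811): enough votes and a high approval ratio."""
    total = answer.approvals + answer.rejections
    return total >= min_votes and answer.approvals / total >= min_ratio


def process_inquiry(graph: Graph, target_id: str, question: str,
                    answers: List[Answer]) -> str:
    """Register the target (1812) if any answer passes verification, else hold (1822)."""
    for answer in answers:
        if verify(answer):
            graph.add(target_id, "target")
            graph.add(f"{target_id}/question", question, linked_to=target_id)
            graph.add(f"{target_id}/answer", answer.text, linked_to=target_id)
            return "registered"
    return "hold"  # collection would continue in the background
```

Returning `"hold"` corresponds to recording the inquiry as incomplete; the caller would call `process_inquiry` again as further answers arrive.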
  • the position information sensor may be a GPS (Global Positioning System), but is not limited thereto.
  • the position information detected by the position information sensor and the absolute time are added to the image taken by the camera 203 provided in the headset system, and uploaded to the knowledge information processing server system 300 provided with the image recognition system.
  • the information recorded in the graph database 365 can be calibrated.
  • FIG. 26A is an example of a graph structure related to the graph database image 504 (FIG. 13A) before the upload. Since the “sun” is “just above”, it can be estimated that the time zone is around noon.
  • FIG. 26B shows an example of the graph structure after the image upload.
  • the time corresponding to the image can be accurately determined. Further, it is possible to calibrate the error inherent in the position information itself detected by the position information sensor 208 based on the recognition result by the server system from the image captured by the camera.
  • the server system can be configured to add a useful information group obtained therefrom to the graph structure related to the image 504.
  • when an object in the uploaded image is determined to be a suspicious object by the knowledge information processing server system 300 provided with the image recognition system,
  • the information obtainable by image analysis of the suspicious object can be recorded in the graph database 365 as an information group related to that suspicious object.
  • the presence or discovery of the suspicious object may be automatically and promptly notified to a specific user or organization that can be set in advance.
  • collation with a previously registered suspicious object or an object in a normal state can be performed by a cooperative operation with the graph database 365.
  • the present system may be configured so that the suspicious situation or scene can be detected.
  • a GUI on the user's network terminal 220 may be used to specify the discovery target.
  • the system may be configured so that the knowledge information processing server system 300 provided with the image recognition system pushes data related to a specific discovery target image and the necessary detection filter group to the user's network terminal, so that a discovery target specified by the server system side can be searched for jointly by a wide range of users.
  • a node related to the specified discovery target is referred to in the graph database 365 in the server system,
  • and the system may be configured so that the specific image detection filter group optimized to detect the target can be obtained from it.
  • the headset system 200 and the network terminal 220 worn by the user may be configured as an integral unit.
  • a wireless communication system that can connect directly to the network and a translucent display that covers part of the user's field of view may be incorporated in the headset system, and part or all of the functions of the network terminal may be incorporated in the headset system itself so that they are configured as an integral unit.
  • the display unit 222 can be integrated into the image output device 207.
  • since the wireless communication device 211 in the headset system is responsible for communication with the network terminal, it can also be integrated into the network communication unit 223.
  • the other image feature detection unit 224, CPU 225, and storage unit 226 can be incorporated into the headset.
  • FIG. 28 shows an example of processing performed by the network terminal 220 alone in a situation where the network connection with the server is temporarily disconnected. Temporary interruptions in network connections can occur frequently when moving into tunnels, concrete-covered buildings, or on aircraft. In addition, when the radio wave condition deteriorates for various reasons, or when the maximum number of connected cells set for each radio base station is exceeded, the network connection speed tends to decrease significantly.
  • FIGS. 28A and 28F show the main functional block configurations of the headset system 200 worn by the user and the network terminal 220 of the user.
  • various applications can be resident in the form of software that can be downloaded through a network by a CPU 226 incorporated therein.
  • the executable program scale, the amount of information that can be referred to, or the amount of data itself is significantly limited as compared with the configuration on the server, the knowledge information processing server system 300 provided with the image recognition system has a problem.
  • FIG. 28D shows the main functional unit configuration of the image recognition system 301 constructed on the server side.
  • the specific object recognition system 110 the general object recognition system 106, and the scene recognition system 108
  • all image recognition targets that have existed up to now or that currently exist are covered: objects, people, photographs, or entire scenes to which proper or generic nouns can be attached. Targets of such virtually unlimited types must be prepared for from the outset, and further learning will be needed in the future as newly discovered objects and events increase the number of items to be recognized. The overall execution environment is therefore far beyond the reach of network terminals, whose information processing capability and memory capacity are extremely limited. These comprehensive functions are placed on powerful computer resources and huge database systems on the server side and are used via the network.
  • the image recognition program 229 that can be executed on the network terminal 220 shown in (A), and the learned feature data group necessary for each recognition target, are downloaded from the server side and made resident on the recognition engine 224.
  • the interactive voice conversation function with the knowledge information processing server system 300 provided with the image recognition system can be executed, under certain restrictions, by the voice recognition program 230 and the voice synthesis program 231 on the network terminal 220.
  • for this purpose, it is necessary to download in advance, into the storage unit 227 on the user's network terminal 220, the minimum necessary execution program group and data sets from the speech recognition system 320 and the speech synthesis system 330 in the conversation engine 430 constituting the server system, and from the speech recognition dictionary database 321 and the conversation pattern dictionary 1655, which are the corresponding knowledge databases.
  • the candidate utterances may be synthesized in advance by the voice synthesis system 330 on the network and then downloaded, as compressed audio data, onto the storage unit 227 on the user's network terminal 220.
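The stand-alone operation described for FIG. 28 amounts to falling back to the subset of functions resident on the terminal when the server cannot be reached. A minimal sketch of that fallback follows; the function names and the use of `ConnectionError` to signal a lost connection are assumptions for illustration, and the two recognizers are passed in as plain callables rather than real engines.

```python
from typing import Callable, Tuple


def recognize(image: bytes,
              server_recognizer: Callable[[bytes], str],
              local_recognizer: Callable[[bytes], str]) -> Tuple[str, str]:
    """Try the full server-side recognition first; if the network connection is
    down, fall back to the limited engine resident on the network terminal.

    Returns (where the result came from, recognition result).
    """
    try:
        return ("server", server_recognizer(image))
    except ConnectionError:
        # Only the downloaded feature data are available here, so the
        # result may be coarser than what the server would return.
        return ("local", local_recognizer(image))
```

Tagging the result with its origin lets the caller re-submit locally recognized images once the connection returns, if finer recognition is wanted.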
  • various images are transmitted from a network terminal represented by a PC or a smartphone with a camera, or the headset system to the knowledge information processing server system 300 provided with the image recognition system via the Internet.
  • a network terminal represented by a PC or a smartphone with a camera, or the headset system
  • a node group corresponding to the various image component groups that the server system can recognize from the image, or from a specific object, general object, person, or scene included in the image,
  • and / or the metadata attached to the image, and / or a keyword group that can be extracted from the user's messages or tweets about the image, and / or the communication between users about the image, can be extracted as node groups.
  • images related to a specific target or scene, or to a specific place or region, that the user can specify can be selected and extracted, and based on them an album collecting identical or similar objects and scenes can be created, or an image group related to a certain place or region can be extracted.
  • if they are images of a specific object, they can be aggregated as videos from a plurality of viewpoint directions or as videos taken in different environments; if they are a group of images related to a specific place or region, connecting them into continuous and / or discrete panoramic images makes it possible to move among various viewpoints.
  • the time or period at which an object existed is estimated or acquired from the metadata of the component images of a panoramic image whose location and area can be identified, from the metadata attached to each image uploaded via the Internet, or by the knowledge information processing server system 300 provided with the image recognition system inquiring of a wide range of users via the Internet or of various knowledge databases on the Internet. The image group can then be sorted along the time axis based on this time-axis information, and the panoramic image at an arbitrary time point or period designated by the user can be reconstructed from the sorted image group. As a result, the user can specify an arbitrary “time-space”, including an arbitrary place and region, and enjoy moving the viewpoint through the real-world video that existed in that “time-space” as the panoramic image.
  • a group of users who are highly interested in the target, or deeply related to the specific place or region, is identified from the graph database 365, and network communication organized by those users for each target or for each specific place or region is induced. From there, it becomes possible to construct a network communication system that enables sharing of various comments, messages, and tweets related to the specific object or the specific place or region, provision of new information by participating users, and search requests for specific unknown or missing information.
  • FIG. 29 shows, as examples, three photographs, photograph (A), photograph (B), and photograph (C), extracted by designating a specific “time-space” from the image group uploaded to the server system in one embodiment according to the present invention.
  • the state of the Tokyo Nihonbashi neighborhood in the first half of the 1900s is shown.
  • the “Nomura Securities” headquarters building, known as a landmark, at the center left of the screen can be recognized as a specific object. A building that looks like a “warehouse” and two “trams” on the bridge can be recognized as general objects.
  • the above-described series of image recognition processing is executed by a cooperative operation of the specific object recognition system 110, the general object recognition system 106, and the scene recognition system 108 provided in the image recognition system 301.
  • the user specifies only the spatio-temporal information, and only the image group photographed in that time-space is extracted.
  • a spatio-temporal movement display system, in which the extracted images are reconstructed into a continuous or discrete panoramic image so that the user can freely move the viewpoint within the space or freely move through time within the space, will be described using a schematic implementation example.
  • image upload (2200) is started via the network terminal 220 of the user to the knowledge information processing server system 300 provided with the image recognition system via the Internet.
  • image recognition processing of the uploaded image is started in the image recognition system 301 (2201). If metadata has been assigned to the image file in advance, metadata extraction processing (2204) is executed. When character information is found in the image, character information extraction processing (2203) is performed using OCR (Optical Character Recognition) or the like, and metadata extraction processing (2204) is then performed to obtain useful metadata.
  • each object in the image is clipped out by the user's GUI operation on the network terminal 220 or by the voice pointing processing of the target of interest described in FIG. 3A.
  • for each clipped target, the objects are narrowed down by the MDB search unit 110-02 according to the class information recognized by the general object recognition system 106 and the scene recognition system 108,
  • and the specific object recognition system 110 performs comparison and collation with the object by referring to the MDB 111, in which detailed information about the reference objects is described.
  • the metadata group is referred to, and it is determined whether time-axis information has been added to the image (2205).
  • the time information during which the object group in the image existed is extracted from the description in the MDB 111, and it is determined by reference whether the object existed within that time (2206). If its existence is confirmed, it is similarly determined from the description in the MDB 111 whether any of the other recognizable objects in the image could not have existed at the same time (2207). When all consistency is confirmed, the shooting time of the image is estimated (2208). Otherwise, the time information is treated as unknown (2209) and the node information is updated.
  • information related to the location exists in the image (2210)
  • information related to the location where the object group in the image existed is extracted from the description in the MDB 111, and it is determined by reference whether the object existed at that location (2210). If its existence is confirmed, it is similarly determined from the description in the MDB 111 whether any of the other objects in the image could not have existed at the same place (2211). When all consistency is confirmed, the place where the image was taken is estimated (2212). Otherwise, the location information is treated as unknown (2213) and the node information is updated.
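The time-consistency check in steps 2206 to 2209 can be read as intersecting the periods during which each recognized object is known to have existed. The sketch below illustrates that reading only; the year-granularity `Period`, the `existence` dictionary standing in for MDB 111 descriptions, and the function name are assumptions, and the place check in steps 2210 to 2213 would follow the same pattern with regions instead of periods.

```python
from typing import Dict, List, Optional, Tuple

Period = Tuple[int, int]  # (first year, last year), inclusive


def estimate_shooting_period(objects: List[str],
                             existence: Dict[str, Period]) -> Optional[Period]:
    """Intersect the existence periods of all recognized objects.

    Returns the period in which every object could have coexisted, or None
    when a period is missing or the periods conflict (time unknown, 2209).
    """
    start: Optional[int] = None
    end: Optional[int] = None
    for name in objects:
        if name not in existence:
            return None  # no description available for this object
        first, last = existence[name]
        start = first if start is None else max(start, first)
        end = last if end is None else min(end, last)
    if start is None or end is None or start > end:
        return None  # no objects, or they could not have coexisted
    return (start, end)
```

The more objects with narrow existence periods an image contains, the tighter the estimated period becomes, which is why the procedure checks every recognizable object rather than just the first.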
  • the metadata group acquired from or attached to the image itself and the estimated spatiotemporal information are collated again;
  • if they are consistent, the acquisition (2214) of the spatiotemporal information relating to the entire image is completed, and the spatiotemporal information is linked to the node relating to the image (2215). If there is a flaw in the consistency, it is assumed that there is an error in the metadata itself, a recognition error in the image recognition system, or an error or deficiency in the contents described in the MDB 111, and the image is prepared for a subsequent re-verification process.
  • the user can specify an arbitrary spatio-temporal and extract an image group that matches the condition (2216).
  • an image group captured at an arbitrary place (2217) and an arbitrary time (2218) is extracted from a large number of image groups by tracing a node related to the designated time space (2219).
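Extracting the image group for a designated time-space (2216 to 2219) is, in essence, a filter over nodes that carry spatiotemporal information followed by a sort along the time axis. A minimal sketch, assuming each image node carries a latitude, longitude, and year; the `ImageNode` type, the rectangular region, and the function name are hypothetical simplifications of the graph traversal the source describes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ImageNode:
    image_id: str
    lat: float
    lon: float
    year: int


def extract_time_space(nodes: List[ImageNode],
                       lat_range: Tuple[float, float],
                       lon_range: Tuple[float, float],
                       year_range: Tuple[int, int]) -> List[ImageNode]:
    """Select images taken inside the given region and period,
    sorted along the time axis (ties broken by image id)."""
    hits = [
        n for n in nodes
        if lat_range[0] <= n.lat <= lat_range[1]
        and lon_range[0] <= n.lon <= lon_range[1]
        and year_range[0] <= n.year <= year_range[1]
    ]
    return sorted(hits, key=lambda n: (n.year, n.image_id))
```

The sorted result is what a panorama reconstruction step (2220) would consume; that step itself is not sketched here.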
  • a panoramic image can be reconstructed by continuously connecting the detected specific feature points (2220).
  • it can also be reconstructed as a discrete panoramic image by performing extensive estimation processing from available information such as a map, drawing, or design drawing described in the MDB 111.
  • the knowledge information processing server system 300 including the image recognition system continuously performs the learning process for acquiring the series of spatiotemporal information on a large number of uploaded images (including moving images).
  • a continuous panoramic image having spatiotemporal information can be acquired.
  • the user can specify an arbitrary time / space and enjoy an image experience (2221) related to an arbitrary viewpoint movement or an arbitrary time in the same space.
  • the configuration of a network communication system will be described in which the result recognized by the server system, through selective extraction of a specific object, general object, person, or scene that the user focused on by a GUI operation on the user's network terminal or by a voice pointing operation, can be shared together with the input image among a wide range of users, including the user, who can be designated in advance.
  • the image 2101 uploaded by the user is subjected to selection / extraction processing 2103 in the server system.
  • the user may execute the selection / extraction process according to the procedure shown in FIG. 3A, or may execute the selection / extraction command shown in FIG. 30 by operating the GUI 2104.
  • the image cut out by the selection / extraction process is recognized by the image recognition system 301.
  • the result is analyzed / classified / accumulated in the interest graph unit 303 and recorded in the graph database 365 together with the keyword group and spatiotemporal information.
  • the user may perform writing using a message, a tweet 2106, or character information 2105 when uploading an image. These messages, tweets, and text information issued by the user are also analyzed / classified / accumulated in the interest graph section.
  • the user, a user group including the user, or all users can select recorded images from the interest graph unit based on the keyword group and / or spatiotemporal information (2106) related to the target, which can induce wide-ranging network communication related to those images. Further, the communication among this wide range of users is observed and accumulated on the server system side and analyzed by the statistical information processing unit 363, which is one component of the interest graph unit 303. The dynamic interests and curiosity that are specific to a user, specific to a user group, or common to all users, together with their transitions, can thereby be acquired as a dynamic interest graph that links the nodes of the wide-ranging user group, the extractable keywords, and the various targets of interest.
  • the system according to the present invention can be configured as a more convenient system by combining with various existing technologies. Examples are given below.
  • a user's utterance is picked up by a microphone incorporated in the headset system 200, and the word string and syntax included in the utterance are extracted by the voice recognition system 320;
  • by then utilizing an automatic translation system on the network, the word string can be translated into a different language, converted into speech by the speech synthesis system 330, and transmitted to other users as the user's message or tweet.
  • the voice information from the knowledge information processing server system 300 provided with the image recognition system can be received in a language that can be specified by the user.
  • when a specific image modulation pattern is extracted together with a predetermined recognition marker from the image of the field of view captured by the camera incorporated in the user's headset system, the user is alerted to the presence of the signal source. If the signal source is on or near a display device, the modulated image pattern is demodulated by a cooperative operation with the recognition engine 224, the address information such as a URL obtained from it is referenced via the Internet, and audio information related to the displayed image can be sent to the user through the user's headset system. This makes it possible to effectively send audio information related to the displayed image to the user from various display devices that the user happens to see. As a result, the effectiveness of digital signage as an electronic advertising medium can be further enhanced.
  • if voice information were sent all at once from every digital signage display that the user can see, it might be perceived as unnecessary noise in some cases. Therefore, based on the above-mentioned interest graph related to each user, only advertisements or the like that reflect each user's preferences may be selected and sent as different audio information for each individual user.
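The per-user selection described above can be illustrated by scoring each advertisement against the keyword weights in that user's interest graph and staying silent when nothing scores high enough. The scoring rule, the `min_score` threshold, and all names below are assumptions made for this sketch; the source does not specify how the selection is computed.

```python
from typing import Dict, Optional, Set


def select_advert(interest_weights: Dict[str, float],
                  adverts: Dict[str, Set[str]],
                  min_score: float = 0.5) -> Optional[str]:
    """Pick the advert whose keywords best match the user's interests.

    interest_weights: keyword -> weight taken from the user's interest graph.
    adverts:          advert id -> keywords attached to that advert.
    Returns None when no advert scores above min_score, so nothing is played.
    """
    best_id: Optional[str] = None
    best_score = min_score
    for advert_id in sorted(adverts):  # sorted for a deterministic result
        score = sum(interest_weights.get(k, 0.0) for k in adverts[advert_id])
        if score > best_score:
            best_id, best_score = advert_id, score
    return best_id
```

Returning `None` rather than the least-bad advert reflects the stated goal of avoiding audio that the user would perceive as noise.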
  • the server system 300 can be configured so as to be prepared for a situation where the biometric information value of the user suddenly changes when an event is encountered or when the possibility of encounter is increased.
  • the biometric information that can be acquired includes the user's body temperature, heart rate, blood pressure, sweating, skin surface condition, myoelectric potential, brain waves, eye movement, vocalization, head movement, body movement, and the like.
  • the server system side starts accumulating / analyzing related biological information at the same time.
  • the analysis of the camera image is started, and the image component group that can be extracted from the camera image can be registered in the graph database 365 and the user database 366 as a cause element group that may be related to the situation.
  • when a specific object, general object, person, photograph, or scene is encountered that can be predicted to cause an abnormal change in the biometric information values, which differ for each individual user,
  • the server system can be configured to promptly notify the user of that possibility via the network by voice and / or text, image, vibration, or the like.
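Because the source stresses that biometric values differ for each individual user, one plausible way to flag an abnormal change is to compare a new reading with that user's own recent history. The z-score test below is an assumed technique chosen for illustration, not the patent's method; `is_abnormal` and its threshold are hypothetical.

```python
import statistics
from typing import Sequence


def is_abnormal(history: Sequence[float], value: float,
                z_threshold: float = 3.0) -> bool:
    """Flag a reading that deviates strongly from this user's own baseline.

    history: recent readings of one biometric value (e.g. heart rate)
             for one user; at least two are needed to form a baseline.
    """
    if len(history) < 2:
        return False  # not enough data to judge
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev >= z_threshold
```

When a reading is flagged, the objects recognized in the camera image at that moment would be the candidates to record as possible causes, as the preceding text describes.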
  • the knowledge information processing server system 300 side including the recognition system can be configured.
  • when biometric authentication is possible by acquiring a user-specific voiceprint, vein pattern, retinal pattern, or the like from the headset system that the user can wear on the head,
  • the present system can be configured so that the user and the knowledge information processing server system 300 provided with the image recognition system are uniquely bound.
  • the biometric authentication device can be incorporated into the headset system of the user, it can be configured to automatically log in and log out when the headset system is attached or detached. By constantly monitoring the association using the biometric information on the server system side, unauthorized login and unauthorized use by different users can be eliminated.
  • when the user authentication completes normally, the following information group is bound to the user.
  • (1) User profile that can be set by the user (2) User voice (3) Camera image (4) Spatio-temporal information (5) Biometric information (6) Other sensor information
  • for each user, the face portion and / or specific portions of the image from which the user could be identified
  • are extracted and detected by the image recognition system 301 incorporated in the knowledge information processing server system 300 provided with the image recognition system, and the system can be configured to automatically filter those specific image areas until they are indistinguishable. This makes it possible to set certain browsing restrictions, including privacy protection.
  • a plurality of cameras can be installed in a headset system that a user can wear on the head.
  • imaging parallax can be provided in a plurality of cameras as one embodiment.
  • a three-dimensional camera that can directly measure the depth (distance) to the target object using a plurality of imaging elements having different properties can be incorporated.
  • in response to a voice instruction from the knowledge information processing server system 300 provided with the image recognition system, a specific user designated by the server system can be requested to photograph a specific target designated by the server system, or its surroundings, from various viewpoints, so that the server system side can grasp the target or the surrounding situation in three dimensions.
  • the server system can be configured so that the related database group including the MDB 111 in the server system can be updated based on the image recognition result.
  • a depth sensor having directivity can be incorporated in a headset system that can be worn by a user on the head. As a result, it is possible to detect a movement of a living body or an object including a person approaching the user wearing the headset system and notify the user of the situation concerned by voice.
  • the camera and the image recognition engine incorporated in the user's headset system are automatically activated so as to respond immediately to the sudden approach of an unexpected object, and the parts requiring real-time processing are executed on the user's network terminal,
  • while the parts requiring advanced information processing are executed on the knowledge information processing server system 300 provided with the image recognition system, so that a specific object, specific person, specific animal, or the like approaching the user can be quickly identified and analyzed, and the result promptly conveyed to the user by voice information or vibration.
  • an imaging system capable of photographing all directions including a periphery around the user or an upper part and a lower part of the headset system that can be worn on the head of the user.
  • a plurality of cameras capable of photographing a field of view from the rear or side that is outside the user's subjective field of view to the user's headset system.
  • an environmental sensor group capable of measuring the following environmental values can be arbitrarily incorporated in a headset system that can be worn by the user on the head.
  • (1) Ambient brightness (luminosity) (2) Color temperature of illumination and external light (3) Ambient environmental noise (4) Ambient sound pressure level
  • This makes it possible to reduce the ambient environmental noise and to adapt to the optimal camera exposure state, and thus to improve the recognition accuracy of the image recognition system and of the speech recognition system.
  • a translucent display device can be incorporated in a headset system that can be worn on the head of a user so as to cover a part of the field of view of the user.
  • the headset system can be configured integrally with a display as a head mounted display (HMD) or a scouter.
  • Known apparatuses that enable such a display system include an image projection system called retinal sensing, which directly scans and projects image information onto the user's retina, and a device that projects an image on a translucent reflector placed in front of the eye.
  • a part or all of the image displayed on the display screen of the user's network terminal can be displayed on the display device. Communication with the knowledge information processing server system 300 provided with the image recognition system directly via the Internet is then possible without taking the network terminal out and holding it in front of the eyes.
  • a line-of-sight detection sensor may be provided in the HMD or scouter that the user can wear on the head, or in a device of the same form.
  • An optical sensor array may be used for the line-of-sight detection sensor; by measuring the reflected light of the light beam emitted from the optical sensor array, the position of the user's pupil can be detected and the user's line-of-sight position extracted at high speed.
  • a dotted line frame 2001 is a visual field image of the scouter 2002 worn by the user.
  • the viewpoint marker 2003 may be superimposed on the target in the direction of the user's line of sight. In that case, calibration by the user's voice instruction can be enabled so that the viewpoint marker is displayed at the same position as the target.
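
The voice-driven calibration of the viewpoint marker described above can be illustrated with a minimal sketch. It is not taken from the patent text: the command words ("left", "right", "up", "down", "reset"), the step size, and the class name MarkerCalibrator are all hypothetical, and the sketch assumes that a speech recognizer has already turned the user's utterance into a single word and that the line-of-sight sensor reports a gaze position in display pixels.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical step size and command vocabulary; the source does not specify them.
STEP_PX = 5
NUDGES = {
    "left": (-STEP_PX, 0),
    "right": (STEP_PX, 0),
    "up": (0, -STEP_PX),    # screen coordinates: y grows downward
    "down": (0, STEP_PX),
}


@dataclass
class MarkerCalibrator:
    """Accumulates a pixel offset between the sensed gaze point and the marker."""
    offset_x: int = 0
    offset_y: int = 0

    def apply_command(self, command: str) -> bool:
        """Apply one recognized voice command; return True if it was understood."""
        word = command.strip().lower()
        if word == "reset":
            self.offset_x = 0
            self.offset_y = 0
            return True
        if word in NUDGES:
            dx, dy = NUDGES[word]
            self.offset_x += dx
            self.offset_y += dy
            return True
        return False

    def marker_position(self, gaze_x: int, gaze_y: int) -> Tuple[int, int]:
        """Return where to draw the marker for a sensed gaze position."""
        return gaze_x + self.offset_x, gaze_y + self.offset_y
```

In this sketch the user repeats nudge commands until the drawn marker coincides with the target being looked at, and the accumulated offset is then applied to every later gaze sample. A real system might instead fit a per-user mapping from several calibration targets; the constant offset is chosen here only to keep the example short.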

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Operations Research (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Human Resources & Organizations (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
PCT/JP2012/076303 2011-10-14 2012-10-11 Knowledge-information-processing server system having image recognition system Ceased WO2013054839A1 (ja)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US14/351,484 US20140289323A1 (en) 2011-10-14 2012-10-11 Knowledge-information-processing server system having image recognition system
EP12840365.6A EP2767907A4 (en) 2011-10-14 2012-10-11 SERVER SYSTEM FOR PROCESSING KNOWLEDGE INFORMATION WITH PICTURE IDENTIFICATION SYSTEM

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2011226792A JP5866728B2 (ja) 2011-10-14 2011-10-14 Knowledge-information-processing server system having image recognition system
JP2011-226792 2011-10-14

Publications (1)

Publication Number Publication Date
WO2013054839A1 (ja) 2013-04-18

Family

ID=48081892

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2012/076303 Ceased WO2013054839A1 (ja) 2011-10-14 2012-10-11 Knowledge-information-processing server system having image recognition system

Country Status (4)

Country Link
US (1) US20140289323A1
EP (1) EP2767907A4
JP (1) JP5866728B2
WO (1) WO2013054839A1

Families Citing this family (152)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2203865A2 (en) 2007-09-24 2010-07-07 Apple Inc. Embedded authentication systems in an electronic device
US8600120B2 (en) 2008-01-03 2013-12-03 Apple Inc. Personal computing device control using face detection and recognition
GB2503163B (en) 2011-03-22 2019-05-29 Nant Holdings Ip Llc Reasoning Engines
US11165963B2 (en) 2011-06-05 2021-11-02 Apple Inc. Device, method, and graphical user interface for accessing an application in a locked device
US9002322B2 (en) 2011-09-29 2015-04-07 Apple Inc. Authentication with secondary approver
WO2013126753A1 (en) * 2012-02-22 2013-08-29 Master Lock Company Safety lockout systems and methods
US20130232412A1 (en) * 2012-03-02 2013-09-05 Nokia Corporation Method and apparatus for providing media event suggestions
US9589000B2 (en) 2012-08-30 2017-03-07 Atheer, Inc. Method and apparatus for content association and history tracking in virtual and augmented reality
US9424472B2 (en) * 2012-11-26 2016-08-23 Ebay Inc. Augmented reality information system
US9620107B2 (en) * 2012-12-31 2017-04-11 General Electric Company Voice inspection guidance
US9479470B2 (en) * 2013-01-25 2016-10-25 Ayo Talk Inc. Method and system of providing an instant messaging service
US9898749B2 (en) * 2013-01-30 2018-02-20 Wal-Mart Stores, Inc. Method and system for determining consumer positions in retailers using location markers
US9547917B2 (en) * 2013-03-14 2017-01-17 Paypay, Inc. Using augmented reality to determine information
US10541997B2 (en) * 2016-12-30 2020-01-21 Google Llc Authentication of packetized audio signals
US10318583B2 (en) * 2013-03-15 2019-06-11 The Board Of Trustees Of The Leland Stanford Junior University Systems and methods for recommending relationships within a graph database
US10262462B2 (en) 2014-04-18 2019-04-16 Magic Leap, Inc. Systems and methods for augmented and virtual reality
US20140379336A1 (en) * 2013-06-20 2014-12-25 Atul Bhatnagar Ear-based wearable networking device, system, and method
WO2014208575A1 (ja) * 2013-06-28 2014-12-31 日本電気株式会社 映像監視システム、映像処理装置、映像処理方法および映像処理プログラム
US10295338B2 (en) 2013-07-12 2019-05-21 Magic Leap, Inc. Method and system for generating map data from an image
JP5784077B2 (ja) * 2013-07-12 2015-09-24 ヤフー株式会社 情報処理装置及び方法
GB2517212B (en) * 2013-08-16 2018-04-25 Toshiba Res Europe Limited A Computer Generated Emulation of a subject
JP5787949B2 (ja) * 2013-08-28 2015-09-30 ヤフー株式会社 情報処理装置、特定方法および特定プログラム
US9898642B2 (en) 2013-09-09 2018-02-20 Apple Inc. Device, method, and graphical user interface for manipulating user interfaces based on fingerprint sensor inputs
WO2015041641A1 (en) * 2013-09-18 2015-03-26 Intel Corporation Automated image cropping and sharing
KR102120864B1 (ko) * 2013-11-06 2020-06-10 삼성전자주식회사 영상 처리 방법 및 장치
US20150162000A1 (en) * 2013-12-10 2015-06-11 Harman International Industries, Incorporated Context aware, proactive digital assistant
US10986223B1 (en) * 2013-12-23 2021-04-20 Massachusetts Mutual Life Insurance Systems and methods for presenting content based on user behavior
US10362112B2 (en) 2014-03-06 2019-07-23 Verizon Patent And Licensing Inc. Application environment for lighting sensory networks
US10885095B2 (en) * 2014-03-17 2021-01-05 Verizon Media Inc. Personalized criteria-based media organization
JP2015207181A (ja) * 2014-04-22 2015-11-19 ソニー株式会社 情報処理装置、情報処理方法及びコンピュータプログラム
US9483763B2 (en) 2014-05-29 2016-11-01 Apple Inc. User interface for payments
US10325205B2 (en) * 2014-06-09 2019-06-18 Cognitive Scale, Inc. Cognitive information processing system environment
US9396698B2 (en) * 2014-06-30 2016-07-19 Microsoft Technology Licensing, Llc Compound application presentation across multiple devices
JP6418820B2 (ja) * 2014-07-07 2018-11-07 キヤノン株式会社 情報処理装置、表示制御方法、及びコンピュータプログラム
JP2016024282A (ja) * 2014-07-17 2016-02-08 Kddi株式会社 語学教材生成システム、語学教材生成装置、携帯端末、語学教材生成プログラム、および語学教材生成方法
JP5871088B1 (ja) * 2014-07-29 2016-03-01 ヤマハ株式会社 端末装置、情報提供システム、情報提供方法およびプログラム
JP5887446B1 (ja) * 2014-07-29 2016-03-16 ヤマハ株式会社 情報管理システム、情報管理方法およびプログラム
KR102024867B1 (ko) * 2014-09-16 2019-09-24 삼성전자주식회사 예제 피라미드에 기초하여 입력 영상의 특징을 추출하는 방법 및 얼굴 인식 장치
CN107210950A (zh) 2014-10-10 2017-09-26 沐择歌有限责任公司 用于共享用户交互的设备
US20160124521A1 (en) * 2014-10-31 2016-05-05 Freescale Semiconductor, Inc. Remote customization of sensor system performance
US10127214B2 (en) * 2014-12-09 2018-11-13 Sansa Al Inc. Methods for generating natural language processing systems
WO2016093553A1 (ko) * 2014-12-12 2016-06-16 서울대학교 산학협력단 이벤트 데이터를 수집하는 시스템, 이벤트 데이터를 수집하는 방법, 이벤트 데이터를 수집하는 서비스 서버 및 카메라
KR102290419B1 (ko) * 2015-01-13 2021-08-18 삼성전자주식회사 디지털 컨텐츠의 시각적 내용 분석을 통해 포토 스토리를 생성하는 방법 및 장치
EP3051810B1 (en) 2015-01-30 2021-06-30 Nokia Technologies Oy Surveillance
US9886633B2 (en) * 2015-02-23 2018-02-06 Vivint, Inc. Techniques for identifying and indexing distinguishing features in a video feed
CN106033418B (zh) * 2015-03-10 2020-01-31 阿里巴巴集团控股有限公司 语音添加、播放方法及装置、图片分类、检索方法及装置
JP6578693B2 (ja) * 2015-03-24 2019-09-25 日本電気株式会社 情報抽出装置、情報抽出方法、及び、表示制御システム
US10078651B2 (en) 2015-04-27 2018-09-18 Rovi Guides, Inc. Systems and methods for updating a knowledge graph through user input
JP6241449B2 (ja) * 2015-05-21 2017-12-06 横河電機株式会社 データ管理システム及びデータ管理方法
US10534810B1 (en) * 2015-05-21 2020-01-14 Google Llc Computerized systems and methods for enriching a knowledge base for search queries
JP6609994B2 (ja) 2015-05-22 2019-11-27 富士通株式会社 表示制御方法、情報処理装置及び表示制御プログラム
KR102404790B1 (ko) 2015-06-11 2022-06-02 삼성전자주식회사 카메라의 초점을 변경하는 방법 및 장치
CN108027823B (zh) 2015-07-13 2022-07-12 帝人株式会社 信息处理装置、信息处理方法以及计算机可读取的存储介质
US20170061218A1 (en) * 2015-08-25 2017-03-02 Hon Hai Precision Industry Co., Ltd. Road light monitoring device and monitoring system and monitoring method using same
US10306267B2 (en) * 2015-08-31 2019-05-28 International Business Machines Corporation System, method, and recording medium for compressing aerial videos
JP6450852B2 (ja) * 2015-09-17 2019-01-09 株式会社日立国際電気 落下物検知追跡システム
US9875081B2 (en) 2015-09-21 2018-01-23 Amazon Technologies, Inc. Device selection for providing a response
US10618521B2 (en) * 2015-09-21 2020-04-14 Ford Global Technologies, Llc Wearable in-vehicle eye gaze detection
US12602661B2 (en) * 2015-11-30 2026-04-14 FAMA Technologies, Inc. System for searching and correlating online activity with individual classification factors
CN105898137A (zh) * 2015-12-15 2016-08-24 乐视移动智能信息技术(北京)有限公司 图像采集、信息推送方法、装置及手机
JP2017136142A (ja) * 2016-02-02 2017-08-10 セイコーエプソン株式会社 情報端末、動作評価システム、動作評価方法、動作評価プログラム、及び記録媒体
US10044710B2 (en) 2016-02-22 2018-08-07 Bpip Limited Liability Company Device and method for validating a user using an intelligent voice print
US10306311B1 (en) 2016-03-24 2019-05-28 Massachusetts Mutual Life Insurance Company Intelligent and context aware reading systems
US10826933B1 (en) 2016-03-31 2020-11-03 Fireeye, Inc. Technique for verifying exploit/malware at malware detection appliance through correlation with endpoints
US10893059B1 (en) 2016-03-31 2021-01-12 Fireeye, Inc. Verification and enhancement using detection systems located at the network periphery and endpoint devices
JP6668907B2 (ja) * 2016-04-13 2020-03-18 沖電気工業株式会社 環境音声配信システム、環境音声処理方法及び環境音声処理プログラム
JP6885402B2 (ja) 2016-06-22 2021-06-16 ソニーグループ株式会社 情報処理装置、情報処理方法、及び、プログラム
JP2017228080A (ja) 2016-06-22 2017-12-28 ソニー株式会社 情報処理装置、情報処理方法、及び、プログラム
US9973522B2 (en) 2016-07-08 2018-05-15 Accenture Global Solutions Limited Identifying network security risks
WO2018016464A1 (ja) * 2016-07-19 2018-01-25 富士フイルム株式会社 画像表示システム、並びにヘッドマウントディスプレイの制御装置とその作動方法および作動プログラム
JP6721832B2 (ja) * 2016-08-24 2020-07-15 富士通株式会社 データ変換プログラム、データ変換装置及びデータ変換方法
CN109691074A (zh) * 2016-09-23 2019-04-26 苹果公司 用于增强的用户交互的图像数据
DK179978B1 (en) 2016-09-23 2019-11-27 Apple Inc. Image data for enhanced user interactions
US10452688B2 (en) 2016-11-08 2019-10-22 Ebay Inc. Crowd assisted query system
US20180182375A1 (en) * 2016-12-22 2018-06-28 Essential Products, Inc. Method, system, and apparatus for voice and video digital travel companion
KR102788911B1 (ko) * 2016-12-26 2025-03-31 삼성전자주식회사 객체의 인식 결과를 제공하는 방법 및 전자 장치
US10198413B2 (en) * 2016-12-30 2019-02-05 Dropbox, Inc. Image annotations in collaborative content items
JP6427807B2 (ja) 2017-03-29 2018-11-28 本田技研工業株式会社 物体認証装置および物体認証方法
CN107391983B (zh) 2017-03-31 2020-10-16 创新先进技术有限公司 一种基于物联网的信息处理方法及装置
EP3611690A4 (en) * 2017-04-10 2020-10-28 Fujitsu Limited RECOGNITION DEVICE, PROCESS AND PROGRAM
US20180314408A1 (en) * 2017-04-28 2018-11-01 General Electric Company Systems and methods for managing views of computer-aided design models
EP4366317A3 (en) 2017-05-16 2024-08-07 Apple Inc. Emoji recording and sending
US10482904B1 (en) 2017-08-15 2019-11-19 Amazon Technologies, Inc. Context driven device arbitration
CN110020101B (zh) * 2017-08-25 2023-09-12 淘宝(中国)软件有限公司 实时搜索场景的还原方法、装置和系统
CN107393541B (zh) * 2017-08-29 2021-05-07 百度在线网络技术(北京)有限公司 信息验证方法和装置
JP2019047234A (ja) * 2017-08-31 2019-03-22 ソニーセミコンダクタソリューションズ株式会社 情報処理装置、情報処理方法、およびプログラム
KR102143148B1 (ko) 2017-09-09 2020-08-10 애플 인크. 생체측정 인증의 구현
KR102185854B1 (ko) 2017-09-09 2020-12-02 애플 인크. 생체측정 인증의 구현
US10955283B2 (en) * 2017-12-18 2021-03-23 Pepper Life Inc. Weight-based kitchen assistant
US10599640B2 (en) 2017-12-19 2020-03-24 At&T Intellectual Property I, L.P. Predictive search with context filtering
WO2019133698A1 (en) * 2017-12-29 2019-07-04 DMAI, Inc. System and method for personalizing dialogue based on user's appearances
WO2019133694A1 (en) 2017-12-29 2019-07-04 DMAI, Inc. System and method for intelligent initiation of a man-machine dialogue based on multi-modal sensory inputs
US11504856B2 (en) 2017-12-29 2022-11-22 DMAI, Inc. System and method for selective animatronic peripheral response for human machine dialogue
US10664512B1 (en) * 2018-02-13 2020-05-26 Snap Inc. Query matching to media collections in a messaging system
WO2019160613A1 (en) 2018-02-15 2019-08-22 DMAI, Inc. System and method for dynamic program configuration
US10339622B1 (en) 2018-03-02 2019-07-02 Capital One Services, Llc Systems and methods for enhancing machine vision object recognition through accumulated classifications
JP7424285B2 (ja) * 2018-04-25 2024-01-30 ソニーグループ株式会社 情報処理システム、情報処理方法、および記録媒体
US10699140B2 (en) 2018-05-04 2020-06-30 Qualcomm Incorporated System and method for capture and distribution of information collected from signs
DK180078B1 (en) 2018-05-07 2020-03-31 Apple Inc. USER INTERFACE FOR AVATAR CREATION
US12033296B2 (en) 2018-05-07 2024-07-09 Apple Inc. Avatar creation user interface
CN108764462A (zh) * 2018-05-29 2018-11-06 成都视观天下科技有限公司 一种基于知识蒸馏的卷积神经网络优化方法
US11170085B2 (en) 2018-06-03 2021-11-09 Apple Inc. Implementation of biometric authentication
JP6989450B2 (ja) * 2018-06-21 2022-01-05 株式会社東芝 画像解析装置、画像解析方法及びプログラム
JP7021036B2 (ja) * 2018-09-18 2022-02-16 株式会社東芝 電子機器及び通知方法
US11100349B2 (en) 2018-09-28 2021-08-24 Apple Inc. Audio assisted enrollment
US10860096B2 (en) 2018-09-28 2020-12-08 Apple Inc. Device control using gaze information
US10346541B1 (en) * 2018-10-05 2019-07-09 Capital One Services, Llc Typifying emotional indicators for digital messaging
US11107261B2 (en) 2019-01-18 2021-08-31 Apple Inc. Virtual avatar animation based on facial feature movement
KR102747289B1 (ko) * 2019-01-18 2024-12-31 삼성전자주식회사 전자 장치 및 이의 제어 방법
US11458040B2 (en) 2019-01-23 2022-10-04 Meta Platforms Technologies, Llc Corneal topography mapping with dense illumination
CN110246001B (zh) * 2019-04-24 2023-04-07 维沃移动通信有限公司 一种图像显示方法及终端设备
US11354546B2 (en) 2019-05-03 2022-06-07 Verily Life Sciences Llc Insect singulation and classification
SG11202112149QA (en) * 2019-05-03 2021-11-29 Verily Life Sciences Llc Predictive classification of insects
DK201970531A1 (en) 2019-05-06 2021-07-09 Apple Inc Avatar integration with multiple applications
AU2020329148A1 (en) * 2019-08-09 2022-03-17 Clearview Ai, Inc. Methods for providing information about a person based on facial recognition
JP7244390B2 (ja) * 2019-08-22 2023-03-22 株式会社ソニー・インタラクティブエンタテインメント 情報処理装置、情報処理方法およびプログラム
KR102086600B1 (ko) * 2019-09-02 2020-03-09 브이에이스 주식회사 상품 구매 정보 제공 장치 및 방법
KR20210065698A (ko) * 2019-11-27 2021-06-04 삼성전자주식회사 전자 장치 및 이의 제어 방법
KR20210066291A (ko) * 2019-11-28 2021-06-07 주식회사 피제이팩토리 멀티 뎁스 이미지 생성 방법 및 이를 위한 프로그램을 기록한 기록매체
US11538238B2 (en) * 2020-01-10 2022-12-27 Mujin, Inc. Method and system for performing image classification for object recognition
CN113111899A (zh) * 2020-01-10 2021-07-13 牧今科技 基于图像分类的物体识别或物体注册的方法及计算系统
CN111402928B (zh) * 2020-03-04 2022-06-14 华南理工大学 基于注意力的语音情绪状态评估方法、装置、介质及设备
JP7454965B2 (ja) 2020-03-11 2024-03-25 本田技研工業株式会社 情報処理装置、情報処理システムおよび情報処理方法
CN113449548B (zh) * 2020-03-24 2025-07-01 华为技术有限公司 更新物体识别模型的方法和装置
US11537701B2 (en) * 2020-04-01 2022-12-27 Toyota Motor North America, Inc. Transport related n-factor authentication
CN113837172A (zh) * 2020-06-08 2021-12-24 同方威视科技江苏有限公司 货物图像局部区域处理方法、装置、设备及存储介质
JP6932821B1 (ja) * 2020-07-03 2021-09-08 株式会社ベガコーポレーション 情報処理システム、方法及びプログラム
KR20220018760A (ko) * 2020-08-07 2022-02-15 삼성전자주식회사 단말에 3d 캐릭터 이미지를 제공하는 엣지 데이터 네트워크 및 그 동작 방법
JP2022045248A (ja) * 2020-09-08 2022-03-18 株式会社日立ソリューションズ 対象機器利用申請承認システムおよび方法、コンピュータプログラム
WO2022070337A1 (ja) * 2020-09-30 2022-04-07 日本電気株式会社 情報処理装置、ユーザ端末、制御方法、非一時的なコンピュータ可読媒体、及び情報処理システム
US12361296B2 (en) 2020-11-24 2025-07-15 International Business Machines Corporation Environment augmentation based on individualized knowledge graphs
US11488371B2 (en) * 2020-12-17 2022-11-01 Concat Systems, Inc. Machine learning artificial intelligence system for producing 360 virtual representation of an object
JP7596775B2 (ja) * 2020-12-22 2024-12-10 株式会社Jvcケンウッド 勤怠管理システム
EP4264460B1 (en) 2021-01-25 2025-12-24 Apple Inc. Implementation of biometric authentication
US11546669B2 (en) 2021-03-10 2023-01-03 Sony Interactive Entertainment LLC Systems and methods for stream viewing with experts
US11553255B2 (en) 2021-03-10 2023-01-10 Sony Interactive Entertainment LLC Systems and methods for real time fact checking during stream viewing
JP7641799B2 (ja) * 2021-03-30 2025-03-07 本田技研工業株式会社 情報処理装置、移動体の制御装置、情報処理装置の制御方法、移動体の制御方法、及びプログラム
CN115242569B (zh) * 2021-04-23 2023-12-05 海信集团控股股份有限公司 智能家居中的人机交互方法和服务器
US12216754B2 (en) 2021-05-10 2025-02-04 Apple Inc. User interfaces for authenticating to perform secure operations
US11985246B2 (en) * 2021-06-16 2024-05-14 Meta Platforms, Inc. Systems and methods for protecting identity metrics
CN113891046B (zh) * 2021-09-29 2023-05-02 重庆电子工程职业学院 一种无线视频监控系统及方法
CN113989245B (zh) * 2021-10-28 2023-01-24 杭州中科睿鉴科技有限公司 多视角多尺度图像篡改检测方法
JP7377565B2 (ja) * 2022-01-05 2023-11-10 キャディ株式会社 図面検索装置、図面データベース構築装置、図面検索システム、図面検索方法、及びプログラム
JP2024091181A (ja) * 2022-12-23 2024-07-04 富士通株式会社 情報処理プログラム、情報処理方法および情報処理装置
CN115993365B (zh) * 2023-03-23 2023-06-13 山东省科学院激光研究所 一种基于深度学习的皮带缺陷检测方法及系统
JP7794172B2 (ja) * 2023-05-19 2026-01-06 トヨタ自動車株式会社 移動物体の追跡システム
JP7794171B2 (ja) * 2023-05-19 2026-01-06 トヨタ自動車株式会社 移動物体の追跡システム
US12293301B2 (en) 2023-07-03 2025-05-06 Red Atlas Inc. Systems and methods for developing a knowledge base comprised of multi-modal data from myriad sources
JP7758820B2 (ja) * 2023-09-19 2025-10-22 ソフトバンクグループ株式会社 システム
CN117610105B (zh) * 2023-12-07 2024-06-07 上海烜翊科技有限公司 一种面向系统设计结果自动生成的模型视图结构设计方法
CN117389745B (zh) * 2023-12-08 2024-05-03 荣耀终端有限公司 一种数据处理方法、电子设备及存储介质
JP7839520B1 (ja) * 2025-05-14 2026-04-02 健斗 笹埜 情報処理装置、情報処理方法および情報処理プログラム

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0512246A (ja) 1991-07-04 1993-01-22 Nec Corp 音声文書作成装置
JP2008278088A (ja) * 2007-04-27 2008-11-13 Hitachi Ltd 動画コンテンツに関するコメント管理装置
JP2009020264A (ja) 2007-07-11 2009-01-29 Hitachi Ltd 音声合成装置及び音声合成方法並びにプログラム
JP2009077443A (ja) 2006-12-11 2009-04-09 Dowango:Kk コメント配信システム、端末装置、コメント配信方法、及びプログラム
JP2009265754A (ja) 2008-04-22 2009-11-12 Ntt Docomo Inc 情報提供装置、情報提供方法及び情報提供プログラム
WO2011004608A1 (ja) * 2009-07-09 2011-01-13 頓智ドット株式会社 視界情報に仮想情報を付加して表示できるシステム

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6317039B1 (en) * 1998-10-19 2001-11-13 John A. Thomason Wireless video audio data remote system
US20040034784A1 (en) * 2002-08-15 2004-02-19 Fedronic Dominique Louis Joseph System and method to facilitate separate cardholder and system access to resources controlled by a smart card
JP2005196481A (ja) * 2004-01-07 2005-07-21 Fuji Xerox Co Ltd 画像形成装置、画像形成方法、およびプログラム
JP2005223499A (ja) * 2004-02-04 2005-08-18 Hitachi Ltd 情報処理装置
US7725484B2 (en) * 2005-11-18 2010-05-25 University Of Kentucky Research Foundation (Ukrf) Scalable object recognition using hierarchical quantization with a vocabulary tree
US20080147730A1 (en) * 2006-12-18 2008-06-19 Motorola, Inc. Method and system for providing location-specific image information
JP4673862B2 (ja) * 2007-03-02 2011-04-20 株式会社ドワンゴ コメント配信システム、コメント配信サーバ、端末装置、コメント配信方法、及びプログラム
US20100211576A1 (en) * 2009-02-18 2010-08-19 Johnson J R Method And System For Similarity Matching
JP2011137638A (ja) * 2009-12-25 2011-07-14 Toshiba Corp ナビゲーションシステム、観光スポット検出装置、ナビゲーション装置、観光スポット検出方法、ナビゲーション方法、観光スポット検出プログラム及びナビゲーションプログラム
JP5828456B2 (ja) * 2009-12-28 2015-12-09 サイバーアイ・エンタテインメント株式会社 コメント付与及び配信システム、及び端末装置
US20130021448A1 (en) * 2011-02-24 2013-01-24 Multiple Interocular 3-D, L.L.C. Stereoscopic three-dimensional camera rigs

Non-Patent Citations (10)

* Cited by examiner, † Cited by third party
Title
AKSHAY JAVA; XIAODAN SONG; TIM FININ; BELLE TSENG: "Why We Twitter: Understanding Microblogging Usage and Communities", JOINT 9TH WEBKDD AND 1ST SNA-KDD WORKSHOP '07, 2007
DAVID G. LOWE: "Object Recognition from Local Scale-Invariant Features", PROC. IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION, 1999, pages 1150 - 1157
G. CSURKA; C. BRAY; C. DANCE; L. FAN: "Visual categorization with bags of keypoints", PROC. ECCV WORKSHOP ON STATISTICAL LEARNING IN COMPUTER VISION, 2004, pages 1 - 22
J. SIVIC; A. ZISSERMAN: "Video google: A text retrieval approach to object matching in videos", PROC. ICCV2003, vol. 2, 2003, pages 1470 - 1477, XP055277077, DOI: doi:10.1109/ICCV.2003.1238663
KEIJI YANAI ET AL.: "The Current State and Future Directions on Generic Object Recognition", TRANSACTIONS OF INFORMATION PROCESSING SOCIETY OF JAPAN, vol. 48, 15 November 2007 (2007-11-15), pages 1 - 24, XP055148832 *
KEIJI YANAI: "The Current State and Future Directions on Generic Object Recognition", INFORMATION PROCESSING SOCIETY JOURNAL, vol. 48, no. SIG 16, 2007, pages 1 - 24, XP055148832
MING ZHAO; JAY YAGNIK; HARTWIG ADAM; DAVID BAU: "FG '08:8th IEEE International Conference on Automatic Face & Gesture Recognition", 2008, GOOGLE INC., article "Large scale learning and recognition of faces in web videos"
PINAR DUYGULU; KOBUS BARNARD; NANDO DE FREITAS; DAVID FORSYTH: "Object Recognition as Machine Translation: Learning a lexicon for a fixed image vocabulary", EUROPEAN CONFERENCE ON COMPUTER VISION (ECCV), 2002, pages 97 - 112
R. FERGUS; P. PERONA; A. ZISSERMAN: "Object Class Recognition by Unsupervised Scale-invariant Learning", IEEE CONF. ON COMPUTER VISION AND PATTERN RECOGNITION, 2003, pages 264 - 271, XP010644682, DOI: doi:10.1109/CVPR.2003.1211479
See also references of EP2767907A4

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015118496A (ja) * 2013-12-18 2015-06-25 株式会社日本総合研究所 カタログ出力装置、カタログ出力方法、およびプログラム
WO2015130383A3 (en) * 2013-12-31 2015-12-10 Microsoft Technology Licensing, Llc Biometric identification system
JP2016211955A (ja) * 2015-05-08 2016-12-15 古河電気工業株式会社 橋梁点検支援装置、橋梁点検支援方法、橋梁点検支援システム、およびプログラム
US10887125B2 (en) 2017-09-15 2021-01-05 Kohler Co. Bathroom speaker
US10663938B2 (en) 2017-09-15 2020-05-26 Kohler Co. Power operation of intelligent devices
US10448762B2 (en) 2017-09-15 2019-10-22 Kohler Co. Mirror
US11099540B2 (en) 2017-09-15 2021-08-24 Kohler Co. User identity in household appliances
US11314215B2 (en) 2017-09-15 2022-04-26 Kohler Co. Apparatus controlling bathroom appliance lighting based on user identity
US11314214B2 (en) 2017-09-15 2022-04-26 Kohler Co. Geographic analysis of water conditions
US11892811B2 (en) 2017-09-15 2024-02-06 Kohler Co. Geographic analysis of water conditions
US11921794B2 (en) 2017-09-15 2024-03-05 Kohler Co. Feedback for water consuming appliance
US11949533B2 (en) 2017-09-15 2024-04-02 Kohler Co. Sink device
US12135535B2 (en) 2017-09-15 2024-11-05 Kohler Co. User identity in household appliances
WO2020188626A1 (ja) * 2019-03-15 2020-09-24 和夫 金子 視覚支援装置
CN116246322A (zh) * 2023-02-16 2023-06-09 支付宝(杭州)信息技术有限公司 一种活体防攻击方法、装置、存储介质及电子设备

Also Published As

Publication number Publication date
EP2767907A4 (en) 2015-07-01
EP2767907A1 (en) 2014-08-20
JP2013088906A (ja) 2013-05-13
JP5866728B2 (ja) 2016-02-17
US20140289323A1 (en) 2014-09-25

Similar Documents

Publication Publication Date Title
JP5866728B2 (ja) Knowledge-information-processing server system having image recognition system
US11637797B2 (en) Automated image processing and content curation
CN114270257B (zh) 带有激光投影系统的可穿戴多媒体设备和云计算平台
KR102354428B1 (ko) 이미지를 분석하기 위한 웨어러블기기 및 방법
KR101992424B1 (ko) 증강현실용 인공지능 캐릭터의 제작 장치 및 이를 이용한 서비스 시스템
KR101832693B1 (ko) 직관적 컴퓨팅 방법들 및 시스템들
RU2408067C2 (ru) Идентификация медиаданных
US20150193507A1 (en) Emotion-related query processing
TW202301080A (zh) 輔助系統的多裝置調解
JP2010224715A (ja) 画像表示システム、デジタルフォトフレーム、情報処理システム、プログラム及び情報記憶媒体
US12271982B2 (en) Generating modified user content that includes additional text content
US20210398539A1 (en) Systems and methods for processing audio and video
CN110111795A (zh) 一种语音处理方法及终端设备
JP2015104078A (ja) 撮像装置、撮像システム、サーバ、撮像方法、及び撮像プログラム
US11928167B2 (en) Determining classification recommendations for user content
US20250218087A1 (en) Generating modified user content that includes additional text content
CN119415712A (zh) 基于ivvr的融媒内容分享方法、设备、介质及产品
US12020709B2 (en) Wearable systems and methods for processing audio and video based on information from multiple individuals
Berger et al. Mobile AR Solution for Deaf People: Correlation Between Face Detection and Speech Recognition
KR20230163045A (ko) 메타버스 환경에서 수집된 멀티미디어의 리소스 변환 매칭을 이용한 영상 콘텐츠 제작 서비스 제공 방법 및 기록매체
TWI880590B (zh) 以影音實境觸發智能對話的方法與系統
JP7839520B1 (ja) 情報処理装置、情報処理方法および情報処理プログラム
US20260075014A1 (en) Artificial intelligence-based system and method for generating and recommending personalized graphics for messaging applications
CN121963704A (zh) eVTOL的语音交互方法、装置、设备及介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12840365

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 14351484

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2012840365

Country of ref document: EP