US20230410221A1 - Information processing apparatus, control method, and program - Google Patents

Information processing apparatus, control method, and program Download PDF

Info

Publication number
US20230410221A1
Authority
US
United States
Prior art keywords
relevant
account
similar
content data
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US18/240,160
Inventor
Masahiro Tani
Kazufumi KOJIMA
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp filed Critical NEC Corp
Priority to US18/240,160
Publication of US20230410221A1
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01Social networking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9536Search customisation based on social or collaborative filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/758Involving statistics of pixels or of feature values, e.g. histogram matching
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/26Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques

Definitions

  • the information processing apparatus 2000 determines, for two target accounts that are a target account 10 - 1 and a target account 10 - 2 , whether the target accounts are accounts owned by the same person.
  • For the target account 10-1, a plurality of relevant accounts 20 are present.
  • the relevant account 20 associated with the target account 10 - 1 is expressed as a relevant account 20 - 1 .
  • a content associated with the relevant account 20 - 1 is expressed as a content 30 - 1 .
  • the content 30 - 1 is image data uploaded in association with the relevant account 20 - 1 , and the like.
  • a relevant account of the target account 10 - 2 is expressed as a relevant account 20 - 2
  • a content associated with the relevant account 20 - 2 is expressed as a content 30 - 2
  • the “content 30 associated with the relevant account 20 ” is also simply expressed as the “content 30 of the relevant account 20 ”.
  • the information processing apparatus 2000 determines whether the content 30 - 1 of the relevant account 20 - 1 and the content 30 - 2 of the relevant account 20 - 2 are similar. When the content 30 - 1 and the content 30 - 2 are similar, the target account 10 - 1 and the target account 10 - 2 can be inferred to belong to the same person. Thus, when the content 30 - 1 and the content 30 - 2 are similar, the information processing apparatus 2000 executes predetermined processing related to the target account 10 - 1 and the target account 10 - 2 . For example, the information processing apparatus 2000 outputs, as the predetermined processing, a notification indicating that the target account 10 - 1 and the target account 10 - 2 belong to the same person.
  • the information processing apparatus 2000 determines a similarity degree between the content 30 - 1 of the relevant account 20 - 1 associated with the target account 10 - 1 and the content 30 - 2 of the relevant account 20 - 2 associated with the target account 10 - 2 .
  • When the similarity degree is high, the target account 10-1 and the target account 10-2 can be inferred to be owned by the same person. The reason will be described below.
  • the relevant account 20 - 1 associated with the target account 10 - 1 conceivably belongs to a person who has some sort of connection with an owner of the target account 10 - 1 , such as a friend of the owner of the target account 10 - 1 , for example.
  • Thus, a content including some sort of information related to the target account 10-1 is conceivably present among the contents 30-1 that the owners of the relevant accounts 20-1 have uploaded or otherwise registered in association with the relevant accounts 20-1.
  • For example, a picture or a moving image uploaded by the relevant account 20-1 may include the owner of the target account 10-1, property (such as a vehicle) of the owner of the target account 10-1, a landmark representing a place that the owner of the target account 10-1 has visited, and the like.
  • text data and voice data uploaded by the relevant account 20 - 1 also include some sort of information related to the target account 10 - 1 .
  • Thus, when the similarity degree between the content 30-1 and the content 30-2 is high, the information processing apparatus 2000 infers that there is a high probability that the owner of the target account 10-1 and the owner of the target account 10-2 are the same person. In this way, even when it is not clear whether the target account 10-1 and the target account 10-2 are owned by the same person just by comparing the user information of the target account 10-1 with the user information of the target account 10-2, whether the target account 10-1 and the target account 10-2 are accounts owned by the same person can be inferred.
  • FIG. 2 is a diagram illustrating a functional configuration of the information processing apparatus 2000 according to the example embodiment 1.
  • the information processing apparatus 2000 includes a determination unit 2020 and a processing execution unit 2040 .
  • the determination unit 2020 determines whether the content 30 - 1 of the relevant account 20 - 1 associated with the target account 10 - 1 and the content 30 - 2 of the relevant account 20 - 2 associated with the target account 10 - 2 are similar.
  • When it is determined that the contents are similar, the processing execution unit 2040 executes predetermined processing related to the target account 10-1 and the target account 10-2.
  • Each functional component unit of the information processing apparatus 2000 may be achieved by hardware (for example, a hard-wired electronic circuit and the like) that achieves the functional component unit, or may be achieved by a combination of hardware and software (for example, a combination of an electronic circuit and a program that controls the electronic circuit, and the like).
  • FIG. 3 is a diagram illustrating a computer 1000 for achieving the information processing apparatus 2000 .
  • the computer 1000 is any computer.
  • the computer 1000 is a personal computer (PC), a server machine, or the like.
  • The computer 1000 may be a dedicated computer designed for achieving the information processing apparatus 2000, or may be a general-purpose computer.
  • the computer 1000 includes a bus 1020 , a processor 1040 , a memory 1060 , a storage device 1080 , an input/output interface 1100 , and a network interface 1120 .
  • the bus 1020 is a data transmission path for allowing the processor 1040 , the memory 1060 , the storage device 1080 , the input/output interface 1100 , and the network interface 1120 to transmit and receive data with one another.
  • a method of connecting the processor 1040 and the like to each other is not limited to a bus connection.
  • the processor 1040 is various types of processors such as a central processing unit (CPU), a graphics processing unit (GPU), and a field-programmable gate array (FPGA).
  • the memory 1060 is a main storage achieved by using a random access memory (RAM) and the like.
  • The storage device 1080 is an auxiliary storage achieved by using a hard disk, a solid state drive (SSD), a memory card, a read only memory (ROM), or the like.
  • the input/output interface 1100 is an interface for connecting the computer 1000 and an input/output device.
  • an input apparatus such as a keyboard and an output apparatus such as a display apparatus are connected to the input/output interface 1100 .
  • the network interface 1120 is an interface for connecting the computer 1000 to a communication network.
  • The communication network is, for example, a local area network (LAN) or a wide area network (WAN).
  • a method of connection to the communication network by the network interface 1120 may be a wireless connection or a wired connection.
  • the storage device 1080 stores a program module that achieves each functional component unit of the information processing apparatus 2000 .
  • the processor 1040 achieves a function associated with each program module by reading each of the program modules to the memory 1060 and executing the read program module.
  • FIG. 4 is a flowchart illustrating a flow of processing executed by the information processing apparatus 2000 according to the example embodiment 1.
  • the determination unit 2020 acquires the content 30 - 1 of each of the relevant accounts 20 - 1 associated with the target account 10 - 1 (S 102 ).
  • the determination unit 2020 acquires the content 30 - 2 of each of the relevant accounts 20 - 2 associated with the target account 10 - 2 (S 104 ).
  • the determination unit 2020 determines whether the content 30 - 1 and the content 30 - 2 are similar (S 106 ). When the content 30 - 1 and the content 30 - 2 are similar (S 106 : YES), the processing execution unit 2040 executes predetermined processing (S 108 ). On the other hand, when the content 30 - 1 and the content 30 - 2 are not similar (S 106 : NO), the processing in FIG. 4 ends.
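  • As a minimal illustration (not part of the disclosure), the flow of S 102 to S 108 can be sketched in Python as follows; every helper passed in (collecting relevant accounts, acquiring contents, judging similarity, and the predetermined processing) is an assumed placeholder.

```python
from typing import Any, Callable, List

def determine_and_process(
    target_1: str,
    target_2: str,
    get_relevant_accounts: Callable[[str], List[str]],        # assumed helper
    get_contents: Callable[[str], List[Any]],                  # assumed helper
    contents_similar: Callable[[List[Any], List[Any]], bool],  # assumed helper
    predetermined_processing: Callable[[str, str], None],      # e.g. output a notification
) -> None:
    # S102: acquire the contents 30-1 of the relevant accounts 20-1 of the target account 10-1
    contents_1 = [c for acc in get_relevant_accounts(target_1) for c in get_contents(acc)]
    # S104: acquire the contents 30-2 of the relevant accounts 20-2 of the target account 10-2
    contents_2 = [c for acc in get_relevant_accounts(target_2) for c in get_contents(acc)]
    # S106: determine whether the contents are similar
    if contents_similar(contents_1, contents_2):
        # S108: execute the predetermined processing
        predetermined_processing(target_1, target_2)
```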
  • the target account 10 and the relevant account 20 are user accounts created by a user in a service such as the SNS, for example.
  • a user account is created by registering user information such as a name, and is continuously used.
  • a user account handled by the information processing apparatus 2000 is not limited to a user account created by registering user information in such a manner. For example, on a bulletin board and the like on a Web page, when a user posts (uploads text data, and the like) a content, an identifier is assigned to the post. The information processing apparatus 2000 may handle the identifier as a user account. In this case, for example, when a certain user posts a content on a bulletin board site and another user comments on the post, one of the former and the latter can be handled as the target account 10 and the other can be handled as the relevant account 20 .
  • the information processing apparatus 2000 infers, for two accounts of the target account 10 - 1 and the target account 10 - 2 , whether the accounts belong to the same person.
  • the target account 10 - 1 and the target account 10 - 2 may be user accounts for using the same service (for example, the SNS), or may be user accounts for using services different from each other.
  • the information processing apparatus 2000 receives a specification of a user account handled as the target account 10 from a user of the information processing apparatus 2000 .
  • The number of user accounts specified by the user may be two, or may be three or more.
  • The information processing apparatus 2000 executes, for each combination of any two user accounts that can be created from the specified user accounts (nC2 combinations for n specified accounts), processing that handles the two user accounts included in the combination as the target accounts 10. For example, when user accounts A, B, and C are specified, processing handling A and B as the target accounts 10, processing handling A and C as the target accounts 10, and processing handling B and C as the target accounts 10 are each executed.
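  • The pairwise handling of the specified accounts corresponds to iterating over all two-account combinations; a brief sketch follows, in which process_pair stands in for the determination and processing described above (an assumed placeholder).

```python
from itertools import combinations

def process_all_pairs(specified_accounts, process_pair):
    # For n specified user accounts, iterate over all nC2 unordered pairs and
    # handle each pair as (target account 10-1, target account 10-2).
    for account_a, account_b in combinations(specified_accounts, 2):
        process_pair(account_a, account_b)

# For example, for accounts "A", "B", and "C", the pairs (A, B), (A, C), and (B, C)
# are each processed (determine_and_process_pair is a hypothetical callback):
#   process_all_pairs(["A", "B", "C"], process_pair=determine_and_process_pair)
```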
  • the information processing apparatus 2000 receives, from a user, an input that specifies one user account handled as the target account 10 .
  • the information processing apparatus 2000 handles the user account specified by a user as the target account 10 - 1 .
  • the information processing apparatus 2000 handles, as the target account 10 - 2 , another user account having user information similar to user information of the target account 10 - 1 .
  • the similarity between pieces of user information herein refers to, for example, a part of various pieces of information (a part of a user ID, a part of a name, a part of a birth date, a part of an e-mail address, or the like) being common.
  • When a plurality of user accounts having user information similar to that of the target account 10-1 are found, the information processing apparatus 2000 handles each of the plurality of user accounts as the target account 10-2.
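  • One possible sketch of this candidate selection is shown below; the field names and the partial-match rule are assumptions made only for illustration.

```python
def shares_user_info(info_a: dict, info_b: dict) -> bool:
    """Return True when any compared field is partially common (illustrative rule)."""
    for key in ("user_id", "name", "birth_date", "email"):
        value_a, value_b = info_a.get(key, ""), info_b.get(key, "")
        if value_a and value_b and (value_a in value_b or value_b in value_a):
            return True
    return False

def candidate_second_targets(target_info: dict, all_accounts: list) -> list:
    # Each account whose user information is partially common with that of the
    # specified target account 10-1 is handled as a target account 10-2.
    return [acc for acc in all_accounts if shares_user_info(target_info, acc["info"])]
```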
  • the information processing apparatus 2000 may operate in cooperation with a monitoring system for monitoring a user account, and receive a specification of a user account from the monitoring system. For example, the monitoring system monitors a usage aspect (such as a content of an uploaded content and a frequency of uploading) of a user account, and determines a user account whose usage aspect violates common sense, a user policy of a service, law, or the like (that is, determines a user account to beware of). The monitoring system notifies the determined user account to the information processing apparatus 2000 . The information processing apparatus 2000 executes, for each combination of any two user accounts creatable for the plurality of user accounts notified from the monitoring system, processing handling the two user accounts included in the combination as the target accounts 10 . Note that, when the monitoring system notifies user accounts one by one, the information processing apparatus 2000 executes the above-described processing on a plurality of user accounts indicated by a plurality of notifications received during a predetermined period of time, for example.
  • the relevant account 20 is another account associated with the target account 10 , and is an account in a friendship with the target account 10 in the SNS, for example.
  • The determination unit 2020 may acquire the content 30 for all of the relevant accounts 20, or may acquire the content 30 for only some of the relevant accounts 20.
  • In the latter case, the determination unit 2020 arbitrarily (for example, randomly) selects a predetermined number of the relevant accounts 20 from the plurality of relevant accounts 20, for example.
  • the determination unit 2020 acquires the content 30 - 1 associated with the relevant account 20 - 1 and the content 30 - 2 associated with the relevant account 20 - 2 (S 102 and S 104 ). For example, the determination unit 2020 automatically collects, for each of the relevant accounts 20 , each of the contents 30 from Web pages on which the contents 30 of the relevant accounts 20 are opened, by successively accessing the Web pages.
  • an application programming interface (API) for acquiring a content associated with a user account may be provided in a service such as the SNS.
  • the determination unit 2020 may acquire the content 30 of the relevant account 20 by using the API provided in a service used by the relevant account 20 .
  • The determination unit 2020 may acquire all of the contents 30 associated with the relevant account 20, or may acquire only the contents 30 of a predetermined type. For example, when a target of the similarity determination is only image data, the determination unit 2020 acquires only image data associated with the relevant account 20 as the content 30.
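  • The sketch below illustrates acquiring the contents 30 of a relevant account 20 through a service API; the endpoint path, parameters, and response shape are assumptions (each real service defines its own API, and crawling public Web pages is an alternative).

```python
import requests  # assumes the requests package is available

def fetch_contents(relevant_account_id: str, api_base: str, content_type: str = "image") -> list:
    """Fetch contents associated with a relevant account via a hypothetical API."""
    url = f"{api_base}/accounts/{relevant_account_id}/contents"
    response = requests.get(url, params={"type": content_type}, timeout=10)
    response.raise_for_status()
    # Assumed response shape: {"contents": [...]}
    return response.json().get("contents", [])
```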
  • the determination unit 2020 compares content data of the relevant account 20 - 1 with content data of the relevant account 20 - 2 , and infers that, when a similarity degree between the pieces of the content data is high, the target account 10 - 1 and the target account 10 - 2 are owned by the same person.
  • The processing may adopt various variations in terms of 1) what kind of content data is to be compared and 2) what kind of comparison is performed. Hereinafter, comparisons between pieces of content data will be described while focusing on these two points.
  • Image data are conceivable as a type of the content data to be compared.
  • image data of a picture of a person, a building, scenery, or the like are uploaded by using a user account.
  • the determination unit 2020 handles image data uploaded by using a user account in such a manner as a content associated with the user account.
  • Further, a user may make a post that refers to (links to) a Web page including image data, or a post that refers to image data uploaded by another user.
  • The determination unit 2020 may also handle image data referred to by a user in such a manner as content data associated with an account of the user. Note that a moving image frame constituting moving image data is also treated as image data.
  • image data has an advantage that similarity between the content 30 - 1 and the content 30 - 2 is easily determined even when a language used in the relevant account 20 - 1 is different from a language used in the relevant account 20 - 2 .
  • the determination unit 2020 focuses on a similarity degree between an object detected from image data associated with the relevant account 20 - 1 and an object detected from image data associated with the relevant account 20 - 2 .
  • the determination unit 2020 calculates the similarity degree between the object detected from the image data associated with the relevant account 20 - 1 and the object detected from the image data associated with the relevant account 20 - 2 . Then, when the number of groups (namely, groups of objects inferred to be the same) of objects having a similarity degree equal to or more than a predetermined value is equal to or more than a predetermined number, the determination unit 2020 determines that the similarity degree between the content data of the relevant account 20 - 1 and the content data of the relevant account 20 - 2 is high.
  • On the other hand, when the number of such groups is less than the predetermined number, the determination unit 2020 determines that the similarity degree between the content data of the relevant account 20-1 and the content data of the relevant account 20-2 is not high.
  • the predetermined number described above is previously stored in a storage apparatus that can be accessed from the determination unit 2020 .
  • An object detected from the image data 32 may be an object of any kind, or may be limited to objects of a specific kind. In the case of the latter, for example, only a person among the objects included in the image data 32 is to be detected.
  • an existing technique can be used as a technique for detecting an object from image data and a technique for determining a similarity degree between detected objects.
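  • A simplified sketch of this comparison is shown below; it assumes that an object detector has already produced a feature vector for each detected object, and it uses cosine similarity as a stand-in for the existing matching techniques referred to above.

```python
import numpy as np

def count_matching_object_groups(features_1, features_2, similarity_threshold: float) -> int:
    """Count groups of objects (one from each relevant account) whose similarity
    degree is equal to or more than the threshold; features are assumed to be
    fixed-length vectors produced by some object detector."""
    matched = 0
    used = set()
    for f1 in features_1:
        for j, f2 in enumerate(features_2):
            if j in used:
                continue
            sim = float(np.dot(f1, f2) / (np.linalg.norm(f1) * np.linalg.norm(f2) + 1e-12))
            if sim >= similarity_threshold:
                matched += 1
                used.add(j)
                break
    return matched

def contents_similar_by_objects(features_1, features_2,
                                similarity_threshold: float,
                                predetermined_number: int) -> bool:
    # Similar when the number of matching object groups reaches the predetermined number.
    return count_matching_object_groups(features_1, features_2, similarity_threshold) >= predetermined_number
```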
  • the determination unit 2020 generates, for each of the relevant account 20 - 1 and the relevant account 20 - 2 , a histogram representing a distribution of a frequency of appearance of an object in image data associated thereto, and determines a similarity degree between the histograms.
  • FIG. 5 is a diagram illustrating a histogram generated for the relevant account 20 .
  • a plurality of pieces of image data 32 are associated with the relevant account 20 .
  • a histogram 40 is a distribution of a frequency of appearance of an object detected from the image data 32 .
  • the image data 32 associated with the relevant account 20 - 1 are expressed as image data 32 - 1
  • the histogram 40 generated for the image data 32 - 1 is expressed as a histogram 40 - 1
  • the image data 32 associated with the relevant account 20 - 2 are expressed as image data 32 - 2
  • the histogram 40 generated for the image data 32 - 2 is expressed as a histogram 40 - 2 .
  • the determination unit 2020 determines a similarity degree between the histogram 40 - 1 and the histogram 40 - 2 .
  • the determination unit 2020 calculates the similarity degree between the histogram 40 - 1 and the histogram 40 - 2 , and, when the calculated similarity degree is equal to or more than a predetermined value, the determination unit 2020 determines that a similarity degree between the content 30 - 1 and the content 30 - 2 is high.
  • On the other hand, when the similarity degree between the histogram 40-1 and the histogram 40-2 is less than the predetermined value, the determination unit 2020 determines that the similarity degree between the content 30-1 and the content 30-2 is not high.
  • an existing technique can be used as a technique for calculating a similarity degree between two histograms.
  • the predetermined value described above is stored in a storage apparatus that can be accessed from the determination unit 2020 .
  • the histogram 40 - 1 and the histogram 40 - 2 are generated as follows, for example.
  • the determination unit 2020 recognizes an object included in each piece of the image data 32 - 1 by performing object recognition processing on each piece of the image data 32 - 1 as a target. Furthermore, the determination unit 2020 generates the histogram 40 - 1 representing a distribution of a frequency of appearance of an object by counting the number of appearances of each object.
  • the determination unit 2020 assigns an identifier to each object detected from the image data 32 - 1 .
  • the determination unit 2020 makes each object identifiable by assigning the same identifier to the same object, and can thus count the number of appearances of the object.
  • To do so, a determination of whether objects detected from the image data 32 are the same as each other (identification of objects) is needed.
  • When the determination unit 2020 assigns an identifier to an object detected from the image data 32 and the object is the same as another object that has already been detected, the determination unit 2020 assigns the same identifier as the identifier assigned to the already detected object.
  • On the other hand, when the object is not the same as any already detected object, the determination unit 2020 assigns a new identifier that is not assigned to any other object.
  • the determination unit 2020 generates the histogram 40 - 2 by also performing similar processing on the image data 32 - 2 . At this time, for an object detected from the image data 32 - 2 , not only identification with an object detected from the other piece of image data 32 - 2 but also identification with an object detected from the image data 32 - 1 are performed. In other words, when the same object as an object detected from the image data 32 - 2 is already detected from the image data 32 - 1 , the determination unit 2020 also assigns, to the object detected from the image data 32 - 2 , an identifier assigned to the object being already detected. Various types of existing techniques can be used for identification of an object.
  • a comparison between the histogram 40 - 1 and the histogram 40 - 2 may be performed by using only a part of the histogram 40 - 1 and a part of the histogram 40 - 2 .
  • the determination unit 2020 calculates a similarity degree between the histogram 40 - 1 and the histogram 40 - 2 by comparing a frequency of appearance of objects in top N places (N is a natural number of two or more) in the histogram 40 - 1 with a frequency of appearance of objects in top N places in the histogram 40 - 2 .
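  • The histogram comparison can be sketched as follows; object identification is assumed to have been done upstream so that each detected object already carries an identifier, and normalized histogram intersection is used here as one possible similarity degree.

```python
from collections import Counter

def appearance_histogram(object_identifiers) -> Counter:
    # object_identifiers: identifiers assigned to the objects detected from the
    # image data 32 of one relevant account 20 (identification is assumed done upstream).
    return Counter(object_identifiers)

def histogram_similarity(hist_1: Counter, hist_2: Counter, top_n=None) -> float:
    """Normalized histogram intersection; optionally compare only the top-N bins."""
    if top_n is not None:
        hist_1 = Counter(dict(hist_1.most_common(top_n)))
        hist_2 = Counter(dict(hist_2.most_common(top_n)))
    total_1 = sum(hist_1.values()) or 1
    total_2 = sum(hist_2.values()) or 1
    keys = set(hist_1) | set(hist_2)
    return sum(min(hist_1[k] / total_1, hist_2[k] / total_2) for k in keys)

# The contents are judged similar when the similarity degree is equal to or more
# than a predetermined value, for example:
#   similar = histogram_similarity(histogram_40_1, histogram_40_2, top_n=10) >= 0.5
```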
  • a comparison related to image data may be achieved by a comparison between topics of the image data instead of a comparison between objects detected from the image data.
  • a topic in a certain piece of data refers to a main matter or event expressed by the data.
  • a topic such as work, food, sports, traveling, games, or politics is conceivable.
  • the determination unit 2020 classifies each piece of the image data 32 associated with the relevant account 20 by topic.
  • an existing technique can be used as a technique for classifying image data by topic.
  • the determination unit 2020 generates a histogram of a frequency of appearance of a topic for each of the image data 32 - 1 and the image data 32 - 2 .
  • FIG. 6 is a diagram illustrating a histogram of a topic.
  • The determination unit 2020 then compares the topic histograms in the same manner as the object histograms described above: when the similarity degree between the histograms is equal to or more than a predetermined value, the determination unit 2020 determines that the similarity degree between the content 30-1 and the content 30-2 is high, and otherwise determines that it is not high.
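  • The same comparison applies to topics; the sketch below assumes a topic classifier as a placeholder and reuses the histogram similarity routine sketched above for object frequencies.

```python
from collections import Counter

def topic_histogram(images, classify_topic) -> Counter:
    # classify_topic is an assumed image-classification function that returns a
    # topic label such as "food", "sports", or "traveling" for each image.
    return Counter(classify_topic(image) for image in images)

# The two topic histograms can then be compared with the same histogram_similarity()
# routine and predetermined value used for the object-frequency histograms.
```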
  • the determination unit 2020 may perform a comparison similar to the above-described comparison related to the image data 32 on text data associated with the relevant account 20 .
  • text data representing information such as a thought of a user and a recent state of a user are uploaded in association with a user account.
  • the determination unit 2020 handles, for example, text data uploaded by a user in such a manner as the content 30 .
  • a user may also make a post that refers to a Web page, a post that refers to text data uploaded by another user, a post of a comment on a content of another user, and the like.
  • The determination unit 2020 may also handle, as content data associated with an account of the user, the text data included in the Web page referred to by the user in such a manner, the text data uploaded by the other user, and the text data representing the comment on the content of the other user.
  • the determination unit 2020 performs extraction of a keyword from text data associated with the relevant account 20 - 1 and text data associated with the relevant account 20 - 2 . For example, when the number of keywords that appear commonly to both pieces of the text data is equal to or more than a predetermined number, the determination unit 2020 determines that a similarity degree between the content 30 - 1 and the content 30 - 2 is high. On the other hand, when the number of keywords that appear commonly to both pieces of the text data is less than the predetermined number, the determination unit 2020 determines that the similarity degree between the content 30 - 1 and the content 30 - 2 is not high.
  • A keyword extracted from text data may be any word, or may be limited to specific words. In the case of the latter, for example, a list of words to be adopted as keywords is prepared in advance, and only a word included in the list is extracted as a keyword. Note that an existing technique can be used as a technique for extracting a keyword from text data.
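  • A sketch of the keyword-based comparison follows; the simple tokenization and the optional keyword list are simplifications of the existing extraction techniques referred to above.

```python
import re

def extract_keywords(text: str, keyword_list=None) -> set:
    """Extract keywords from text data; with keyword_list given, only words in the
    prepared list are adopted (both behaviors are illustrative simplifications)."""
    words = set(re.findall(r"\w+", text.lower()))
    return words & set(keyword_list) if keyword_list is not None else words

def texts_similar(texts_1, texts_2, predetermined_number: int, keyword_list=None) -> bool:
    keywords_1 = set().union(*(extract_keywords(t, keyword_list) for t in texts_1))
    keywords_2 = set().union(*(extract_keywords(t, keyword_list) for t in texts_2))
    # Similar when the number of keywords common to both sides reaches the threshold.
    return len(keywords_1 & keywords_2) >= predetermined_number
```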
  • the determination unit 2020 may perform, on a keyword extracted from text data associated with the relevant account 20 , a comparison similar to the comparison related to a histogram of a frequency of appearance of an object detected from image data associated with the relevant account 20 . Specifically, the determination unit 2020 generates, for each of the relevant account 20 - 1 and the relevant account 20 - 2 , a histogram representing a distribution of a frequency of appearance of a keyword in associated text data, and determines a similarity degree between the histograms.
  • FIG. 7 is a diagram illustrating a histogram of a frequency of appearance of a keyword.
  • a histogram 50 is generated for text data 34 associated with the relevant account 20 .
  • the text data 34 associated with the relevant account 20 - 1 is expressed as text data 34 - 1
  • the histogram 50 generated from the text data 34 - 1 is expressed as a histogram 50 - 1 .
  • the text data 34 associated with the relevant account 20 - 2 is expressed as text data 34 - 2
  • the histogram 50 generated from the text data 34 - 2 is expressed as a histogram 50 - 2 .
  • the determination unit 2020 calculates a similarity degree between the histogram 50 - 1 and the histogram 50 - 2 , and, when the similarity degree is equal to or more than a predetermined value, the determination unit 2020 determines that a similarity degree between the content 30 - 1 and the content 30 - 2 is high. On the other hand, when the similarity degree between the histogram 50 - 1 and the histogram 50 - 2 is less than the predetermined value, the determination unit 2020 determines that the similarity degree between the content 30 - 1 and the content 30 - 2 is not high.
  • the predetermined value described above is previously stored in a storage apparatus that can be accessed from the determination unit 2020 .
  • a comparison between the histogram 50 - 1 and the histogram 50 - 2 may be performed by using only a part (for example, up to the top N place) of the histogram similarly to the comparison between the histogram 40 - 1 and the histogram 40 - 2 .
  • the determination unit 2020 may determine a similarity degree between the content 30 - 1 and the content 30 - 2 by a comparison between frequencies of appearance of a topic extracted from the pieces of the text data 34 .
  • a method of comparing frequencies of appearance of a topic extracted from the pieces of the text data 34 is similar to the above-described comparison between frequencies of appearance of a topic extracted from pieces of image data. Note that an existing technique can be used as a technique for extracting a topic from text data.
  • the determination unit 2020 may handle voice data associated with the relevant account as the content 30 .
  • the voice data herein include not only data generated by voice alone, but also data about voice included in moving image data. Hereinafter, comparison methods related to voice data are illustrated.
  • the determination unit 2020 extracts a keyword from each piece of voice data associated with the relevant account 20 - 1 and voice data associated with the relevant account 20 - 2 . Then, the determination unit 2020 determines a similarity degree between the content 30 - 1 and the content 30 - 2 by handling the keywords extracted from the pieces of the voice data similarly to the keywords extracted from the pieces of the text data described above. In other words, the determination unit 2020 determines a similarity degree between the content 30 - 1 and the content 30 - 2 by comparing the numbers of common keywords and histograms representing a frequency of appearance of a keyword.
  • the determination unit 2020 determines a similarity degree between the content 30 - 1 and the content 30 - 2 by comparing a frequency of appearance of a topic extracted from voice data associated with the relevant account 20 - 1 and a frequency of appearance of a topic extracted from voice data associated with the relevant account 20 - 2 .
  • a method of comparing frequencies of appearance of a topic is similar to the above-described comparison between frequencies of appearance of a topic extracted from image data. Note that an existing technique can be used as a technique for extracting a topic from voice data.
  • the determination unit 2020 performs extraction of a speaker from each piece of voice data associated with the relevant account 20 - 1 and voice data associated with the relevant account 20 - 2 .
  • An existing technique such as voice print identification, for example, can be used as a technique for performing extraction of a speaker from voice data.
  • As the voice print identification, for example, there is a technique for identifying a speaker by generating sound spectrogram data representing a voice print from voice data and using the sound spectrogram data as identification information.
  • the determination unit 2020 generates, for each of the relevant account 20 - 1 and the relevant account 20 - 2 , a histogram of a frequency of appearance of a speaker extracted from associated voice data.
  • FIG. 8 is a diagram illustrating a histogram of a frequency of appearance of a speaker.
  • a histogram 60 of a frequency of appearance of a speaker is generated for voice data 36 associated with the relevant account 20 .
  • the voice data 36 associated with the relevant account 20 - 1 is expressed as voice data 36 - 1
  • the histogram generated from the voice data 36 - 1 is expressed as a histogram 60 - 1
  • the voice data 36 associated with the relevant account 20 - 2 is expressed as voice data 36 - 2
  • the histogram 60 generated from the voice data 36 - 2 is expressed as a histogram 60 - 2 .
  • the determination unit 2020 calculates a similarity degree between the histogram 60 - 1 and the histogram 60 - 2 , and, when the similarity degree is equal to or more than a predetermined value, the determination unit 2020 determines that a similarity degree between the content 30 - 1 and the content 30 - 2 is high. On the other hand, when the similarity degree between the histogram 60 - 1 and the histogram 60 - 2 is less than the predetermined value, the determination unit 2020 determines that the similarity degree between the content 30 - 1 and the content 30 - 2 is not high.
  • the predetermined value described above is previously stored in a storage apparatus that can be accessed from the determination unit 2020 .
  • a comparison between the histogram 60 - 1 and the histogram 60 - 2 may be performed by using only a part (for example, up to the top N place) of the histogram similarly to the comparison of the histogram 40 and the comparison of the histogram 50 .
  • a comparison based on a speaker extracted from the voice data 36 is not limited to a comparison between histograms.
  • the determination unit 2020 may use a comparison method similar to the method described in “Comparison Method 1 Related to Text Data”. In other words, when the number of speakers who appear commonly in the voice data 36 associated with the relevant account 20 - 1 and the voice data 36 associated with the relevant account 20 - 2 is equal to or more than a predetermined number, the determination unit 2020 determines that a similarity degree between the content 30 - 1 and the content 30 - 2 is high. On the other hand, when the number of speakers who appear commonly to both pieces of the voice data 36 is less than the predetermined number, the determination unit 2020 determines that the similarity degree between the content 30 - 1 and the content 30 - 2 is not high.
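  • The speaker-based comparison can be sketched as follows, assuming a voice print identification function (for example, one based on sound spectrogram matching) as a placeholder that maps each voice clip to a speaker identifier consistent across both relevant accounts.

```python
from collections import Counter

def speakers_similar(voice_clips_1, voice_clips_2, identify_speaker,
                     predetermined_number: int) -> bool:
    """Compare two relevant accounts by the speakers appearing in their voice data 36."""
    speakers_1 = Counter(identify_speaker(clip) for clip in voice_clips_1)
    speakers_2 = Counter(identify_speaker(clip) for clip in voice_clips_2)
    common_speakers = set(speakers_1) & set(speakers_2)
    # Analogous to the keyword case: similar when enough speakers appear commonly
    # in the voice data of both relevant accounts (the counts could likewise be
    # turned into histograms 60-1 and 60-2 and compared as histograms).
    return len(common_speakers) >= predetermined_number
```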
  • When it is determined that the content 30-1 and the content 30-2 are similar, the processing execution unit 2040 executes predetermined processing related to the target account 10-1 and the target account 10-2.
  • Hereinafter, variations of the processing executed by the processing execution unit 2040 are illustrated.
  • For example, when it is determined that the similarity degree between the content 30-1 and the content 30-2 is high, the processing execution unit 2040 outputs information representing that there is a high probability that the target account 10-1 and the target account 10-2 are owned by the same person.
  • By outputting this information, a user of the information processing apparatus 2000 who acquires the information can easily recognize a group of the target accounts 10 having a high probability of being owned by the same person.
  • the processing execution unit 2040 causes a display apparatus connected to the information processing apparatus 2000 to display a notification representing that there is a high probability that the target account 10 - 1 and the target account 10 - 2 are owned by the same person.
  • FIG. 9 is a diagram illustrating a notification displayed on the display apparatus.
  • the processing execution unit 2040 may transmit the notification described above to another computer communicably connected to the information processing apparatus 2000 , or store the notification described above in a storage apparatus communicably connected to the information processing apparatus 2000 .
  • In some cases, the information processing apparatus 2000 performs the determination by the determination unit 2020 on a plurality of combinations of the target account 10-1 and the target account 10-2.
  • In this case, a plurality of combinations of the target accounts 10 having a high probability of being owned by the same person may be found.
  • The processing execution unit 2040 may therefore generate a list indicating one or more combinations of the target accounts 10 having a high probability of being owned by the same person, and output the list by the various methods described above. By outputting such a list, a user of the information processing apparatus 2000 can easily recognize the plurality of groups of the target accounts 10 having a high probability of being owned by the same person.
  • the processing execution unit 2040 outputs information related to the content 30 - 1 and the content 30 - 2 .
  • the information is referred to as similar content information.
  • a user of the information processing apparatus 2000 can acquire, for the target account 10 - 1 and the target account 10 - 2 inferred to have a high probability of being owned by the same person, information as grounds for the inference.
  • a variation of the similar content information is illustrated.
  • the processing execution unit 2040 includes, in the similar content information, the histogram 40 (see FIG. 5 ) representing a frequency of appearance of an object being generated for the image data 32 .
  • an image of each object indicated in the histogram 40 may be included together with the histogram 40 in the similar content information.
  • the processing execution unit 2040 includes, in the similar content information, a combination of images of objects determined to be similar to each other among objects extracted from the image data 32 - 1 and objects extracted from the image data 32 - 2 . Note that, when an image of an object is included in the similar content information, the entire image data 32 in which the object is included may be included in the similar content information.
  • the processing execution unit 2040 may execute analysis processing on an image of an object to be included in the similar content information, and include a result of the analysis processing in the similar content information. For example, when there is an image of a person among object images to be included in the similar content information, the processing execution unit 2040 may infer an attribute (age, height, body shape, and gender) of the person of the image, and include a result of the inference in the similar content information, or may calculate a feature of an accessory object (such as glasses, clothing, and baggage) of the person of the image, and include information related to the feature in the similar content information.
  • the processing execution unit 2040 may extract an image of a part (such as a face, a mole, a tattoo, a nail, or a fingerprint) representing a feature of a person from the image of the person, and include the image of the part in the similar content information.
  • Further, for example, when there is an image of a vehicle among the object images to be included in the similar content information, the processing execution unit 2040 determines a maker of the vehicle, a type of the vehicle, a number on its number plate, and the like, and includes the determined information in the similar content information.
  • In addition, for example, when there is an image of a landmark (such as a building, a marking, a mountain, a river, or the sea) usable for identifying a capturing place (a place where the image data 32 was generated) among the object images to be included in the similar content information, the processing execution unit 2040 includes a name of the landmark in the similar content information. Further, the processing execution unit 2040 may identify a location of the landmark, and include information (an address or global positioning system (GPS) coordinates) representing the location in the similar content information. Note that a location of a landmark can be identified by using map information and the like, for example.
  • the processing execution unit 2040 includes, in the similar content information, the histogram (see FIG. 7 ) generated for a keyword. At this time, each keyword indicated in the histogram may be included in the similar content information. In addition, for example, the processing execution unit 2040 includes, in the similar content information, a keyword determined to coincide among keywords extracted from the content 30 - 1 and keywords extracted from the content 30 - 2 .
  • the processing execution unit 2040 may include, in the similar content information, not only a keyword determined to coincide, but also a sentence and the entire text data in which the keyword is included. Further, when a keyword is extracted from voice data, the processing execution unit 2040 may include, in the similar content information, not only a keyword determined to coincide, but also voice data of a statement in which the keyword is included and the entire voice data from which the keyword is extracted.
  • Suppose the determination unit 2020 performs extraction of a speaker from voice data.
  • In this case, the processing execution unit 2040 includes, in the similar content information, the histogram 60 (see FIG. 8) representing a frequency of appearance of a speaker.
  • At this time, sound spectrogram data of each speaker indicated in the histogram may be included in the similar content information.
  • In addition, for example, the processing execution unit 2040 includes, in the similar content information, sound spectrogram data of a speaker determined to coincide among speakers extracted from the voice data 36-1 and speakers extracted from the voice data 36-2.
  • the processing execution unit 2040 includes, in the similar content information, the histogram (see FIG. 6 ) representing a frequency of appearance of a topic extracted from the content 30 .
  • the processing execution unit 2040 includes, in the similar content information, information (such as a name of a topic) representing a topic determined to coincide among topics extracted from the content 30 - 1 and topics extracted from the content 30 - 2 .
  • Note that the information processing apparatus 2000 may infer that an "owner of the target account 10-1 and an owner of the target account 10-2 belong to the same group" instead of inferring that "the target account 10-1 and the target account 10-2 are owned by the same person".
  • In this case, the processing execution unit 2040 outputs "information representing that there is a high probability that the owner of the target account 10-1 and the owner of the target account 10-2 belong to the same group" instead of "information representing that there is a high probability that the target account 10-1 and the target account 10-2 are owned by the same person".

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

An information processing apparatus (2000) determines whether a content (30-1) of a relevant account (20-1) associated with a target account (10-1) and a content (30-2) of a relevant account (20-2) associated with a target account (10-2) are similar. When the content (30-1) and the content (30-2) are similar, the information processing apparatus (2000) executes predetermined processing related to the target account (10-1) and the target account (10-2).

Description

    CROSS REFERENCE TO RELATED APPLICATIONS
  • This application is a Continuation of U.S. application Ser. No. 17/043,291 filed on Sep. 29, 2020, which is a National Stage of International Application No. PCT/JP2018/013880 filed on Mar. 30, 2018.
  • TECHNICAL FIELD
  • The present invention relates to a user account.
  • BACKGROUND ART
  • Some services, such as a social networking service (SNS), provide an environment in which a user can take various types of actions by using a user account. For example, a picture, a moving image, or a text message can be uploaded in association with a user account.
  • Herein, the same person may own a plurality of accounts. With regard to this point, NPL 1 discloses a technique for determining whether a plurality of user accounts are owned by the same person, based on a similarity degree among user names of a plurality of the user accounts.
  • CITATION LIST Non Patent Literature
    • [NPL 1] Y. Li, Y. Peng, W. Ji, Z. Zhang, and Q. Xu, "User Identification Based on Display Names Across Online Social Networks," IEEE Access, vol. 5, pp. 17342-17353, Aug. 25, 2017.
    SUMMARY OF INVENTION Technical Problem
  • In general, a user name registered in a user account can be any name unrelated to a real name of the user. Thus, a person who creates a plurality of user accounts can set the user names registered in those user accounts so that they are not similar to each other. In that case, with the technique in NPL 1, it is difficult to determine that a plurality of user accounts in which such dissimilar user names are registered are owned by the same person.
  • The invention of the present application has been made in view of the above-described problem, and an object thereof is to provide a technique capable of accurately detecting whether user accounts being compared are owned by the same person even when user names of the user accounts are not similar to each other.
  • Solution to Problem
  • An information processing apparatus according to the present invention includes 1) a determination unit that determines, for a first relevant account associated with a first target account and a second relevant account associated with a second target account, whether first content data associated with the first relevant account and second content data associated with the second relevant account are similar, and 2) a processing execution unit that executes predetermined processing when it is determined that the first content data and the second content data are similar.
  • A control method according to the present invention is executed by a computer. The control method includes 1) a determination step of determining, for a first relevant account associated with a first target account and a second relevant account associated with a second target account, whether first content data associated with the first relevant account and second content data associated with the second relevant account are similar, and 2) a processing execution step of executing predetermined processing when it is determined that the first content data and the second content data are similar.
  • A program according to the present invention causes a computer to execute each step included in the control method according to the present invention.
  • Advantageous Effects of Invention
  • The present invention provides a technique capable of accurately detecting whether user accounts being compared are owned by the same person even when user names of the user accounts are not similar to each other.
  • BRIEF DESCRIPTION OF DRAWINGS
  • The above-described object, the other objects, features, and advantages will become more apparent from suitable example embodiments described below and the following accompanying drawings.
  • FIG. 1 is a diagram schematically illustrating processing executed by an information processing apparatus according to a present example embodiment.
  • FIG. 2 is a diagram illustrating a functional configuration of an information processing apparatus according to an example embodiment 1.
  • FIG. 3 is a diagram illustrating a computer for achieving the information processing apparatus.
  • FIG. 4 is a flowchart illustrating a flow of processing executed by the information processing apparatus according to the example embodiment 1.
  • FIG. 5 is a diagram illustrating a histogram generated for a relevant account.
  • FIG. 6 is a diagram illustrating a histogram of a topic.
  • FIG. 7 is a diagram illustrating a histogram of a frequency of appearance of a keyword.
  • FIG. 8 is a diagram illustrating a histogram of a frequency of appearance of a speaker.
  • FIG. 9 is a diagram illustrating a notification displayed on a display apparatus.
  • EXAMPLE EMBODIMENT
  • Hereinafter, example embodiments of the present invention will be described with reference to the drawings. Note that, in all of the drawings, the same components have the same reference signs, and description thereof will not be repeated as appropriate. Further, in each block diagram, each block represents a configuration of a functional unit instead of a configuration of a hardware unit unless otherwise described.
  • Example Embodiment 1 <Outline>
  • FIG. 1 is a diagram schematically illustrating processing executed by an information processing apparatus 2000 according to the present example embodiment. The information processing apparatus 2000 infers whether the owners of user accounts different from each other are the same person. For example, user information, which is information related to the user himself/herself, and information registered in association with the user account, such as image data and text data (hereinafter, a content), are associated with the account. The user information is, for example, a name, an address, a phone number, an e-mail address, or the like.
  • In general, when a user account is created in a social networking service (SNS) or the like, a user inputs various types of user information described above. At this time, in many cases the authenticity of the input user information is not required to be proven. In such a case, the content of the user information can even be falsified. Thus, the same person can create a plurality of accounts having contents of user information different from each other. In other words, the same person can own a plurality of accounts. For a plurality of user accounts having the characteristic that "pieces of user information different from each other are registered even though the actual owner is the same person", it is difficult to recognize, from only the user information and the contents of the user accounts, the fact that the user accounts are owned by the same person.
  • Further, since a plurality of services such as the SNS are present, there is also a case where the same person creates user accounts with different account names in the plurality of services. In this case, even when a user registers truthful user information, if the user information is kept private, it is difficult to recognize the fact that the plurality of user accounts are owned by the same person.
  • Thus, the information processing apparatus 2000 according to the present example embodiment infers whether user accounts different from each other are owned by the same person by using contents associated with other user accounts that are associated with those user accounts. Hereinafter, an account for which it is determined whether it is owned by the same person is expressed as a target account, and another account associated with the target account is referred to as a relevant account. For example, in the SNS, a function of associating user accounts with each other as friends is often provided. Thus, for example, an account associated as a friend of a target account is used as a relevant account. Note that which account is handled as a target account will be described below.
  • In the example in FIG. 1 , the information processing apparatus 2000 determines, for two target accounts that are a target account 10-1 and a target account 10-2, whether the target accounts are accounts owned by the same person. For the target account 10-1, a plurality of relevant accounts 20 are present. Herein, the relevant account 20 associated with the target account 10-1 is expressed as a relevant account 20-1. In FIG. 1 , only one of the plurality of relevant accounts 20-1 is provided with a reference sign for simplifying the diagram. A content associated with the relevant account 20-1 is expressed as a content 30-1. For example, the content 30-1 is image data uploaded in association with the relevant account 20-1, and the like. Similarly, a relevant account of the target account 10-2 is expressed as a relevant account 20-2, and a content associated with the relevant account 20-2 is expressed as a content 30-2. Hereinafter, the “content 30 associated with the relevant account 20” is also simply expressed as the “content 30 of the relevant account 20”.
  • The information processing apparatus 2000 determines whether the content 30-1 of the relevant account 20-1 and the content 30-2 of the relevant account 20-2 are similar. When the content 30-1 and the content 30-2 are similar, the target account 10-1 and the target account 10-2 can be inferred to belong to the same person. Thus, when the content 30-1 and the content 30-2 are similar, the information processing apparatus 2000 executes predetermined processing related to the target account 10-1 and the target account 10-2. For example, the information processing apparatus 2000 outputs, as the predetermined processing, a notification indicating that the target account 10-1 and the target account 10-2 belong to the same person.
  • Advantageous Effect
  • The information processing apparatus 2000 according to the present example embodiment determines a similarity degree between the content 30-1 of the relevant account 20-1 associated with the target account 10-1 and the content 30-2 of the relevant account 20-2 associated with the target account 10-2. Herein, when the similarity degree is high, the target account 10-1 and the target account 10-2 can be inferred to be owned by the same person. The reason will be described below.
  • The relevant account 20-1 associated with the target account 10-1 conceivably belongs to a person who has some sort of connection with the owner of the target account 10-1, such as a friend of that owner. Thus, there is a high probability that a content including some sort of information related to the target account 10-1 is present among the contents 30-1 uploaded in association with the relevant accounts 20-1 by the owners of the relevant accounts 20-1. In other words, there is a high probability that some sort of information related to the target account 10-1 is revealed in the information made public by the relevant account 20-1. For example, there is a high probability that a picture or a moving image uploaded by the relevant account 20-1 includes the owner of the target account 10-1, property (such as a vehicle) of that owner, a landmark representing a place that the owner of the target account 10-1 has visited, and the like. Further, there is a high probability that text data and voice data uploaded by the relevant account 20-1 also include some sort of information related to the target account 10-1.
  • Similarly, there is a high probability that a content including some sort of information related to the target account 10-2 is present among the contents 30-2 uploaded in association with the relevant accounts 20-2 by the owners of the relevant accounts 20-2. For this reason, when the content 30-1 of the relevant account 20-1 and the content 30-2 of the relevant account 20-2 are similar, there is a high probability that the information related to the target account 10-1 included in the content 30-1 and the information related to the target account 10-2 included in the content 30-2 are similar.
  • Thus, when the content 30-1 and the content 30-2 are similar, the information processing apparatus 2000 infers that there is a high probability that the owner of the target account 10-1 and an owner of the target account 10-2 are the same person. In this way, even when it is not clear whether the target account 10-1 and the target account 10-2 are owned by the same person just by comparing the user information of the target account 10-1 with the user information of the target account 10-2, whether the target account 10-1 and the target account 10-2 are accounts owned by the same person can be inferred.
  • Note that the above-described description with reference to FIG. 1 is exemplification for facilitating understanding of the information processing apparatus 2000, and does not limit the function of the information processing apparatus 2000. Hereinafter, the information processing apparatus 2000 according to the present example embodiment will be described in more detail.
  • <Example of Functional Configuration of Information Processing Apparatus 2000>
  • FIG. 2 is a diagram illustrating a functional configuration of the information processing apparatus 2000 according to the example embodiment 1. The information processing apparatus 2000 includes a determination unit 2020 and a processing execution unit 2040. The determination unit 2020 determines whether the content 30-1 of the relevant account 20-1 associated with the target account 10-1 and the content 30-2 of the relevant account 20-2 associated with the target account 10-2 are similar. When the content 30-1 and the content 30-2 are similar, the processing execution unit 2040 executes predetermined processing related to the target account 10-1 and the target account 10-2.
  • <Hardware Configuration of Information Processing Apparatus 2000>
  • Each functional component unit of the information processing apparatus 2000 may be achieved by hardware (for example, a hard-wired electronic circuit and the like) that achieves the functional component unit, or may be achieved by a combination of hardware and software (for example, a combination of an electronic circuit and a program that controls the electronic circuit, and the like). Hereinafter, a case where each functional component unit of the information processing apparatus 2000 is achieved by the combination of hardware and software will be further described.
  • FIG. 3 is a diagram illustrating a computer 1000 for achieving the information processing apparatus 2000. The computer 1000 is any computer. For example, the computer 1000 is a personal computer (PC), a server machine, or the like. The computer 1000 may be a dedicated computer designed for achieving the information processing apparatus 2000, or may be a general-purpose computer.
  • The computer 1000 includes a bus 1020, a processor 1040, a memory 1060, a storage device 1080, an input/output interface 1100, and a network interface 1120. The bus 1020 is a data transmission path for allowing the processor 1040, the memory 1060, the storage device 1080, the input/output interface 1100, and the network interface 1120 to transmit and receive data with one another. However, a method of connecting the processor 1040 and the like to each other is not limited to a bus connection.
  • The processor 1040 is any of various types of processors such as a central processing unit (CPU), a graphics processing unit (GPU), and a field-programmable gate array (FPGA). The memory 1060 is a main storage achieved by using a random access memory (RAM) and the like. The storage device 1080 is an auxiliary storage achieved by using a hard disk, a solid state drive (SSD), a memory card, a read only memory (ROM), or the like.
  • The input/output interface 1100 is an interface for connecting the computer 1000 and an input/output device. For example, an input apparatus such as a keyboard and an output apparatus such as a display apparatus are connected to the input/output interface 1100. The network interface 1120 is an interface for connecting the computer 1000 to a communication network. The communication network is, for example, a local area network (LAN) or a wide area network (WAN). A method of connection to the communication network by the network interface 1120 may be a wireless connection or a wired connection.
  • The storage device 1080 stores a program module that achieves each functional component unit of the information processing apparatus 2000. The processor 1040 achieves a function associated with each program module by reading each of the program modules to the memory 1060 and executing the read program module.
  • <Flow of Processing>
  • FIG. 4 is a flowchart illustrating a flow of processing executed by the information processing apparatus 2000 according to the example embodiment 1. The determination unit 2020 acquires the content 30-1 of each of the relevant accounts 20-1 associated with the target account 10-1 (S102). The determination unit 2020 acquires the content 30-2 of each of the relevant accounts 20-2 associated with the target account 10-2 (S104). The determination unit 2020 determines whether the content 30-1 and the content 30-2 are similar (S106). When the content 30-1 and the content 30-2 are similar (S106: YES), the processing execution unit 2040 executes predetermined processing (S108). On the other hand, when the content 30-1 and the content 30-2 are not similar (S106: NO), the processing in FIG. 4 ends.
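  • As a supplementary illustration (not part of the disclosure), the flow of FIG. 4 can be sketched in Python as follows. Every function name below is a placeholder introduced only for this sketch; the stubs stand in for the acquisition, comparison, and output steps described in the remainder of this example embodiment.

```python
# Skeletal sketch of the flow in FIG. 4. All helpers are illustrative stubs.

def relevant_accounts_of(target_account):
    # Stub: would look up the accounts associated with the target account
    # (for example, its friends in the SNS).
    return []

def contents_of(relevant_accounts):
    # Stub: would collect the content 30 of each relevant account (S102, S104).
    return []

def contents_are_similar(contents_1, contents_2):
    # Stub: one of the comparison methods described below would go here (S106).
    return False

def predetermined_processing(target_1, target_2):
    # Stub: for example, output a notification that the two target accounts
    # are likely owned by the same person (S108).
    print(f"{target_1} and {target_2} are likely owned by the same person")

def process(target_1, target_2):
    contents_1 = contents_of(relevant_accounts_of(target_1))  # S102
    contents_2 = contents_of(relevant_accounts_of(target_2))  # S104
    if contents_are_similar(contents_1, contents_2):          # S106
        predetermined_processing(target_1, target_2)          # S108
```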
  • <With Regard to User Account>
  • As described above, the target account 10 and the relevant account 20 are user accounts created by a user in a service such as the SNS, for example. In general, such a user account is created by registering user information such as a name, and is continuously used.
  • However, a user account handled by the information processing apparatus 2000 is not limited to a user account created by registering user information in such a manner. For example, on a bulletin board or the like on a Web page, when a user posts a content (uploads text data or the like), an identifier is assigned to the post. The information processing apparatus 2000 may handle the identifier as a user account. In this case, for example, when a certain user posts a content on a bulletin board site and another user comments on the post, one of the former and the latter can be handled as the target account 10 and the other can be handled as the relevant account 20.
  • <With Regard to Target Account 10>
  • The information processing apparatus 2000 infers, for two accounts of the target account 10-1 and the target account 10-2, whether the accounts belong to the same person. Herein, the target account 10-1 and the target account 10-2 may be user accounts for using the same service (for example, the SNS), or may be user accounts for using services different from each other.
  • Herein, there are various methods as a method of determining which user account among a plurality of user accounts is handled as the target account 10. Hereinafter, a variation of the methods is illustrated.
  • <<Method 1 of Determining Target Account 10>>
  • For example, the information processing apparatus 2000 receives a specification of the user accounts to be handled as the target accounts 10 from a user of the information processing apparatus 2000. The number of user accounts specified by the user may be two, or may be three or more. When three or more user accounts are specified, for example, the information processing apparatus 2000 executes, for each combination of any two user accounts that can be created from the specified user accounts (nC2 combinations for n specified accounts), processing that handles the two user accounts included in the combination as the target accounts 10. In other words, when user accounts A, B, and C are specified, processing handling A and B as the target accounts 10, processing handling A and C as the target accounts 10, and processing handling B and C as the target accounts 10 are each executed.
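  • As a supplementary sketch, the pairwise handling of three or more specified accounts might look like the following; the helper process_pair is a hypothetical stand-in for the determination and the predetermined processing applied to one pair of target accounts, and is not part of the disclosure.

```python
from itertools import combinations

def process_pair(target_account_1, target_account_2):
    # Hypothetical placeholder for handling one pair of target accounts 10
    # (the determination of S106 and the predetermined processing of S108).
    print(f"comparing {target_account_1} and {target_account_2}")

specified_accounts = ["A", "B", "C"]

# Every pair that can be created from the specified accounts is handled as a
# pair of target accounts: (A, B), (A, C), and (B, C) for three accounts.
for account_1, account_2 in combinations(specified_accounts, 2):
    process_pair(account_1, account_2)
```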
  • <<Method 2 of Determining Target Account 10>>
  • For example, the information processing apparatus 2000 receives, from a user, an input that specifies one user account handled as the target account 10. The information processing apparatus 2000 handles the user account specified by a user as the target account 10-1. Furthermore, the information processing apparatus 2000 handles, as the target account 10-2, another user account having user information similar to user information of the target account 10-1. The similarity between pieces of user information herein refers to, for example, a part of various pieces of information (a part of a user ID, a part of a name, a part of a birth date, a part of an e-mail address, or the like) being common. When a plurality of other user accounts having user information similar to the user information of the target account 10-1 are present, the information processing apparatus 2000 handles each of the plurality of user accounts as the target account 10-2.
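  • One possible reading of this partial match is sketched below; the field names and the minimum shared length are assumptions made only for illustration and are not specified in the disclosure.

```python
def share_common_part(a: str, b: str, min_len: int = 4) -> bool:
    """True when the two strings share a substring of at least min_len characters."""
    if not a or not b:
        return False
    return any(a[i:i + min_len] in b for i in range(len(a) - min_len + 1))

def user_info_similar(info_1: dict, info_2: dict) -> bool:
    # Two accounts are treated as target-account candidates when any
    # corresponding field (user ID, name, birth date, e-mail address)
    # shares a common part.
    fields = ("user_id", "name", "birth_date", "email")
    return any(share_common_part(info_1.get(f, ""), info_2.get(f, "")) for f in fields)

print(user_info_similar(
    {"user_id": "taro1985", "email": "taro@example.com"},
    {"user_id": "taro1985_photo", "email": "t.yamada@example.org"},
))  # True: the user IDs share the part "taro1985"
```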
  • <<Method 3 of Determining Target Account 10>>
  • The information processing apparatus 2000 may operate in cooperation with a monitoring system for monitoring user accounts, and receive a specification of user accounts from the monitoring system. For example, the monitoring system monitors a usage aspect (such as the contents uploaded and the frequency of uploading) of a user account, and determines a user account whose usage aspect violates common sense, a usage policy of a service, law, or the like (that is, determines a user account to be wary of). The monitoring system notifies the information processing apparatus 2000 of the determined user account. The information processing apparatus 2000 executes, for each combination of any two user accounts that can be created from the plurality of user accounts notified by the monitoring system, processing that handles the two user accounts included in the combination as the target accounts 10. Note that, when the monitoring system notifies user accounts one by one, the information processing apparatus 2000 executes the above-described processing on a plurality of user accounts indicated by a plurality of notifications received during a predetermined period of time, for example.
  • <With Regard to Relevant Account 20>
  • As described above, the relevant account 20 is another account associated with the target account 10, and is, for example, an account in a friendship with the target account 10 in the SNS. When a plurality of relevant accounts 20 are associated with the target account 10, the determination unit 2020 may acquire the content 30 for all of the relevant accounts 20, or may acquire the content 30 for only some of the relevant accounts 20. When the content 30 is acquired for some of the relevant accounts 20, the determination unit 2020 arbitrarily (for example, randomly) selects a predetermined number of the relevant accounts 20 from the plurality of relevant accounts 20, for example.
  • <Acquisition of Content 30: S102 and S104>
  • The determination unit 2020 acquires the content 30-1 associated with the relevant account 20-1 and the content 30-2 associated with the relevant account 20-2 (S102 and S104). For example, the determination unit 2020 automatically collects, for each of the relevant accounts 20, each of the contents 30 from the Web pages on which the contents 30 of the relevant accounts 20 are made public, by successively accessing those Web pages.
  • Further, an application programming interface (API) for acquiring a content associated with a user account may be provided in a service such as the SNS. Thus, the determination unit 2020 may acquire the content 30 of the relevant account 20 by using the API provided in a service used by the relevant account 20.
  • Note that the determination unit 2020 may acquire all of the contents 30 associated with the relevant account 20, or may acquire only the contents 30 of a predetermined type. For example, when the target of the similarity determination is only image data, the determination unit 2020 acquires image data associated with the relevant account 20 as the content 30.
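  • A minimal sketch of acquiring contents through a service API is shown below. The endpoint URL, parameters, and response format are hypothetical (a real service would define its own API), and the sketch assumes the third-party requests library is available.

```python
import requests  # third-party HTTP library, assumed to be installed

API_BASE = "https://sns.example.com/api"  # hypothetical endpoint, not a real service

def fetch_contents(relevant_account_id: str, content_type: str = "image"):
    # Fetch the contents 30 associated with one relevant account 20,
    # optionally restricted to a predetermined type (e.g., image data).
    response = requests.get(
        f"{API_BASE}/accounts/{relevant_account_id}/contents",
        params={"type": content_type},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["contents"]
```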
  • <Comparison between Pieces of Content Data: S106>
  • The determination unit 2020 compares the content data of the relevant account 20-1 with the content data of the relevant account 20-2, and infers that, when the similarity degree between the pieces of content data is high, the target account 10-1 and the target account 10-2 are owned by the same person. The processing may have various variations with respect to 1) what kind of content data is to be compared and 2) what kind of comparison is performed. Hereinafter, the comparison between pieces of content data will be described while focusing on these two points.
  • <<Comparison between Pieces of Image Data>>
  • Image data are conceivable as a type of the content data to be compared. For example, in the SNS, image data of a picture of a person, a building, scenery, or the like are uploaded by using a user account. The determination unit 2020 handles image data uploaded by using a user account in such a manner as a content associated with the user account. Further, a user may make a post that refers to (links to) a Web page including image data, or may make a post that refers to image data uploaded by another user. The determination unit 2020 may also handle image data referred to by a user in such a manner as content data associated with the account of the user. Note that a moving image frame constituting moving image data is also included in image data. Using image data has the advantage that the similarity between the content 30-1 and the content 30-2 is easily determined even when the language used in the relevant account 20-1 is different from the language used in the relevant account 20-2. Hereinafter, a few specific comparison methods related to image data are illustrated.
  • <<<Comparison Method 1 Related to Image Data>>>
  • The determination unit 2020 focuses on a similarity degree between an object detected from image data associated with the relevant account 20-1 and an object detected from image data associated with the relevant account 20-2. For example, the determination unit 2020 calculates the similarity degree between the object detected from the image data associated with the relevant account 20-1 and the object detected from the image data associated with the relevant account 20-2. Then, when the number of groups (namely, groups of objects inferred to be the same) of objects having a similarity degree equal to or more than a predetermined value is equal to or more than a predetermined number, the determination unit 2020 determines that the similarity degree between the content data of the relevant account 20-1 and the content data of the relevant account 20-2 is high. On the other hand, when the number of groups of objects having a similarity degree equal to or more than the predetermined value is less than the predetermined number, the determination unit 2020 determines that the similarity degree between the content data of the relevant account 20-1 and the content data of the relevant account 20-2 is not high. The predetermined number described above is previously stored in a storage apparatus that can be accessed from the determination unit 2020.
  • Herein, an object detected from image data 32 may be an object of any kind, or may be an object of a specific kind. In the latter case, for example, only persons among the objects included in the image data 32 are detected.
  • Note that an existing technique can be used as a technique for detecting an object from image data and a technique for determining a similarity degree between detected objects.
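  • A minimal sketch of this comparison, assuming that object detection and feature extraction have already been performed by an existing technique, is shown below. The use of cosine similarity and the particular threshold values are illustrative choices rather than part of the disclosure.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def contents_similar_by_objects(objects_1, objects_2,
                                similarity_threshold=0.8,
                                required_matches=3):
    # objects_1 / objects_2: feature vectors of the objects detected from the
    # image data 32-1 / 32-2. The contents are judged similar when the number
    # of object pairs inferred to be the same reaches the predetermined number.
    matches = 0
    for obj_1 in objects_1:
        if any(cosine_similarity(obj_1, obj_2) >= similarity_threshold
               for obj_2 in objects_2):
            matches += 1
    return matches >= required_matches

features_1 = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
features_2 = [np.array([0.9, 0.1]), np.array([1.0, 1.0])]
print(contents_similar_by_objects(features_1, features_2, required_matches=1))  # True
```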
  • <<<Comparison Method 2 Related to Image Data>>>
  • The determination unit 2020 generates, for each of the relevant account 20-1 and the relevant account 20-2, a histogram representing a distribution of a frequency of appearance of an object in image data associated thereto, and determines a similarity degree between the histograms. FIG. 5 is a diagram illustrating a histogram generated for the relevant account 20. In FIG. 5 , a plurality of pieces of image data 32 are associated with the relevant account 20. A histogram 40 is a distribution of a frequency of appearance of an object detected from the image data 32. Hereinafter, the image data 32 associated with the relevant account 20-1 are expressed as image data 32-1, and the histogram 40 generated for the image data 32-1 is expressed as a histogram 40-1. Similarly, the image data 32 associated with the relevant account 20-2 are expressed as image data 32-2, and the histogram 40 generated for the image data 32-2 is expressed as a histogram 40-2.
  • The determination unit 2020 determines a similarity degree between the histogram 40-1 and the histogram 40-2. For example, the determination unit 2020 calculates the similarity degree between the histogram 40-1 and the histogram 40-2, and, when the calculated similarity degree is equal to or more than a predetermined value, the determination unit 2020 determines that a similarity degree between the content 30-1 and the content 30-2 is high. On the other hand, when the similarity degree between the histogram 40-1 and the histogram 40-2 is less than the predetermined value, the determination unit 2020 determines that the similarity degree between the content 30-1 and the content 30-2 is not high. Herein, an existing technique can be used as a technique for calculating a similarity degree between two histograms. Further, the predetermined value described above is stored in a storage apparatus that can be accessed from the determination unit 2020.
  • The histogram 40-1 and the histogram 40-2 are generated as follows, for example. First, the determination unit 2020 recognizes an object included in each piece of the image data 32-1 by performing object recognition processing on each piece of the image data 32-1 as a target. Furthermore, the determination unit 2020 generates the histogram 40-1 representing a distribution of a frequency of appearance of an object by counting the number of appearances of each object.
  • Herein, the determination unit 2020 assigns an identifier to each object detected from the image data 32-1. At this time, for example, the determination unit 2020 makes each object identifiable by assigning the same identifier to the same object, and can thus count the number of appearances of the object. In order to achieve this, a determination (identification of an object) of whether each object detected from the image data 32 is the same is needed. In other words, when the determination unit 2020 assigns an identifier to an object detected from the image data 32, and the object is the same as another object being already detected, the determination unit 2020 assigns the same identifier as an identifier assigned to the object being already detected. On the other hand, when the object is different from any objects being already detected, the determination unit 2020 assigns a new identifier that is not assigned to any object.
  • The determination unit 2020 generates the histogram 40-2 by also performing similar processing on the image data 32-2. At this time, for an object detected from the image data 32-2, not only identification with an object detected from the other piece of image data 32-2 but also identification with an object detected from the image data 32-1 are performed. In other words, when the same object as an object detected from the image data 32-2 is already detected from the image data 32-1, the determination unit 2020 also assigns, to the object detected from the image data 32-2, an identifier assigned to the object being already detected. Various types of existing techniques can be used for identification of an object.
  • Herein, a comparison between the histogram 40-1 and the histogram 40-2 may be performed by using only a part of the histogram 40-1 and a part of the histogram 40-2. For example, the determination unit 2020 calculates a similarity degree between the histogram 40-1 and the histogram 40-2 by comparing a frequency of appearance of objects in top N places (N is a natural number of two or more) in the histogram 40-1 with a frequency of appearance of objects in top N places in the histogram 40-2.
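  • The histogram comparison can be sketched as follows, assuming that object identifiers have already been assigned consistently across the image data 32-1 and the image data 32-2. Cosine similarity is used here as one of several existing measures for comparing two histograms, and the top-N restriction mentioned above is optional.

```python
from collections import Counter
import numpy as np

def histogram_similarity(object_ids_1, object_ids_2, top_n=None):
    # object_ids_1 / object_ids_2: identifiers of the objects detected across
    # the image data of each relevant account 20.
    hist_1, hist_2 = Counter(object_ids_1), Counter(object_ids_2)
    if top_n is not None:
        keep = {k for k, _ in hist_1.most_common(top_n)}
        keep |= {k for k, _ in hist_2.most_common(top_n)}
        hist_1 = Counter({k: v for k, v in hist_1.items() if k in keep})
        hist_2 = Counter({k: v for k, v in hist_2.items() if k in keep})
    bins = sorted(set(hist_1) | set(hist_2))
    v1 = np.array([hist_1[b] for b in bins], dtype=float)
    v2 = np.array([hist_2[b] for b in bins], dtype=float)
    if not np.any(v1) or not np.any(v2):
        return 0.0
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

ids_1 = ["person_A", "person_A", "car_X", "person_B"]
ids_2 = ["person_A", "car_X", "car_X", "person_C"]
# Judged similar when the value is equal to or more than a predetermined value.
print(histogram_similarity(ids_1, ids_2, top_n=10))  # roughly 0.67
```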
  • <<<Comparison Method 3 Related to Image Data>>>
  • A comparison related to image data may be achieved by a comparison between topics of the image data instead of a comparison between objects detected from the image data. Herein, a topic in a certain piece of data refers to a main matter or event expressed by the data. For example, a topic such as work, food, sports, traveling, games, or politics is conceivable. The determination unit 2020 classifies each piece of the image data 32 associated with the relevant account 20 by topic. Herein, an existing technique can be used as a technique for classifying image data by topic.
  • For example, the determination unit 2020 generates a histogram of a frequency of appearance of a topic for each of the image data 32-1 and the image data 32-2. FIG. 6 is a diagram illustrating a histogram of a topic. When a similarity degree between a histogram of a topic generated from the image data 32-1 and a histogram of a topic generated from the image data 32-2 is equal to or more than a predetermined value, the determination unit 2020 determines that a similarity degree between the content 30-1 and the content 30-2 is high. On the other hand, when the similarity degree between the histogram of the topic generated from the image data 32-1 and the histogram of the topic generated from the image data 32-2 is less than the predetermined value, the determination unit 2020 determines that the similarity degree between the content 30-1 and the content 30-2 is not high.
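  • A small sketch of comparing topic histograms is shown below. It assumes that a topic label has already been assigned to each piece of image data by an existing classifier, and uses a normalized histogram intersection as one possible similarity measure; neither choice is specified in the disclosure.

```python
from collections import Counter

def topic_histogram_intersection(topics_1, topics_2):
    # topics_1 / topics_2: topic labels assigned to each piece of image data
    # 32-1 / 32-2. Returns a value between 0.0 and 1.0; 1.0 means the two
    # normalized topic histograms coincide.
    h1, h2 = Counter(topics_1), Counter(topics_2)
    n1, n2 = sum(h1.values()), sum(h2.values())
    if n1 == 0 or n2 == 0:
        return 0.0
    return sum(min(h1[t] / n1, h2[t] / n2) for t in set(h1) | set(h2))

print(topic_histogram_intersection(
    ["food", "food", "traveling", "sports"],
    ["food", "traveling", "traveling", "games"],
))  # 0.5
```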
  • <<Comparison Related to Text Data>>
  • The determination unit 2020 may perform a comparison similar to the above-described comparison related to the image data 32 on text data associated with the relevant account 20. For example, in the SNS, text data representing information such as a thought of a user and a recent state of a user are uploaded in association with a user account. The determination unit 2020 handles, for example, text data uploaded by a user in such a manner as the content 30.
  • In addition, for example, a user may also make a post that refers to a Web page, a post that refers to text data uploaded by another user, a post of a comment on a content of another user, and the like. The determination unit 2020 may also handle, as content data associated with the account of the user, the text data included in the Web page referred to by the user in such a manner, the text data uploaded by the other user, and the text data representing the comment on the content of the other user. Hereinafter, a few specific comparison methods related to text data are illustrated.
  • <<<Comparison Method 1 Related to Text Data>>>
  • For example, the determination unit 2020 performs extraction of a keyword from text data associated with the relevant account 20-1 and text data associated with the relevant account 20-2. For example, when the number of keywords that appear commonly to both pieces of the text data is equal to or more than a predetermined number, the determination unit 2020 determines that a similarity degree between the content 30-1 and the content 30-2 is high. On the other hand, when the number of keywords that appear commonly to both pieces of the text data is less than the predetermined number, the determination unit 2020 determines that the similarity degree between the content 30-1 and the content 30-2 is not high.
  • Herein, a keyword extracted from text data may be any word, or may be a specific word. In the latter case, for example, a list of words to be adopted as keywords is prepared in advance, and only a word included in the list is extracted as a keyword. Note that an existing technique can be used as a technique for extracting a keyword from text data.
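  • A minimal sketch of the latter (keyword-list) variant is shown below. The keyword list, the tokenization, and the required number of common keywords are all illustrative assumptions.

```python
import re

# Illustrative list of words adopted as keywords.
KEYWORD_LIST = {"festival", "marathon", "shibuya", "ramen"}

def extract_keywords(text: str) -> set:
    # Keep only the words that appear in the prepared keyword list.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w in KEYWORD_LIST}

def contents_similar_by_keywords(text_1: str, text_2: str, required_common: int = 2) -> bool:
    common = extract_keywords(text_1) & extract_keywords(text_2)
    return len(common) >= required_common

print(contents_similar_by_keywords(
    "Ran the city marathon and had ramen in Shibuya afterwards.",
    "Shibuya ramen after the marathon was great!",
))  # True: "marathon", "ramen", and "shibuya" appear in both
```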
  • <<<Comparison Method 2 Related to Text Data>>>
  • For example, the determination unit 2020 may perform, on a keyword extracted from text data associated with the relevant account 20, a comparison similar to the comparison related to a histogram of a frequency of appearance of an object detected from image data associated with the relevant account 20. Specifically, the determination unit 2020 generates, for each of the relevant account 20-1 and the relevant account 20-2, a histogram representing a distribution of a frequency of appearance of a keyword in associated text data, and determines a similarity degree between the histograms.
  • FIG. 7 is a diagram illustrating a histogram of a frequency of appearance of a keyword. In FIG. 7 , a histogram 50 is generated for text data 34 associated with the relevant account 20. Hereinafter, the text data 34 associated with the relevant account 20-1 is expressed as text data 34-1, and the histogram 50 generated from the text data 34-1 is expressed as a histogram 50-1. Similarly, the text data 34 associated with the relevant account 20-2 is expressed as text data 34-2, and the histogram 50 generated from the text data 34-2 is expressed as a histogram 50-2.
  • For example, the determination unit 2020 calculates a similarity degree between the histogram 50-1 and the histogram 50-2, and, when the similarity degree is equal to or more than a predetermined value, the determination unit 2020 determines that a similarity degree between the content 30-1 and the content 30-2 is high. On the other hand, when the similarity degree between the histogram 50-1 and the histogram 50-2 is less than the predetermined value, the determination unit 2020 determines that the similarity degree between the content 30-1 and the content 30-2 is not high. The predetermined value described above is previously stored in a storage apparatus that can be accessed from the determination unit 2020.
  • Herein, a comparison between the histogram 50-1 and the histogram 50-2 may be performed by using only a part (for example, up to the top N place) of the histogram similarly to the comparison between the histogram 40-1 and the histogram 40-2.
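  • The partial (top-N) comparison can be sketched as follows; comparing the top-N keyword sets with the Jaccard coefficient is one illustrative way of using only a part of the two histograms, and is not the only measure possible.

```python
from collections import Counter

def top_n_keyword_overlap(keywords_1, keywords_2, n=10):
    # keywords_1 / keywords_2: keywords extracted from the text data 34-1 / 34-2.
    # Only the N most frequent keywords of each histogram are compared.
    top_1 = {w for w, _ in Counter(keywords_1).most_common(n)}
    top_2 = {w for w, _ in Counter(keywords_2).most_common(n)}
    if not top_1 or not top_2:
        return 0.0
    return len(top_1 & top_2) / len(top_1 | top_2)

print(top_n_keyword_overlap(
    ["ramen", "ramen", "marathon", "camera"],
    ["ramen", "marathon", "marathon", "travel"],
    n=3,
))  # 0.5
```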
  • <<<Comparison Method 3 Related to Text Data>>>
  • The determination unit 2020 may determine a similarity degree between the content 30-1 and the content 30-2 by a comparison between frequencies of appearance of a topic extracted from the pieces of the text data 34. A method of comparing frequencies of appearance of a topic extracted from the pieces of the text data 34 is similar to the above-described comparison between frequencies of appearance of a topic extracted from pieces of image data. Note that an existing technique can be used as a technique for extracting a topic from text data.
  • <<Comparison Related to Voice Data>>
  • The determination unit 2020 may handle voice data associated with the relevant account 20 as the content 30. The voice data herein include not only data consisting of voice alone, but also the voice included in moving image data. Hereinafter, comparison methods related to voice data are illustrated.
  • <<<Comparison Method 1 Related to Voice Data>>>
  • The determination unit 2020 extracts a keyword from each piece of voice data associated with the relevant account 20-1 and voice data associated with the relevant account 20-2. Then, the determination unit 2020 determines a similarity degree between the content 30-1 and the content 30-2 by handling the keywords extracted from the pieces of the voice data similarly to the keywords extracted from the pieces of the text data described above. In other words, the determination unit 2020 determines a similarity degree between the content 30-1 and the content 30-2 by comparing the numbers of common keywords and histograms representing a frequency of appearance of a keyword.
  • <<<Comparison Method 2 Related to Voice Data>>>
  • The determination unit 2020 determines a similarity degree between the content 30-1 and the content 30-2 by comparing a frequency of appearance of a topic extracted from voice data associated with the relevant account 20-1 and a frequency of appearance of a topic extracted from voice data associated with the relevant account 20-2. A method of comparing frequencies of appearance of a topic is similar to the above-described comparison between frequencies of appearance of a topic extracted from image data. Note that an existing technique can be used as a technique for extracting a topic from voice data.
  • <<<Comparison Method 3 Related to Voice Data>>>
  • The determination unit 2020 performs extraction of a speaker from each piece of voice data associated with the relevant account 20-1 and voice data associated with the relevant account 20-2. An existing technique such as voice print identification, for example, can be used as a technique for performing extraction of a speaker from voice data. For example, there is a technique for identifying a speaker by generating sound spectrogram data representing a voice print from voice data, and using the sound spectrogram data as identification information.
  • For example, the determination unit 2020 generates, for each of the relevant account 20-1 and the relevant account 20-2, a histogram of a frequency of appearance of a speaker extracted from associated voice data. FIG. 8 is a diagram illustrating a histogram of a frequency of appearance of a speaker. In FIG. 8 , a histogram 60 of a frequency of appearance of a speaker is generated for voice data 36 associated with the relevant account 20. Hereinafter, the voice data 36 associated with the relevant account 20-1 is expressed as voice data 36-1, and the histogram generated from the voice data 36-1 is expressed as a histogram 60-1. Similarly, the voice data 36 associated with the relevant account 20-2 is expressed as voice data 36-2, and the histogram 60 generated from the voice data 36-2 is expressed as a histogram 60-2.
  • For example, the determination unit 2020 calculates a similarity degree between the histogram 60-1 and the histogram 60-2, and, when the similarity degree is equal to or more than a predetermined value, the determination unit 2020 determines that a similarity degree between the content 30-1 and the content 30-2 is high. On the other hand, when the similarity degree between the histogram 60-1 and the histogram 60-2 is less than the predetermined value, the determination unit 2020 determines that the similarity degree between the content 30-1 and the content 30-2 is not high. The predetermined value described above is previously stored in a storage apparatus that can be accessed from the determination unit 2020.
  • Herein, a comparison between the histogram 60-1 and the histogram 60-2 may be performed by using only a part (for example, up to the top N place) of the histogram similarly to the comparison of the histogram 40 and the comparison of the histogram 50.
  • A comparison based on a speaker extracted from the voice data 36 is not limited to a comparison between histograms. For example, the determination unit 2020 may use a comparison method similar to the method described in “Comparison Method 1 Related to Text Data”. In other words, when the number of speakers who appear commonly in the voice data 36 associated with the relevant account 20-1 and the voice data 36 associated with the relevant account 20-2 is equal to or more than a predetermined number, the determination unit 2020 determines that a similarity degree between the content 30-1 and the content 30-2 is high. On the other hand, when the number of speakers who appear commonly to both pieces of the voice data 36 is less than the predetermined number, the determination unit 2020 determines that the similarity degree between the content 30-1 and the content 30-2 is not high.
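  • The common-speaker variant can be sketched as follows, assuming that an existing speaker-identification technique has already mapped the same speaker to the same identifier across the voice data of both relevant accounts; the required number of common speakers is an illustrative parameter.

```python
def speakers_in_common(speaker_ids_1, speaker_ids_2, required_common=1):
    # speaker_ids_1 / speaker_ids_2: speaker identifiers extracted from the
    # voice data 36-1 / 36-2 (e.g., by voiceprint identification).
    common = set(speaker_ids_1) & set(speaker_ids_2)
    return len(common) >= required_common

print(speakers_in_common(
    ["speaker_07", "speaker_07", "speaker_12"],
    ["speaker_12", "speaker_31"],
))  # True: speaker_12 appears in the voice data of both relevant accounts
```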
  • <Predetermined Processing>
  • As described above, when it is determined that the similarity degree between the content 30-1 associated with the relevant account 20-1 and the content 30-2 associated with the relevant account 20-2 is high, there is a high probability that the target account 10-1 and the target account 10-2 are owned by the same person. Thus, when it is determined that the similarity degree between the content 30-1 and the content 30-2 is high, the processing execution unit 2040 executes predetermined processing on the target account 10-1 and the target account 10-2. Hereinafter, a variation of the processing executed by the processing execution unit 2040 is illustrated.
  • <<Predetermined Processing 1>>
  • For example, when it is determined that the similarity degree between the content 30-1 and the content 30-2 is high, the processing execution unit 2040 outputs information representing that there is a high probability that the target account 10-1 and the target account 10-2 are owned by the same person. By outputting the information, a user of the information processing apparatus 2000 who acquires the information can easily recognize a group of the target accounts 10 having a high probability of being owned by the same person.
  • There are various methods of outputting the information described above. For example, the processing execution unit 2040 causes a display apparatus connected to the information processing apparatus 2000 to display a notification representing that there is a high probability that the target account 10-1 and the target account 10-2 are owned by the same person. FIG. 9 is a diagram illustrating a notification displayed on the display apparatus. In addition, for example, the processing execution unit 2040 may transmit the notification described above to another computer communicably connected to the information processing apparatus 2000, or store the notification described above in a storage apparatus communicably connected to the information processing apparatus 2000.
  • Further, it is assumed that the information processing apparatus 2000 performs the determination by the determination unit 2020 on a plurality of combinations of the target account 10-1 and the target account 10-2. In this case, a plurality of combinations of the target accounts 10 having a high probability of being owned by the same person may be found. Thus, the processing execution unit 2040 may generate a list indicating one or more combinations of the target accounts 10 having a high probability of being owned by the same person, and output the list by any of the various methods described above. By outputting such a list, a user of the information processing apparatus 2000 can easily recognize the plurality of groups of the target accounts 10 having a high probability of being owned by the same person.
  • <<Predetermined Processing 2>>
  • In addition, for example, when it is determined that the similarity degree between the content 30-1 and the content 30-2 is high, the processing execution unit 2040 outputs information related to the content 30-1 and the content 30-2. Hereinafter, the information is referred to as similar content information. By outputting similar content information, a user of the information processing apparatus 2000 can acquire, for the target account 10-1 and the target account 10-2 inferred to have a high probability of being owned by the same person, information as grounds for the inference. Hereinafter, a variation of the similar content information is illustrated.
  • <<<Variation 1: Image of Object>>>
  • It is assumed that the determination unit 2020 performs a comparison between objects extracted from the pieces of the image data 32. In this case, for example, the processing execution unit 2040 includes, in the similar content information, the histogram 40 (see FIG. 5 ) representing a frequency of appearance of an object being generated for the image data 32. Herein, an image of each object indicated in the histogram 40 may be included together with the histogram 40 in the similar content information. In addition, for example, the processing execution unit 2040 includes, in the similar content information, a combination of images of objects determined to be similar to each other among objects extracted from the image data 32-1 and objects extracted from the image data 32-2. Note that, when an image of an object is included in the similar content information, the entire image data 32 in which the object is included may be included in the similar content information.
  • Furthermore, the processing execution unit 2040 may execute analysis processing on an image of an object to be included in the similar content information, and include a result of the analysis processing in the similar content information. For example, when there is an image of a person among the object images to be included in the similar content information, the processing execution unit 2040 may infer an attribute (such as age, height, body shape, and gender) of the person in the image and include a result of the inference in the similar content information, or may calculate a feature of an accessory object (such as glasses, clothing, and baggage) of the person in the image and include information related to the feature in the similar content information. In addition, for example, the processing execution unit 2040 may extract an image of a part (such as a face, a mole, a tattoo, a nail, or a fingerprint) representing a feature of a person from the image of the person, and include the image of the part in the similar content information.
  • In addition, for example, when there is an image of a vehicle (such as a car, a motorcycle, or a bicycle) among the object images to be included in the similar content information, the processing execution unit 2040 determines a maker of the vehicle, a type of the vehicle, a number on its number plate, and the like, and includes the determined information in the similar content information.
  • In addition, for example, when there is an image of a landmark (such as a building, a marking, a mountain, a river, and the sea) usable for identifying a capturing place (a place where the image data 32 is generated) among object images to be included in the similar content information, the processing execution unit 2040 includes a name of the landmark in the similar content information. Further, the processing execution unit 2040 may identify a location of the landmark, and include information (an address or global positioning system (GPS) coordinates) representing the location in the similar content information. Note that a location of a landmark can be identified by using map information and the like, for example.
  • <<<Variation 2: Keyword>>>
  • It is assumed that the determination unit 2020 performs a comparison between keywords extracted from text data or voice data. In this case, for example, the processing execution unit 2040 includes, in the similar content information, the histogram (see FIG. 7 ) generated for a keyword. At this time, each keyword indicated in the histogram may be included in the similar content information. In addition, for example, the processing execution unit 2040 includes, in the similar content information, a keyword determined to coincide among keywords extracted from the content 30-1 and keywords extracted from the content 30-2.
  • Note that, when a keyword is extracted from text data, the processing execution unit 2040 may include, in the similar content information, not only a keyword determined to coincide, but also a sentence and the entire text data in which the keyword is included. Further, when a keyword is extracted from voice data, the processing execution unit 2040 may include, in the similar content information, not only a keyword determined to coincide, but also voice data of a statement in which the keyword is included and the entire voice data from which the keyword is extracted.
  • <<<Variation 3: Speaker>>>
  • It is assumed that the determination unit 2020 performs extraction of a speaker from voice data. In this case, for example, the processing execution unit 2040 includes, in the similar content information, the histogram 60 (see FIG. 8 ) representing a frequency of appearance of a speaker. At this time, sound spectrogram data of each speaker indicated in the histogram may be included in the similar content information. In addition, for example, the processing execution unit 2040 includes, in the similar content information, sound spectrogram data of a speaker determined to coincide among the speakers extracted from the voice data 36-1 and the speakers extracted from the voice data 36-2.
  • <<<Variation 4: Topic>>>
  • It is assumed that the determination unit 2020 performs a comparison between topics extracted from the content 30. In this case, for example, the processing execution unit 2040 includes, in the similar content information, the histogram (see FIG. 6 ) representing a frequency of appearance of a topic extracted from the content 30. In addition, for example, the processing execution unit 2040 includes, in the similar content information, information (such as a name of a topic) representing a topic determined to coincide among topics extracted from the content 30-1 and topics extracted from the content 30-2.
  • While the example embodiments of the present invention have been described with reference to the drawings, the example embodiments are only exemplification of the present invention, and various configurations other than the above-described example embodiments can also be employed.
  • For example, when the content 30-1 and the content 30-2 are similar, the information processing apparatus 2000 may infer that an "owner of the target account 10-1 and an owner of the target account 10-2 belong to the same group" instead of inferring that "the target account 10-1 and the target account 10-2 are owned by the same person". In this case, the processing execution unit 2040 outputs "information representing that there is a high probability that the owner of the target account 10-1 and the owner of the target account 10-2 belong to the same group" instead of "information representing that there is a high probability that the target account 10-1 and the target account 10-2 are owned by the same person".

Claims (20)

1. An information processing apparatus, comprising:
at least one memory configured to store instructions; and
at least one processor configured to execute the instructions to perform operations, the operations comprising:
determining, for a first relevant account associated with a first target account and a plurality of relevant accounts associated with a plurality of target accounts other than the first relevant account, whether first content data associated with the first relevant account and a plurality of content data associated with the plurality of relevant accounts are similar, and
executing predetermined processing when it is determined that the first content data and the plurality of content data are similar.
2. The information processing apparatus according to claim 1, wherein the operations further comprise determining whether the first content data and the plurality of content data are similar by determining whether a distribution of a frequency of appearance of an object included in image data associated with the first relevant account and distributions of frequencies of appearance of objects included in image data associated with the plurality of relevant accounts are similar.
3. The information processing apparatus according to claim 1, wherein the operations further comprise determining whether the first content data and the plurality of content data are similar by determining whether a distribution of a frequency of appearance of a word included in text data or voice data associated with the first relevant account and distributions of frequencies of appearance of words included in text data or voice data associated with the plurality of relevant accounts are similar.
4. The information processing apparatus according to claim 1, wherein the operations further comprise determining whether the first content data and the plurality of content data are similar by determining whether a distribution of a frequency of appearance of a speaker extracted from voice data associated with the first relevant account and distributions of frequencies of appearance of speakers extracted from voice data associated with the plurality of relevant accounts are similar.
5. The information processing apparatus according to claim 1, wherein the operations further comprise determining whether the first content data and the plurality of content data are similar by determining whether a distribution of a frequency of appearance of a topic extracted from content data associated with the first relevant account and distributions of frequencies of appearance of topics extracted from content data associated with the plurality of relevant accounts are similar.
6. The information processing apparatus according to claim 1, wherein the operations further comprise, as the predetermined processing, outputting information indicating that there is a high probability that the first target account and the plurality of target accounts are owned by a same person, or information indicating that there is a high probability that an owner of the first target account and an owner of the plurality of target accounts belong to a same group.
7. The information processing apparatus according to claim 2, wherein the operations further comprise, as the predetermined processing, outputting the distributions.
8. The information processing apparatus according to claim 1, wherein the operations further comprise, as the predetermined processing, outputting content data that coincide or are similar among the first content data and the plurality of content data.
9. The information processing apparatus according to claim 8, wherein the operations further comprise extracting an image region representing a characteristic part of a person included in image data and outputting the extracted image region.
10. The information processing apparatus according to claim 8, wherein the operations further comprise outputting information indicating at least one of a type, a maker, and a number of a number plate of a vehicle included in image data.
11. The information processing apparatus according to claim 8, wherein the operations further comprise outputting a name or a location of a landmark included in image data.
12. A control method executed by a computer, comprising:
determining, for a first relevant account associated with a first target account and a plurality of relevant accounts associated with a plurality of target accounts other than the first relevant account, whether first content data associated with the first relevant account and a plurality of content data associated with the plurality of relevant accounts are similar, and
executing predetermined processing when it is determined that the first content data and the plurality of content data are similar.
13. The control method according to claim 12, further comprising determining whether the first content data and the plurality of content data are similar by determining whether a distribution of a frequency of appearance of an object included in image data associated with the first relevant account and distributions of frequencies of appearance of objects included in image data associated with the plurality of relevant accounts are similar.
14. The control method according to claim 12, further comprising determining whether the first content data and the plurality of content data are similar by determining whether a distribution of a frequency of appearance of a word included in text data or voice data associated with the first relevant account and distributions of frequencies of appearance of words included in text data or voice data associated with the plurality of relevant accounts are similar.
15. The control method according to claim 12, further comprising determining whether the first content data and the plurality of content data are similar by determining whether a distribution of a frequency of appearance of a speaker extracted from voice data associated with the first relevant account and distributions of frequencies of appearance of speakers extracted from voice data associated with the plurality of relevant accounts are similar.
16. The control method according to claim 12, further comprising determining whether the first content data and the plurality of content data are similar by determining whether a distribution of a frequency of appearance of a topic extracted from content data associated with the first relevant account and distributions of frequencies of appearance of topics extracted from content data associated with the plurality of relevant accounts are similar.
17. A non-transitory computer-readable medium storing a program for causing a computer to perform operations, the operations comprising:
determining, for a first relevant account associated with a first target account and a plurality of relevant accounts associated with a plurality of target accounts other than the first relevant account, whether the first content data associated with the first relevant account and a plurality of content data associated with the plurality of relevant accounts are similar, and
executing predetermined processing when it is determined that the first content data and the plurality of content data are similar.
18. The non-transitory computer-readable medium according to claim 17, wherein the operations further comprise determining whether the first content data and the plurality of content data are similar by determining whether a distribution of a frequency of appearance of an object included in image data associated with the first relevant account and distributions of frequencies of appearance of objects included in image data associated with the plurality of relevant accounts are similar.
19. The non-transitory computer-readable medium according to claim 17, wherein the operations further comprise determining whether the first content data and the plurality of content data are similar by determining whether a distribution of a frequency of appearance of a word included in text data or voice data associated with the first relevant account and distributions of frequencies of appearance of words included in text data or voice data associated with the plurality of relevant accounts are similar.
20. The non-transitory computer-readable medium according to claim 17, wherein the operations further comprise determining whether the first content data and the plurality of content data are similar by determining whether a distribution of a frequency of appearance of a speaker extracted from voice data associated with the first relevant account and distributions of frequencies of appearance of speakers extracted from voice data associated with the plurality of relevant accounts are similar.
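The distribution-based similarity determination recited in claims 13-16 (and mirrored in claims 18-20) can be illustrated with a short, non-limiting sketch. The label lists, the cosine-similarity metric, and the SIMILARITY_THRESHOLD value below are hypothetical assumptions chosen only for illustration; they are not taken from the specification and do not limit the claims.

# Illustrative sketch only: one possible way to compare frequency-of-appearance
# distributions (of objects, words, speakers, or topics) extracted from content
# data of a first relevant account and of other relevant accounts.
# All concrete names and the threshold value are hypothetical assumptions.
from collections import Counter
from math import sqrt


def appearance_distribution(labels):
    # Normalize a list of extracted labels into a frequency-of-appearance distribution.
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def cosine_similarity(dist_a, dist_b):
    # Cosine similarity between two sparse distributions represented as dicts.
    keys = set(dist_a) | set(dist_b)
    dot = sum(dist_a.get(k, 0.0) * dist_b.get(k, 0.0) for k in keys)
    norm_a = sqrt(sum(v * v for v in dist_a.values()))
    norm_b = sqrt(sum(v * v for v in dist_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


SIMILARITY_THRESHOLD = 0.8  # hypothetical threshold, not from the specification

# Labels extracted from content data of the first relevant account and of
# other relevant accounts (e.g., object names detected in posted images).
first_labels = ["dog", "dog", "car", "tree"]
other_labels = [["dog", "car", "car", "tree"], ["cat", "phone", "phone"]]

first_dist = appearance_distribution(first_labels)
for labels in other_labels:
    if cosine_similarity(first_dist, appearance_distribution(labels)) >= SIMILARITY_THRESHOLD:
        # Placeholder for the "predetermined processing" recited in the claims,
        # e.g., outputting that the accounts are likely owned by the same person.
        print("similar distributions detected")

In this sketch, the first comparison yields a cosine similarity of about 0.83 and is treated as similar, while the second, with no overlapping labels, yields 0.0; any other distribution-distance measure could be substituted without changing the overall flow.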
US18/240,160 2018-03-30 2023-08-30 Information processing apparatus, control method, and program Pending US20230410221A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/240,160 US20230410221A1 (en) 2018-03-30 2023-08-30 Information processing apparatus, control method, and program

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
PCT/JP2018/013880 WO2019187107A1 (en) 2018-03-30 2018-03-30 Information processing device, control method, and program
US202017043291A 2020-09-29 2020-09-29
US18/240,160 US20230410221A1 (en) 2018-03-30 2023-08-30 Information processing apparatus, control method, and program

Related Parent Applications (2)

Application Number Title Priority Date Filing Date
PCT/JP2018/013880 Continuation WO2019187107A1 (en) 2018-03-30 2018-03-30 Information processing device, control method, and program
US17/043,291 Continuation US20210019553A1 (en) 2018-03-30 2018-03-30 Information processing apparatus, control method, and program

Publications (1)

Publication Number Publication Date
US20230410221A1 true US20230410221A1 (en) 2023-12-21

Family

ID=68059653

Family Applications (4)

Application Number Title Priority Date Filing Date
US17/043,291 Abandoned US20210019553A1 (en) 2018-03-30 2018-03-30 Information processing apparatus, control method, and program
US18/240,152 Pending US20230410220A1 (en) 2018-03-30 2023-08-30 Information processing apparatus, control method, and program
US18/240,209 Pending US20230410222A1 (en) 2018-03-30 2023-08-30 Information processing apparatus, control method, and program
US18/240,160 Pending US20230410221A1 (en) 2018-03-30 2023-08-30 Information processing apparatus, control method, and program

Family Applications Before (3)

Application Number Title Priority Date Filing Date
US17/043,291 Abandoned US20210019553A1 (en) 2018-03-30 2018-03-30 Information processing apparatus, control method, and program
US18/240,152 Pending US20230410220A1 (en) 2018-03-30 2023-08-30 Information processing apparatus, control method, and program
US18/240,209 Pending US20230410222A1 (en) 2018-03-30 2023-08-30 Information processing apparatus, control method, and program

Country Status (3)

Country Link
US (4) US20210019553A1 (en)
JP (1) JP7070665B2 (en)
WO (1) WO2019187107A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11565698B2 (en) * 2018-04-16 2023-01-31 Mitsubishi Electric Cornoration Obstacle detection apparatus, automatic braking apparatus using obstacle detection apparatus, obstacle detection method, and automatic braking method using obstacle detection method
JP7110293B2 (en) * 2020-09-28 2022-08-01 楽天グループ株式会社 Information processing device, information processing method and program

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5181691B2 (en) * 2008-01-21 2013-04-10 日本電気株式会社 Information processing apparatus, information processing method, computer program, and recording medium
US9201863B2 (en) * 2009-12-24 2015-12-01 Woodwire, Inc. Sentiment analysis from social media content
US20110320560A1 (en) * 2010-06-29 2011-12-29 Microsoft Corporation Content authoring and propagation at various fidelities
JP5758831B2 (en) * 2012-03-30 2015-08-05 楽天株式会社 Information providing apparatus, information providing method, information providing program, and computer-readable recording medium for recording the program
US8666123B2 (en) * 2012-04-26 2014-03-04 Google Inc. Creating social network groups
US9208171B1 (en) * 2013-09-05 2015-12-08 Google Inc. Geographically locating and posing images in a large-scale image repository and processing framework
US20150120583A1 (en) * 2013-10-25 2015-04-30 The Mitre Corporation Process and mechanism for identifying large scale misuse of social media networks
DE102014219407A1 (en) * 2014-09-25 2016-03-31 Volkswagen Aktiengesellschaft Diagnostic procedures and survey methods for vehicles
KR20160120604A (en) * 2015-04-08 2016-10-18 김근제 Apparatus for providing code using light source device or color information and code identification system
JP6557592B2 (en) * 2015-12-15 2019-08-07 日本放送協会 Video scene division apparatus and video scene division program
US20170235726A1 (en) * 2016-02-12 2017-08-17 Fujitsu Limited Information identification and extraction
JP2018037076A (en) * 2016-08-25 2018-03-08 株式会社ピープルコミュニケーションズ SNS portal system
US20180129929A1 (en) * 2016-11-09 2018-05-10 Fuji Xerox Co., Ltd. Method and system for inferring user visit behavior of a user based on social media content posted online
US10866633B2 (en) * 2017-02-28 2020-12-15 Microsoft Technology Licensing, Llc Signing with your eyes
CN107609461A (en) * 2017-07-19 2018-01-19 阿里巴巴集团控股有限公司 The training method of model, the determination method, apparatus of data similarity and equipment

Also Published As

Publication number Publication date
WO2019187107A1 (en) 2019-10-03
JPWO2019187107A1 (en) 2021-02-25
US20210019553A1 (en) 2021-01-21
JP7070665B2 (en) 2022-05-18
US20230410220A1 (en) 2023-12-21
US20230410222A1 (en) 2023-12-21

Similar Documents

Publication Publication Date Title
US11610394B2 (en) Neural network model training method and apparatus, living body detecting method and apparatus, device and storage medium
US20230410221A1 (en) Information processing apparatus, control method, and program
CN107742100B A kind of examinee's auth method and terminal device
WO2019200781A1 (en) Receipt recognition method and device, and storage medium
CN112348117B (en) Scene recognition method, device, computer equipment and storage medium
CN109800320B (en) Image processing method, device and computer readable storage medium
WO2019033525A1 (en) Au feature recognition method, device and storage medium
US9613296B1 (en) Selecting a set of exemplar images for use in an automated image object recognition system
WO2019062081A1 (en) Salesman profile formation method, electronic device and computer readable storage medium
CN106874253A (en) Recognize the method and device of sensitive information
US10997609B1 (en) Biometric based user identity verification
US20180005022A1 (en) Method and device for obtaining similar face images and face image information
US20200218772A1 (en) Method and apparatus for dynamically identifying a user of an account for posting images
WO2022142903A1 (en) Identity recognition method and apparatus, electronic device, and related product
CN107809370B (en) User recommendation method and device
CN112241667A (en) Image detection method, device, equipment and storage medium
CN113704623A (en) Data recommendation method, device, equipment and storage medium
CN111738199B (en) Image information verification method, device, computing device and medium
CN110688878A (en) Living body identification detection method, living body identification detection device, living body identification detection medium, and electronic device
US9317887B2 (en) Similarity calculating method and apparatus
CN115223022A (en) Image processing method, device, storage medium and equipment
CN107656959B (en) Message leaving method and device and message leaving equipment
CN115618415A (en) Sensitive data identification method and device, electronic equipment and storage medium
CN112041847A (en) Providing images with privacy tags
CN111192150B (en) Method, device, equipment and storage medium for processing vehicle danger-giving agent service

Legal Events

Date Code Title Description
STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED