US20230410221A1 - Information processing apparatus, control method, and program - Google Patents
- Publication number
- US20230410221A1 (Application No. US 18/240,160)
- Authority
- US
- United States
- Prior art keywords
- relevant
- account
- similar
- content data
- content
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/55—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/758—Involving statistics of pixels or of feature values, e.g. histogram matching
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
Definitions
- the present invention relates to a user account.
- Some services such as a social networking service (SNS) provide an environment in which a user can take various types of actions by using a user account. For example, a picture, a moving image, or a text message can be uploaded in association with a user account.
- NPL 1 discloses a technique for determining whether a plurality of user accounts are owned by the same person, based on a similarity degree among user names of a plurality of the user accounts.
- a user name registered in a user account can be any name unrelated to a real name of a user.
- a person who creates a plurality of user accounts can set the user names registered in those user accounts so that they are not similar to each other. With the technique in NPL 1, it is then difficult to determine that a plurality of user accounts whose registered user names are not similar to each other in such a manner are owned by the same person.
- the invention of the present application has been made in view of the above-described problem, and an object thereof is to provide a technique capable of accurately detecting whether user accounts being compared are owned by the same person even when user names of the user accounts are not similar to each other.
- An information processing apparatus includes 1) a determination unit that determines, for a first relevant account associated with a first target account and a second relevant account associated with a second target account, whether first content data associated with the first relevant account and second content data associated with the second relevant account are similar, and 2) a processing execution unit that executes predetermined processing when it is determined that the first content data and the second content data are similar.
- a control method is executed by a computer.
- the control method includes 1) a determination step of determining, for a first relevant account associated with a first target account and a second relevant account associated with a second target account, whether first content data associated with the first relevant account and second content data associated with the second relevant account are similar, and 2) a processing execution step of executing predetermined processing when it is determined that the first content data and the second content data are similar.
- a program according to the present invention causes a computer to execute each step included in the control method according to the present invention.
- the present invention provides a technique capable of accurately detecting whether user accounts being compared are owned by the same person even when user names of the user accounts are not similar to each other.
- FIG. 1 is a diagram schematically illustrating processing executed by an information processing apparatus according to a present example embodiment.
- FIG. 2 is a diagram illustrating a functional configuration of an information processing apparatus according to an example embodiment 1.
- FIG. 3 is a diagram illustrating a computer for achieving the information processing apparatus.
- FIG. 4 is a flowchart illustrating a flow of processing executed by the information processing apparatus according to the example embodiment 1.
- FIG. 5 is a diagram illustrating a histogram generated for a relevant account.
- FIG. 6 is a diagram illustrating a histogram of a topic.
- FIG. 7 is a diagram illustrating a histogram of a frequency of appearance of a keyword.
- FIG. 8 is a diagram illustrating a histogram of a frequency of appearance of a speaker.
- FIG. 9 is a diagram illustrating a notification displayed on a display apparatus.
- in each block diagram, each block represents a configuration of a functional unit instead of a configuration of a hardware unit unless otherwise described.
- FIG. 1 is a diagram schematically illustrating processing executed by an information processing apparatus 2000 according to the present example embodiment.
- the information processing apparatus 2000 infers whether owners having user accounts different from each other are the same person.
- user information being information related to a user himself/herself and information (hereinafter, a content) such as image data and text data being registered in association with a user account are associated with the account.
- the user information is, for example, a name, an address, a phone number, an e-mail address, or the like.
- the information processing apparatus 2000 infers user accounts different from each other being owned by the same person by using a content associated with another user account associated with a user account.
- an account to be determined whether to be owned by the same person is expressed as a target account, and another account associated with the target account is referred to as a relevant account.
- For example, in the SNS, a function of associating user accounts with each other as friends is often provided.
- an account associated as a friend of a target account is used as a relevant account. Note that which account is handled as a target account will be described below.
- the information processing apparatus 2000 determines, for two target accounts that are a target account 10 - 1 and a target account 10 - 2 , whether the target accounts are accounts owned by the same person.
- for the target account 10 - 1 , a plurality of relevant accounts 20 are present.
- the relevant account 20 associated with the target account 10 - 1 is expressed as a relevant account 20 - 1 .
- a content associated with the relevant account 20 - 1 is expressed as a content 30 - 1 .
- the content 30 - 1 is image data uploaded in association with the relevant account 20 - 1 , and the like.
- a relevant account of the target account 10 - 2 is expressed as a relevant account 20 - 2
- a content associated with the relevant account 20 - 2 is expressed as a content 30 - 2
- the “content 30 associated with the relevant account 20 ” is also simply expressed as the “content 30 of the relevant account 20 ”.
- the information processing apparatus 2000 determines whether the content 30 - 1 of the relevant account 20 - 1 and the content 30 - 2 of the relevant account 20 - 2 are similar. When the content 30 - 1 and the content 30 - 2 are similar, the target account 10 - 1 and the target account 10 - 2 can be inferred to belong to the same person. Thus, when the content 30 - 1 and the content 30 - 2 are similar, the information processing apparatus 2000 executes predetermined processing related to the target account 10 - 1 and the target account 10 - 2 . For example, the information processing apparatus 2000 outputs, as the predetermined processing, a notification indicating that the target account 10 - 1 and the target account 10 - 2 belong to the same person.
- the information processing apparatus 2000 determines a similarity degree between the content 30 - 1 of the relevant account 20 - 1 associated with the target account 10 - 1 and the content 30 - 2 of the relevant account 20 - 2 associated with the target account 10 - 2 .
- when the similarity degree is high, the target account 10 - 1 and the target account 10 - 2 can be inferred to be owned by the same person. The reason will be described below.
- the relevant account 20 - 1 associated with the target account 10 - 1 conceivably belongs to a person who has some sort of connection with an owner of the target account 10 - 1 , such as a friend of the owner of the target account 10 - 1 , for example.
- a content including some sort of information related to the target account 10 - 1 is conceivably present among the contents 30 - 1 uploaded or otherwise registered in association with the relevant accounts 20 - 1 by the owners of the relevant accounts 20 - 1 .
- a picture and a moving image uploaded by the relevant account 20 - 1 conceivably include the owner of the target account 10 - 1 , property (such as a vehicle) of the owner of the target account 10 - 1 , a landmark representing a place that the owner of the target account 10 - 1 has visited, and the like.
- text data and voice data uploaded by the relevant account 20 - 1 also include some sort of information related to the target account 10 - 1 .
- hence, when the content 30 - 1 and the content 30 - 2 are similar, the information processing apparatus 2000 infers that there is a high probability that the owner of the target account 10 - 1 and an owner of the target account 10 - 2 are the same person. In this way, even when it is not clear whether the target account 10 - 1 and the target account 10 - 2 are owned by the same person just by comparing the user information of the target account 10 - 1 with the user information of the target account 10 - 2 , whether the target account 10 - 1 and the target account 10 - 2 are accounts owned by the same person can be inferred.
- FIG. 2 is a diagram illustrating a functional configuration of the information processing apparatus 2000 according to the example embodiment 1.
- the information processing apparatus 2000 includes a determination unit 2020 and a processing execution unit 2040 .
- the determination unit 2020 determines whether the content 30 - 1 of the relevant account 20 - 1 associated with the target account 10 - 1 and the content 30 - 2 of the relevant account 20 - 2 associated with the target account 10 - 2 are similar.
- when it is determined that the content 30 - 1 and the content 30 - 2 are similar, the processing execution unit 2040 executes predetermined processing related to the target account 10 - 1 and the target account 10 - 2 .
- Each functional component unit of the information processing apparatus 2000 may be achieved by hardware (for example, a hard-wired electronic circuit and the like) that achieves each functional component unit, or may be achieved by a combination of hardware and software (for example, a combination of an electronic circuit and a program that controls the electronic circuit, and the like).
- FIG. 3 is a diagram illustrating a computer 1000 for achieving the information processing apparatus 2000 .
- the computer 1000 is any computer.
- the computer 1000 is a personal computer (PC), a server machine, or the like.
- the computer 1000 may be a dedicated computer designed for achieving the information processing apparatus 2000 , or may be a general-purpose computer.
- the computer 1000 includes a bus 1020 , a processor 1040 , a memory 1060 , a storage device 1080 , an input/output interface 1100 , and a network interface 1120 .
- the bus 1020 is a data transmission path for allowing the processor 1040 , the memory 1060 , the storage device 1080 , the input/output interface 1100 , and the network interface 1120 to transmit and receive data with one another.
- a method of connecting the processor 1040 and the like to each other is not limited to a bus connection.
- the processor 1040 is various types of processors such as a central processing unit (CPU), a graphics processing unit (GPU), and a field-programmable gate array (FPGA).
- the memory 1060 is a main storage achieved by using a random access memory (RAM) and the like.
- the storage device 1080 is an auxiliary storage achieved by using a hard disk, a solid state drive (SSD), a memory card, a read only memory (ROM), or the like.
- the input/output interface 1100 is an interface for connecting the computer 1000 and an input/output device.
- an input apparatus such as a keyboard and an output apparatus such as a display apparatus are connected to the input/output interface 1100 .
- the network interface 1120 is an interface for connecting the computer 1000 to a communication network.
- the communication network is, for example, a local area network (LAN) or a wide area network (WAN).
- a method of connection to the communication network by the network interface 1120 may be a wireless connection or a wired connection.
- the storage device 1080 stores a program module that achieves each functional component unit of the information processing apparatus 2000 .
- the processor 1040 achieves a function associated with each program module by reading each of the program modules to the memory 1060 and executing the read program module.
- FIG. 4 is a flowchart illustrating a flow of processing executed by the information processing apparatus 2000 according to the example embodiment 1.
- the determination unit 2020 acquires the content 30 - 1 of each of the relevant accounts 20 - 1 associated with the target account 10 - 1 (S 102 ).
- the determination unit 2020 acquires the content 30 - 2 of each of the relevant accounts 20 - 2 associated with the target account 10 - 2 (S 104 ).
- the determination unit 2020 determines whether the content 30 - 1 and the content 30 - 2 are similar (S 106 ). When the content 30 - 1 and the content 30 - 2 are similar (S 106 : YES), the processing execution unit 2040 executes predetermined processing (S 108 ). On the other hand, when the content 30 - 1 and the content 30 - 2 are not similar (S 106 : NO), the processing in FIG. 4 ends.
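- As a non-limiting illustration, the flow of S 102 to S 108 can be sketched as follows. The function names (acquire_contents, are_similar, execute_processing) are hypothetical placeholders introduced only for this sketch; any concrete comparison method described below may be supplied as are_similar.

```python
# A minimal sketch of the flow in FIG. 4 (S102 to S108). Function names are
# hypothetical placeholders, not names used in the disclosure.

def determine_and_process(acquire_contents, are_similar, execute_processing,
                          target_account_1, target_account_2):
    contents_1 = acquire_contents(target_account_1)   # S102: contents 30-1
    contents_2 = acquire_contents(target_account_2)   # S104: contents 30-2
    if are_similar(contents_1, contents_2):           # S106: similarity determination
        execute_processing(target_account_1, target_account_2)  # S108
        return True
    return False
```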
- the target account 10 and the relevant account 20 are user accounts created by a user in a service such as the SNS, for example.
- a user account is created by registering user information such as a name, and is continuously used.
- a user account handled by the information processing apparatus 2000 is not limited to a user account created by registering user information in such a manner. For example, on a bulletin board and the like on a Web page, when a user posts (uploads text data, and the like) a content, an identifier is assigned to the post. The information processing apparatus 2000 may handle the identifier as a user account. In this case, for example, when a certain user posts a content on a bulletin board site and another user comments on the post, one of the former and the latter can be handled as the target account 10 and the other can be handled as the relevant account 20 .
- the information processing apparatus 2000 infers, for two accounts of the target account 10 - 1 and the target account 10 - 2 , whether the accounts belong to the same person.
- the target account 10 - 1 and the target account 10 - 2 may be user accounts for using the same service (for example, the SNS), or may be user accounts for using services different from each other.
- the information processing apparatus 2000 receives a specification of a user account handled as the target account 10 from a user of the information processing apparatus 2000 .
- the number of user accounts specified by a user may be two, or may be three or more.
- the information processing apparatus 2000 executes, for each combination (nC2 combinations) of any two user accounts creatable from the specified user accounts, processing handling the two user accounts included in the combination as the target accounts 10 .
- For example, when three user accounts A, B, and C are specified, processing handling A and B as the target accounts 10 , processing handling A and C as the target accounts 10 , and processing handling B and C as the target accounts 10 are each executed, as in the sketch below.
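- A minimal sketch of this pairwise enumeration is shown below; run_determination is a hypothetical placeholder for the determination processing described above.

```python
# Hedged sketch: enumerate every pair (nC2 combinations) of the specified
# user accounts and handle each pair as the target accounts 10.
from itertools import combinations

def process_all_pairs(specified_accounts, run_determination):
    for account_a, account_b in combinations(specified_accounts, 2):
        run_determination(account_a, account_b)

# Example: accounts A, B, and C yield the pairs (A, B), (A, C), and (B, C).
process_all_pairs(["A", "B", "C"], lambda a, b: print(a, b))
```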
- the information processing apparatus 2000 receives, from a user, an input that specifies one user account handled as the target account 10 .
- the information processing apparatus 2000 handles the user account specified by a user as the target account 10 - 1 .
- the information processing apparatus 2000 handles, as the target account 10 - 2 , another user account having user information similar to user information of the target account 10 - 1 .
- the similarity between pieces of user information herein refers to, for example, a part of various pieces of information (a part of a user ID, a part of a name, a part of a birth date, a part of an e-mail address, or the like) being common.
- when there are a plurality of user accounts having user information similar to the user information of the target account 10 - 1 , the information processing apparatus 2000 handles each of the plurality of user accounts as the target account 10 - 2 .
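- The following is a rough sketch of one possible reading of "a part of the user information is common"; the field names and the minimum shared-substring length are assumptions for illustration only, not values fixed by the disclosure.

```python
# Hedged sketch of the "a part of the user information is common" criterion.
# Field names and the minimum shared-substring length are assumptions.
def shares_common_part(info_a: dict, info_b: dict, min_len: int = 4) -> bool:
    for field in ("user_id", "name", "birth_date", "email"):
        a = info_a.get(field, "")
        b = info_b.get(field, "")
        # look for any substring of length min_len that appears in both values
        for i in range(len(a) - min_len + 1):
            if a[i:i + min_len] in b:
                return True
    return False
```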
- the information processing apparatus 2000 may operate in cooperation with a monitoring system for monitoring a user account, and receive a specification of a user account from the monitoring system. For example, the monitoring system monitors a usage aspect (such as a content of an uploaded content and a frequency of uploading) of a user account, and determines a user account whose usage aspect violates common sense, a user policy of a service, law, or the like (that is, determines a user account to beware of). The monitoring system notifies the determined user account to the information processing apparatus 2000 . The information processing apparatus 2000 executes, for each combination of any two user accounts creatable for the plurality of user accounts notified from the monitoring system, processing handling the two user accounts included in the combination as the target accounts 10 . Note that, when the monitoring system notifies user accounts one by one, the information processing apparatus 2000 executes the above-described processing on a plurality of user accounts indicated by a plurality of notifications received during a predetermined period of time, for example.
- the relevant account 20 is another account associated with the target account 10 , and is an account in a friendship with the target account 10 in the SNS, for example.
- the determination unit 2020 may acquire the content 30 for all of the relevant accounts 20 , or may acquire the content 30 for some of the relevant accounts 20 .
- the determination unit 2020 arbitrarily (for example, randomly) selects a predetermined number of the relevant accounts 20 from the plurality of relevant accounts 20 , for example.
- the determination unit 2020 acquires the content 30 - 1 associated with the relevant account 20 - 1 and the content 30 - 2 associated with the relevant account 20 - 2 (S 102 and S 104 ). For example, the determination unit 2020 automatically collects, for each of the relevant accounts 20 , each of the contents 30 from Web pages on which the contents 30 of the relevant accounts 20 are opened, by successively accessing the Web pages.
- an application programming interface (API) for acquiring a content associated with a user account may be provided in a service such as the SNS.
- the determination unit 2020 may acquire the content 30 of the relevant account 20 by using the API provided in a service used by the relevant account 20 .
- the determination unit 2020 may acquire all of the contents 30 associated with the relevant account 20 , or may acquire only the content 30 of a predetermined type. For example, when a target of a similarity determination is only image data, the determination unit 2020 acquires image data associated with the relevant account 20 as the content 30 .
- the determination unit 2020 compares content data of the relevant account 20 - 1 with content data of the relevant account 20 - 2 , and infers that, when a similarity degree between the pieces of the content data is high, the target account 10 - 1 and the target account 10 - 2 are owned by the same person.
- the processing may have various variations in terms of 1) what kind of content data is compared and 2) how the comparison is performed. Hereinafter, a comparison between pieces of content data will be described while focusing on these two points.
- Image data are conceivable as a type of the content data to be compared.
- image data of a picture of a person, a building, scenery, or the like are uploaded by using a user account.
- the determination unit 2020 handles image data uploaded by using a user account in such a manner as a content associated with the user account.
- a user may make a post that refers to (links to) a Web page including image data, or make a post that refers to image data uploaded by another user.
- the determination unit 2020 may also handle image data referred to by a user in such a manner as content data associated with an account of the user. Note that a moving image frame constituting moving image data is also included in image data.
- image data has an advantage that similarity between the content 30 - 1 and the content 30 - 2 is easily determined even when a language used in the relevant account 20 - 1 is different from a language used in the relevant account 20 - 2 .
- the determination unit 2020 focuses on a similarity degree between an object detected from image data associated with the relevant account 20 - 1 and an object detected from image data associated with the relevant account 20 - 2 .
- the determination unit 2020 calculates the similarity degree between the object detected from the image data associated with the relevant account 20 - 1 and the object detected from the image data associated with the relevant account 20 - 2 . Then, when the number of groups (namely, groups of objects inferred to be the same) of objects having a similarity degree equal to or more than a predetermined value is equal to or more than a predetermined number, the determination unit 2020 determines that the similarity degree between the content data of the relevant account 20 - 1 and the content data of the relevant account 20 - 2 is high.
- the determination unit 2020 determines that the similarity degree between the content data of the relevant account 20 - 1 and the content data of the relevant account 20 - 2 is not high.
- the predetermined number described above is previously stored in a storage apparatus that can be accessed from the determination unit 2020 .
- an object detected from image data 32 may be an object of any kind, or may be an object of a specific kind. In the latter case, for example, only a person among objects included in the image data 32 is to be detected.
- an existing technique can be used as a technique for detecting an object from image data and a technique for determining a similarity degree between detected objects.
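- A hedged sketch of this group-counting comparison is given below; the object detector and the similarity function (for example, one based on image feature vectors) are assumed to be supplied by an existing technique, and the thresholds are illustrative assumptions rather than values from the disclosure.

```python
# Hedged sketch of the comparison based on pairs of similar objects.
# "similarity" is an externally supplied function over detected objects;
# sim_threshold and min_matched_groups are illustrative assumptions.
def similar_by_objects(objects_1, objects_2, similarity,
                       sim_threshold=0.8, min_matched_groups=3):
    matched_groups = 0
    for obj_1 in objects_1:
        # count obj_1 as a matched group if a sufficiently similar object
        # is found among the objects detected for the other relevant account
        if any(similarity(obj_1, obj_2) >= sim_threshold for obj_2 in objects_2):
            matched_groups += 1
    return matched_groups >= min_matched_groups
```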
- the determination unit 2020 generates, for each of the relevant account 20 - 1 and the relevant account 20 - 2 , a histogram representing a distribution of a frequency of appearance of an object in image data associated thereto, and determines a similarity degree between the histograms.
- FIG. 5 is a diagram illustrating a histogram generated for the relevant account 20 .
- a plurality of pieces of image data 32 are associated with the relevant account 20 .
- a histogram 40 is a distribution of a frequency of appearance of an object detected from the image data 32 .
- the image data 32 associated with the relevant account 20 - 1 are expressed as image data 32 - 1
- the histogram 40 generated for the image data 32 - 1 is expressed as a histogram 40 - 1
- the image data 32 associated with the relevant account 20 - 2 are expressed as image data 32 - 2
- the histogram 40 generated for the image data 32 - 2 is expressed as a histogram 40 - 2 .
- the determination unit 2020 determines a similarity degree between the histogram 40 - 1 and the histogram 40 - 2 .
- the determination unit 2020 calculates the similarity degree between the histogram 40 - 1 and the histogram 40 - 2 , and, when the calculated similarity degree is equal to or more than a predetermined value, the determination unit 2020 determines that a similarity degree between the content 30 - 1 and the content 30 - 2 is high.
- on the other hand, when the similarity degree between the histogram 40 - 1 and the histogram 40 - 2 is less than the predetermined value, the determination unit 2020 determines that the similarity degree between the content 30 - 1 and the content 30 - 2 is not high.
- an existing technique can be used as a technique for calculating a similarity degree between two histograms.
- the predetermined value described above is stored in a storage apparatus that can be accessed from the determination unit 2020 .
- the histogram 40 - 1 and the histogram 40 - 2 are generated as follows, for example.
- the determination unit 2020 recognizes an object included in each piece of the image data 32 - 1 by performing object recognition processing on each piece of the image data 32 - 1 as a target. Furthermore, the determination unit 2020 generates the histogram 40 - 1 representing a distribution of a frequency of appearance of an object by counting the number of appearances of each object.
- the determination unit 2020 assigns an identifier to each object detected from the image data 32 - 1 .
- the determination unit 2020 makes each object identifiable by assigning the same identifier to the same object, and can thus count the number of appearances of the object.
- in order to do so, it is necessary to determine whether objects detected from the image data 32 are the same as each other (that is, to identify each object).
- when the determination unit 2020 assigns an identifier to an object detected from the image data 32 and the object is the same as another object being already detected, the determination unit 2020 assigns the same identifier as the identifier assigned to the object being already detected.
- on the other hand, when the object is not the same as any object being already detected, the determination unit 2020 assigns a new identifier that is not assigned to any object.
- the determination unit 2020 generates the histogram 40 - 2 by also performing similar processing on the image data 32 - 2 . At this time, for an object detected from the image data 32 - 2 , not only identification with an object detected from the other piece of image data 32 - 2 but also identification with an object detected from the image data 32 - 1 are performed. In other words, when the same object as an object detected from the image data 32 - 2 is already detected from the image data 32 - 1 , the determination unit 2020 also assigns, to the object detected from the image data 32 - 2 , an identifier assigned to the object being already detected. Various types of existing techniques can be used for identification of an object.
- a comparison between the histogram 40 - 1 and the histogram 40 - 2 may be performed by using only a part of the histogram 40 - 1 and a part of the histogram 40 - 2 .
- the determination unit 2020 calculates a similarity degree between the histogram 40 - 1 and the histogram 40 - 2 by comparing a frequency of appearance of objects in top N places (N is a natural number of two or more) in the histogram 40 - 1 with a frequency of appearance of objects in top N places in the histogram 40 - 2 .
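- The following sketch illustrates one way to build the histograms 40 - 1 and 40 - 2 from object identifiers and to compare them, optionally over only the top N bins. The intersection-based measure is merely one possible existing technique; the disclosure does not fix a particular measure or threshold.

```python
# Hedged sketch: build appearance-frequency histograms from per-image object
# identifiers and compare them, optionally restricted to the top-N bins.
from collections import Counter

def build_histogram(object_ids_per_image):
    """object_ids_per_image: one list of object identifiers per piece of image data 32."""
    hist = Counter()
    for ids in object_ids_per_image:
        hist.update(ids)
    return hist

def histogram_similarity(hist_1, hist_2, top_n=None):
    if top_n is not None:
        hist_1 = Counter(dict(hist_1.most_common(top_n)))
        hist_2 = Counter(dict(hist_2.most_common(top_n)))
    keys = set(hist_1) | set(hist_2)
    overlap = sum(min(hist_1[k], hist_2[k]) for k in keys)
    total = max(sum(hist_1.values()), sum(hist_2.values()))
    return overlap / total if total else 0.0
```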
- a comparison related to image data may be achieved by a comparison between topics of the image data instead of a comparison between objects detected from the image data.
- a topic in a certain piece of data refers to a main matter or event expressed by the data.
- a topic such as work, food, sports, traveling, games, or politics is conceivable.
- the determination unit 2020 classifies each piece of the image data 32 associated with the relevant account 20 by topic.
- an existing technique can be used as a technique for classifying image data by topic.
- the determination unit 2020 generates a histogram of a frequency of appearance of a topic for each of the image data 32 - 1 and the image data 32 - 2 .
- FIG. 6 is a diagram illustrating a histogram of a topic.
- the determination unit 2020 calculates a similarity degree between the topic histograms, and, when the similarity degree is equal to or more than a predetermined value, the determination unit 2020 determines that the similarity degree between the content 30 - 1 and the content 30 - 2 is high. On the other hand, when the similarity degree is less than the predetermined value, the determination unit 2020 determines that the similarity degree between the content 30 - 1 and the content 30 - 2 is not high.
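- A compact sketch of this topic-based comparison is shown below; the topic classifier is assumed to be an existing technique supplied from outside, and the topic labels and threshold are illustrative assumptions.

```python
# Hedged sketch of the topic-based comparison: an externally supplied
# classifier maps each piece of image data 32 to a topic label, and the two
# per-account topic distributions are compared by histogram intersection.
from collections import Counter

def topics_similar(images_1, images_2, classify_topic, threshold=0.5):
    hist_1 = Counter(classify_topic(img) for img in images_1)
    hist_2 = Counter(classify_topic(img) for img in images_2)
    keys = set(hist_1) | set(hist_2)
    overlap = sum(min(hist_1[k], hist_2[k]) for k in keys)
    total = max(sum(hist_1.values()), sum(hist_2.values()))
    return (overlap / total if total else 0.0) >= threshold
```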
- the determination unit 2020 may perform a comparison similar to the above-described comparison related to the image data 32 on text data associated with the relevant account 20 .
- text data representing information such as a thought of a user and a recent state of a user are uploaded in association with a user account.
- the determination unit 2020 handles, for example, text data uploaded by a user in such a manner as the content 30 .
- a user may also make a post that refers to a Web page, a post that refers to text data uploaded by another user, a post of a comment on a content of another user, and the like.
- the determination unit 2020 may also handle, as content data associated with an account of the user, the text data included in the Web page referred to by the user in such a manner, the text data uploaded by the other user, and the text data representing the comment on the content of the other user.
- the determination unit 2020 performs extraction of a keyword from text data associated with the relevant account 20 - 1 and text data associated with the relevant account 20 - 2 . For example, when the number of keywords that appear commonly to both pieces of the text data is equal to or more than a predetermined number, the determination unit 2020 determines that a similarity degree between the content 30 - 1 and the content 30 - 2 is high. On the other hand, when the number of keywords that appear commonly to both pieces of the text data is less than the predetermined number, the determination unit 2020 determines that the similarity degree between the content 30 - 1 and the content 30 - 2 is not high.
- a keyword extracted from text data may be any word, or may be a specific word. In the latter case, for example, a list of words to be adopted as keywords is prepared in advance, and only a word included in the list is extracted as a keyword. Note that an existing technique can be used as a technique for extracting a keyword from text data.
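- The sketch below illustrates the common-keyword criterion; a simple whitelist filter stands in for an existing keyword-extraction technique, and the keyword list and threshold are assumptions for illustration.

```python
# Hedged sketch of the common-keyword criterion for text data.
def extract_keywords(texts, keyword_list):
    keywords = set()
    for text in texts:
        for word in text.lower().split():
            if word in keyword_list:   # whitelist stands in for real extraction
                keywords.add(word)
    return keywords

def similar_by_keywords(texts_1, texts_2, keyword_list, min_common=5):
    common = extract_keywords(texts_1, keyword_list) & extract_keywords(texts_2, keyword_list)
    return len(common) >= min_common
```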
- the determination unit 2020 may perform, on a keyword extracted from text data associated with the relevant account 20 , a comparison similar to the comparison related to a histogram of a frequency of appearance of an object detected from image data associated with the relevant account 20 . Specifically, the determination unit 2020 generates, for each of the relevant account 20 - 1 and the relevant account 20 - 2 , a histogram representing a distribution of a frequency of appearance of a keyword in associated text data, and determines a similarity degree between the histograms.
- FIG. 7 is a diagram illustrating a histogram of a frequency of appearance of a keyword.
- a histogram 50 is generated for text data 34 associated with the relevant account 20 .
- the text data 34 associated with the relevant account 20 - 1 is expressed as text data 34 - 1
- the histogram 50 generated from the text data 34 - 1 is expressed as a histogram 50 - 1 .
- the text data 34 associated with the relevant account 20 - 2 is expressed as text data 34 - 2
- the histogram 50 generated from the text data 34 - 2 is expressed as a histogram 50 - 2 .
- the determination unit 2020 calculates a similarity degree between the histogram 50 - 1 and the histogram 50 - 2 , and, when the similarity degree is equal to or more than a predetermined value, the determination unit 2020 determines that a similarity degree between the content 30 - 1 and the content 30 - 2 is high. On the other hand, when the similarity degree between the histogram 50 - 1 and the histogram 50 - 2 is less than the predetermined value, the determination unit 2020 determines that the similarity degree between the content 30 - 1 and the content 30 - 2 is not high.
- the predetermined value described above is previously stored in a storage apparatus that can be accessed from the determination unit 2020 .
- a comparison between the histogram 50 - 1 and the histogram 50 - 2 may be performed by using only a part (for example, up to the top N place) of the histogram similarly to the comparison between the histogram 40 - 1 and the histogram 40 - 2 .
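- The following sketch builds the keyword-frequency histograms 50 - 1 and 50 - 2 and compares them with cosine similarity over the top N keywords; cosine similarity is only one possible existing technique and is not mandated by the disclosure.

```python
# Hedged sketch of keyword-frequency histograms compared by cosine similarity.
from collections import Counter
import math

def keyword_histogram(texts, keyword_list, top_n=None):
    counts = Counter(w for t in texts for w in t.lower().split() if w in keyword_list)
    return Counter(dict(counts.most_common(top_n))) if top_n else counts

def cosine_similarity(hist_1, hist_2):
    keys = set(hist_1) | set(hist_2)
    dot = sum(hist_1[k] * hist_2[k] for k in keys)
    norm_1 = math.sqrt(sum(v * v for v in hist_1.values()))
    norm_2 = math.sqrt(sum(v * v for v in hist_2.values()))
    return dot / (norm_1 * norm_2) if norm_1 and norm_2 else 0.0
```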
- the determination unit 2020 may determine a similarity degree between the content 30 - 1 and the content 30 - 2 by a comparison between frequencies of appearance of a topic extracted from the pieces of the text data 34 .
- a method of comparing frequencies of appearance of a topic extracted from the pieces of the text data 34 is similar to the above-described comparison between frequencies of appearance of a topic extracted from pieces of image data. Note that an existing technique can be used as a technique for extracting a topic from text data.
- the determination unit 2020 may handle voice data associated with the relevant account as the content 30 .
- the voice data herein include not only data generated by voice alone, but also data about voice included in moving image data. Hereinafter, comparison methods related to voice data are illustrated.
- the determination unit 2020 extracts a keyword from each piece of voice data associated with the relevant account 20 - 1 and voice data associated with the relevant account 20 - 2 . Then, the determination unit 2020 determines a similarity degree between the content 30 - 1 and the content 30 - 2 by handling the keywords extracted from the pieces of the voice data similarly to the keywords extracted from the pieces of the text data described above. In other words, the determination unit 2020 determines a similarity degree between the content 30 - 1 and the content 30 - 2 by comparing the numbers of common keywords and histograms representing a frequency of appearance of a keyword.
- the determination unit 2020 determines a similarity degree between the content 30 - 1 and the content 30 - 2 by comparing a frequency of appearance of a topic extracted from voice data associated with the relevant account 20 - 1 and a frequency of appearance of a topic extracted from voice data associated with the relevant account 20 - 2 .
- a method of comparing frequencies of appearance of a topic is similar to the above-described comparison between frequencies of appearance of a topic extracted from image data. Note that an existing technique can be used as a technique for extracting a topic from voice data.
- the determination unit 2020 performs extraction of a speaker from each piece of voice data associated with the relevant account 20 - 1 and voice data associated with the relevant account 20 - 2 .
- An existing technique such as voice print identification, for example, can be used as a technique for performing extraction of a speaker from voice data.
- for example, there is a technique for identifying a speaker by generating sound spectrogram data representing a voice print from voice data, and using the sound spectrogram data as identification information.
- the determination unit 2020 generates, for each of the relevant account 20 - 1 and the relevant account 20 - 2 , a histogram of a frequency of appearance of a speaker extracted from associated voice data.
- FIG. 8 is a diagram illustrating a histogram of a frequency of appearance of a speaker.
- a histogram 60 of a frequency of appearance of a speaker is generated for voice data 36 associated with the relevant account 20 .
- the voice data 36 associated with the relevant account 20 - 1 is expressed as voice data 36 - 1
- the histogram generated from the voice data 36 - 1 is expressed as a histogram 60 - 1
- the voice data 36 associated with the relevant account 20 - 2 is expressed as voice data 36 - 2
- the histogram 60 generated from the voice data 36 - 2 is expressed as a histogram 60 - 2 .
- the determination unit 2020 calculates a similarity degree between the histogram 60 - 1 and the histogram 60 - 2 , and, when the similarity degree is equal to or more than a predetermined value, the determination unit 2020 determines that a similarity degree between the content 30 - 1 and the content 30 - 2 is high. On the other hand, when the similarity degree between the histogram 60 - 1 and the histogram 60 - 2 is less than the predetermined value, the determination unit 2020 determines that the similarity degree between the content 30 - 1 and the content 30 - 2 is not high.
- the predetermined value described above is previously stored in a storage apparatus that can be accessed from the determination unit 2020 .
- a comparison between the histogram 60 - 1 and the histogram 60 - 2 may be performed by using only a part (for example, up to the top N place) of the histogram similarly to the comparison of the histogram 40 and the comparison of the histogram 50 .
- a comparison based on a speaker extracted from the voice data 36 is not limited to a comparison between histograms.
- the determination unit 2020 may use a comparison method similar to the method described in “Comparison Method 1 Related to Text Data”. In other words, when the number of speakers who appear commonly in the voice data 36 associated with the relevant account 20 - 1 and the voice data 36 associated with the relevant account 20 - 2 is equal to or more than a predetermined number, the determination unit 2020 determines that a similarity degree between the content 30 - 1 and the content 30 - 2 is high. On the other hand, when the number of speakers who appear commonly to both pieces of the voice data 36 is less than the predetermined number, the determination unit 2020 determines that the similarity degree between the content 30 - 1 and the content 30 - 2 is not high.
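- A minimal sketch of the speaker-based criterion is given below; identify_speaker is a hypothetical placeholder for an existing speaker-identification technique (for example, voice print matching), and the threshold is an illustrative assumption.

```python
# Hedged sketch of the speaker-based comparison: each piece of voice data 36
# is mapped to a speaker identifier, and the contents are judged similar when
# enough speakers appear in both sets.
def similar_by_speakers(voice_clips_1, voice_clips_2, identify_speaker,
                        min_common_speakers=2):
    speakers_1 = {identify_speaker(clip) for clip in voice_clips_1}
    speakers_2 = {identify_speaker(clip) for clip in voice_clips_2}
    return len(speakers_1 & speakers_2) >= min_common_speakers
```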
- the processing execution unit 2040 executes predetermined processing on the target account 10 - 1 and the target account 10 - 2 .
- a variation of the processing executed by the processing execution unit 2040 is illustrated.
- when it is determined that the similarity degree between the content 30 - 1 and the content 30 - 2 is high, the processing execution unit 2040 outputs information representing that there is a high probability that the target account 10 - 1 and the target account 10 - 2 are owned by the same person.
- the information is output, and thus a user of the information processing apparatus 2000 who acquires the information can easily recognize a group of the target accounts 10 having a high probability of being owned by the same person.
- the processing execution unit 2040 causes a display apparatus connected to the information processing apparatus 2000 to display a notification representing that there is a high probability that the target account 10 - 1 and the target account 10 - 2 are owned by the same person.
- FIG. 9 is a diagram illustrating a notification displayed on the display apparatus.
- the processing execution unit 2040 may transmit the notification described above to another computer communicably connected to the information processing apparatus 2000 , or store the notification described above in a storage apparatus communicably connected to the information processing apparatus 2000 .
- the information processing apparatus 2000 performs a determination by the determination unit 2020 on a plurality of combinations of the target account 10 - 1 and the target account 10 - 2 .
- a plurality of combinations of the target accounts having a high probability of being owned by the same person may be found.
- the processing execution unit 2040 may generate a list indicating one or more combinations of the target accounts 10 having a high probability of being owned by the same person, and output the list by the various methods described above. By outputting such a list, a user of the information processing apparatus 2000 can easily recognize the plurality of groups of the target accounts 10 having a high probability of being owned by the same person.
- the processing execution unit 2040 outputs information related to the content 30 - 1 and the content 30 - 2 .
- the information is referred to as similar content information.
- a user of the information processing apparatus 2000 can acquire, for the target account 10 - 1 and the target account 10 - 2 inferred to have a high probability of being owned by the same person, information as grounds for the inference.
- a variation of the similar content information is illustrated.
- the processing execution unit 2040 includes, in the similar content information, the histogram 40 (see FIG. 5 ) representing a frequency of appearance of an object being generated for the image data 32 .
- an image of each object indicated in the histogram 40 may be included together with the histogram 40 in the similar content information.
- the processing execution unit 2040 includes, in the similar content information, a combination of images of objects determined to be similar to each other among objects extracted from the image data 32 - 1 and objects extracted from the image data 32 - 2 . Note that, when an image of an object is included in the similar content information, the entire image data 32 in which the object is included may be included in the similar content information.
- the processing execution unit 2040 may execute analysis processing on an image of an object to be included in the similar content information, and include a result of the analysis processing in the similar content information. For example, when there is an image of a person among object images to be included in the similar content information, the processing execution unit 2040 may infer an attribute (age, height, body shape, and gender) of the person of the image, and include a result of the inference in the similar content information, or may calculate a feature of an accessory object (such as glasses, clothing, and baggage) of the person of the image, and include information related to the feature in the similar content information.
- the processing execution unit 2040 may extract an image of a part (such as a face, a mole, a tattoo, a nail, or a fingerprint) representing a feature of a person from the image of the person, and include the image of the part in the similar content information.
- when there is an image of a vehicle among object images to be included in the similar content information, the processing execution unit 2040 determines a maker of the vehicle, a type of the vehicle, a number of a number plate, and the like, and includes the determined information in the similar content information.
- when there is an image of a landmark (such as a building, a marking, a mountain, a river, or the sea) usable for identifying a capturing place (a place where the image data 32 is generated) among object images to be included in the similar content information, the processing execution unit 2040 includes a name of the landmark in the similar content information. Further, the processing execution unit 2040 may identify a location of the landmark, and include information (an address or global positioning system (GPS) coordinates) representing the location in the similar content information. Note that a location of a landmark can be identified by using map information and the like, for example.
- the processing execution unit 2040 includes, in the similar content information, the histogram (see FIG. 7 ) generated for a keyword. At this time, each keyword indicated in the histogram may be included in the similar content information. In addition, for example, the processing execution unit 2040 includes, in the similar content information, a keyword determined to coincide among keywords extracted from the content 30 - 1 and keywords extracted from the content 30 - 2 .
- the processing execution unit 2040 may include, in the similar content information, not only a keyword determined to coincide, but also a sentence and the entire text data in which the keyword is included. Further, when a keyword is extracted from voice data, the processing execution unit 2040 may include, in the similar content information, not only a keyword determined to coincide, but also voice data of a statement in which the keyword is included and the entire voice data from which the keyword is extracted.
- the determination unit 2020 performs extraction of a speaker from voice data.
- in this case, for example, the processing execution unit 2040 includes, in the similar content information, the histogram 60 (see FIG. 8 ) representing a frequency of appearance of a speaker.
- sound spectrogram data of each speaker indicated in the histogram may be included in the similar content information.
- the processing execution unit 2040 includes, in the similar content information, sound spectrogram data of a speaker determined to coincide among speakers extracted from the voice data 36 - 1 and speakers extracted from the voice data 36 - 2 .
- the processing execution unit 2040 includes, in the similar content information, the histogram (see FIG. 6 ) representing a frequency of appearance of a topic extracted from the content 30 .
- the processing execution unit 2040 includes, in the similar content information, information (such as a name of a topic) representing a topic determined to coincide among topics extracted from the content 30 - 1 and topics extracted from the content 30 - 2 .
- the information processing apparatus 2000 may infer that an “owner of the target account 10 - 1 and an owner of the target account 10 - 2 belong to the same group” instead of inferring that “the target account 10 - 1 and the target account 10 - 2 are owned by the same person”.
- the processing execution unit 2040 outputs “information representing that there is a high probability that the owner of the target account 10 - 1 and the owner of the target account 10 - 2 belong to the same group” instead of “information representing that there is a high probability that the target account 10 - 1 and the target account 10 - 2 are owned by the same person”.
Abstract
An information processing apparatus (2000) determines whether a content (30-1) of a relevant account (20-1) associated with a target account (10-1) and a content (30-2) of a relevant account (20-2) associated with a target account (10-2) are similar. When the content (30-1) and the content (30-2) are similar, the information processing apparatus (2000) executes predetermined processing related to the target account (10-1) and the target account (10-2).
Description
- This application is a Continuation of U.S. application Ser. No. 17/043,291 filed on Sep. 29, 2020, which is a National Stage of International Application No. PCT/JP2018/013880 filed on Mar. 30, 2018.
- The present invention relates to a user account.
- Some services, such as a social networking service (SNS), provide an environment in which a user can take various types of actions by using a user account. For example, a picture, a moving image, or a text message can be uploaded in association with a user account.
- Herein, the same person may own a plurality of accounts. With regard to this point, NPL 1 discloses a technique for determining whether a plurality of user accounts are owned by the same person, based on a similarity degree among user names of a plurality of the user accounts.
- [NPL 1] Y. Li, Y. Peng, W. Ji, Z. Zhang, and Q. Xu, “User Identification Based on Display Names Across Online Social Networks”, IEEE Access, vol. 5, pp. 17342-17353, Aug. 25, 2017
- In general, a user name registered in a user account can be any name unrelated to a real name of a user. Thus, a person who creates a plurality of user accounts can set the user names registered in those user accounts so that they are not similar to each other. With the technique in NPL 1, it is then difficult to determine that a plurality of user accounts whose registered user names are not similar to each other in such a manner are owned by the same person.
- The invention of the present application has been made in view of the above-described problem, and an object thereof is to provide a technique capable of accurately detecting whether user accounts being compared are owned by the same person even when user names of the user accounts are not similar to each other.
- An information processing apparatus according to the present invention includes 1) a determination unit that determines, for a first relevant account associated with a first target account and a second relevant account associated with a second target account, whether first content data associated with the first relevant account and second content data associated with the second relevant account are similar, and 2) a processing execution unit that executes predetermined processing when it is determined that the first content data and the second content data are similar.
- A control method according to the present invention is executed by a computer. The control method includes 1) a determination step of determining, for a first relevant account associated with a first target account and a second relevant account associated with a second target account, whether first content data associated with the first relevant account and second content data associated with the second relevant account are similar, and 2) a processing execution step of executing predetermined processing when it is determined that the first content data and the second content data are similar.
- A program according to the present invention causes a computer to execute each step included in the control method according to the present invention.
- The present invention provides a technique capable of accurately detecting whether user accounts being compared are owned by the same person even when user names of the user accounts are not similar to each other.
- The above-described object, the other objects, features, and advantages will become more apparent from suitable example embodiments described below and the following accompanying drawings.
- FIG. 1 is a diagram schematically illustrating processing executed by an information processing apparatus according to a present example embodiment.
- FIG. 2 is a diagram illustrating a functional configuration of an information processing apparatus according to an example embodiment 1.
- FIG. 3 is a diagram illustrating a computer for achieving the information processing apparatus.
- FIG. 4 is a flowchart illustrating a flow of processing executed by the information processing apparatus according to the example embodiment 1.
- FIG. 5 is a diagram illustrating a histogram generated for a relevant account.
- FIG. 6 is a diagram illustrating a histogram of a topic.
- FIG. 7 is a diagram illustrating a histogram of a frequency of appearance of a keyword.
- FIG. 8 is a diagram illustrating a histogram of a frequency of appearance of a speaker.
- FIG. 9 is a diagram illustrating a notification displayed on a display apparatus.
- Hereinafter, example embodiments of the present invention will be described with reference to the drawings. Note that, in all of the drawings, the same components have the same reference signs, and description thereof will not be repeated as appropriate. Further, in each block diagram, each block represents a configuration of a functional unit instead of a configuration of a hardware unit unless otherwise described.
FIG. 1 is a diagram schematically illustrating processing executed by aninformation processing apparatus 2000 according to the present example embodiment. Theinformation processing apparatus 2000 infers whether owners having user accounts different from each other are the same person. For example, user information being information related to a user himself/herself and information (hereinafter, a content) such as image data and text data being registered in association with a user account are associated with the account. The user information is, for example, a name, an address, a phone number, an e-mail address, or the like. - In general, when a user account is created in a social networking service (SNS) and the like, a user inputs various types of user information described above. At this time, there are many cases where authenticity of a content of the input user information is not required to be proven. In such a case, the content of the user information can be even falsified. Thus, the same person can create a plurality of accounts having contents of user information different from each other. In other words, the same person can own a plurality of accounts. For a plurality of user accounts having a characteristic that “pieces of user information different from each other are registered even though actual owners are the same person” in such a manner, it is difficult to recognize a fact that the user accounts are owned by the same person from only the user information and a content of the user accounts.
- Further, since a plurality of services such as the SNS are present, there is also a case where the same person creates user accounts with different account names in the plurality of services. In this case, even though a user registers user information without falsehood, when user information is private, it is difficult to recognize a fact that the plurality of user accounts are owned by the same person.
- Thus, the
information processing apparatus 2000 according to the present example embodiment infers user accounts different from each other being owned by the same person by using a content associated with another user account associated with a user account. Hereinafter, an account to be determined whether to be owned by the same person is expressed as a target account, and another account associated with the target account is referred to as a relevant account. For example, in the SNS, a function of associating user accounts with each other as friends is often provided. Thus, for example, an account associated as a friend of a target account is used as a relevant account. Note that which account is handled as a target account will be described below. - In the example in
FIG. 1 , the information processing apparatus 2000 determines, for two target accounts that are a target account 10-1 and a target account 10-2, whether the target accounts are accounts owned by the same person. For the target account 10-1, a plurality of relevant accounts 20 are present. Herein, the relevant account 20 associated with the target account 10-1 is expressed as a relevant account 20-1. In FIG. 1 , only one of the plurality of relevant accounts 20-1 is provided with a reference sign for simplifying the diagram. A content associated with the relevant account 20-1 is expressed as a content 30-1. For example, the content 30-1 is image data uploaded in association with the relevant account 20-1, and the like. Similarly, a relevant account of the target account 10-2 is expressed as a relevant account 20-2, and a content associated with the relevant account 20-2 is expressed as a content 30-2. Hereinafter, the "content 30 associated with the relevant account 20" is also simply expressed as the "content 30 of the relevant account 20". - The
information processing apparatus 2000 determines whether the content 30-1 of the relevant account 20-1 and the content 30-2 of the relevant account 20-2 are similar. When the content 30-1 and the content 30-2 are similar, the target account 10-1 and the target account 10-2 can be inferred to belong to the same person. Thus, when the content 30-1 and the content 30-2 are similar, the information processing apparatus 2000 executes predetermined processing related to the target account 10-1 and the target account 10-2. For example, the information processing apparatus 2000 outputs, as the predetermined processing, a notification indicating that the target account 10-1 and the target account 10-2 belong to the same person. - The
information processing apparatus 2000 according to the present example embodiment determines a similarity degree between the content 30-1 of the relevant account 20-1 associated with the target account 10-1 and the content 30-2 of the relevant account 20-2 associated with the target account 10-2. Herein, when the similarity degree is high, the target account 10-1 and the target account 10-2 can be inferred to be owned by the same person. The reason will be described below. - The relevant account 20-1 associated with the target account 10-1 conceivably belongs to a person who has some sort of connection with an owner of the target account 10-1, such as a friend of the owner of the target account 10-1, for example. Thus, there is a high probability that a content including some sort of information related to the target account 10-1 is present among the contents 30-1 uploaded and the like in association with the relevant accounts 20-1 by owners of the relevant accounts 20-1. In other words, there is a high probability that some sort of information related to the target account 10-1 is revealed in information opened by the relevant account 20-1. For example, there is a high probability that a picture and a moving image uploaded by the relevant account 20-1 include the owner of the target account 10-1, property (such as a vehicle) of the owner of the target account 10-1, a landmark representing a place where the target account 10-1 has visited, and the like. Further, there is a high probability that text data and voice data uploaded by the relevant account 20-1 also include some sort of information related to the target account 10-1.
- Similarly, there is a high probability that a content including some sort of information related to the target account 10-2 is present among the contents 30-2 uploaded and the like in association with the relevant accounts 20-2 by owners of the relevant accounts 20-2. For this reason, it can be said that there is a high probability that the content 30-1 of the relevant account 20-1 and the content 30-2 of the relevant account 20-2 being similar indicates that information related to the target account 10-1 included in the content 30-1 and information related to the target account 10-2 included in the content 30-2 are similar.
- Thus, when the content 30-1 and the content 30-2 are similar, the
information processing apparatus 2000 infers that there is a high probability that the owner of the target account 10-1 and an owner of the target account 10-2 are the same person. In this way, even when it is not clear whether the target account 10-1 and the target account 10-2 are owned by the same person just by comparing the user information of the target account 10-1 with the user information of the target account 10-2, whether the target account 10-1 and the target account 10-2 are accounts owned by the same person can be inferred. - Note that the above-described description with reference to
FIG. 1 is exemplification for facilitating understanding of the information processing apparatus 2000, and does not limit the function of the information processing apparatus 2000. Hereinafter, the information processing apparatus 2000 according to the present example embodiment will be described in more detail. -
FIG. 2 is a diagram illustrating a functional configuration of the information processing apparatus 2000 according to the example embodiment 1. The information processing apparatus 2000 includes a determination unit 2020 and a processing execution unit 2040. The determination unit 2020 determines whether the content 30-1 of the relevant account 20-1 associated with the target account 10-1 and the content 30-2 of the relevant account 20-2 associated with the target account 10-2 are similar. When the content 30-1 and the content 30-2 are similar, the processing execution unit 2040 executes predetermined processing related to the target account 10-1 and the target account 10-2. - Each functional component unit of the
information processing apparatus 2000 may be achieved by hardware (for example, a hard-wired electronic circuit and the like) that achieves each functional component unit, and may be achieved by a combination (for example, a combination of an electronic circuit and a program that controls the electronic circuit, and the like) of hardware and software. Hereinafter, a case where each functional component unit of the information processing apparatus 2000 is achieved by the combination of hardware and software will be further described. -
FIG. 3 is a diagram illustrating a computer 1000 for achieving the information processing apparatus 2000. The computer 1000 is any computer. For example, the computer 1000 is a personal computer (PC), a server machine, or the like. The computer 1000 may be a dedicated computer designed for achieving the information processing apparatus 2000, and may be a general-purpose computer. - The
computer 1000 includes a bus 1020, a processor 1040, a memory 1060, a storage device 1080, an input/output interface 1100, and a network interface 1120. The bus 1020 is a data transmission path for allowing the processor 1040, the memory 1060, the storage device 1080, the input/output interface 1100, and the network interface 1120 to transmit and receive data with one another. However, a method of connecting the processor 1040 and the like to each other is not limited to a bus connection. - The
processor 1040 is various types of processors such as a central processing unit (CPU), a graphics processing unit (GPU), and a field-programmable gate array (FPGA). The memory 1060 is a main storage achieved by using a random access memory (RAM) and the like. The storage device 1080 is an auxiliary storage achieved by using a hard disk, a solid state drive (SSD), a memory card, a read only memory (ROM), or the like. - The input/
output interface 1100 is an interface for connecting the computer 1000 and an input/output device. For example, an input apparatus such as a keyboard and an output apparatus such as a display apparatus are connected to the input/output interface 1100. The network interface 1120 is an interface for connecting the computer 1000 to a communication network. The communication network is, for example, a local area network (LAN) and a wide area network (WAN). A method of connection to the communication network by the network interface 1120 may be a wireless connection or a wired connection. - The
storage device 1080 stores a program module that achieves each functional component unit of the information processing apparatus 2000. The processor 1040 achieves a function associated with each program module by reading each of the program modules to the memory 1060 and executing the read program module. -
FIG. 4 is a flowchart illustrating a flow of processing executed by the information processing apparatus 2000 according to the example embodiment 1. The determination unit 2020 acquires the content 30-1 of each of the relevant accounts 20-1 associated with the target account 10-1 (S102). The determination unit 2020 acquires the content 30-2 of each of the relevant accounts 20-2 associated with the target account 10-2 (S104). The determination unit 2020 determines whether the content 30-1 and the content 30-2 are similar (S106). When the content 30-1 and the content 30-2 are similar (S106: YES), the processing execution unit 2040 executes predetermined processing (S108). On the other hand, when the content 30-1 and the content 30-2 are not similar (S106: NO), the processing in FIG. 4 ends.
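- The flow of S102 to S108 can also be summarized in code form. The following is a minimal, non-limiting sketch in Python; the helper callables (relevant_of, acquire, similar, process) are placeholders introduced only for this illustration and are not defined by the present disclosure.

```python
from typing import Callable, Iterable, List

# Placeholder types for the helpers; their concrete realization is left open.
RelevantFn = Callable[[str], Iterable[str]]               # target account -> its relevant accounts
AcquireFn = Callable[[str], List[object]]                 # relevant account -> its contents 30
SimilarFn = Callable[[List[object], List[object]], bool]  # similarity determination (S106)
ProcessFn = Callable[[str, str], None]                    # predetermined processing (S108)

def check_target_accounts(target_1: str, target_2: str,
                          relevant_of: RelevantFn, acquire: AcquireFn,
                          similar: SimilarFn, process: ProcessFn) -> bool:
    """Mirrors the flow of FIG. 4: acquire contents (S102, S104), compare (S106), act (S108)."""
    contents_1 = [c for acc in relevant_of(target_1) for c in acquire(acc)]  # S102
    contents_2 = [c for acc in relevant_of(target_2) for c in acquire(acc)]  # S104
    if similar(contents_1, contents_2):                                      # S106
        process(target_1, target_2)                                          # S108
        return True
    return False
```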
- As described above, the target account 10 and the relevant account 20 are user accounts created by a user in a service such as the SNS, for example. In general, such a user account is created by registering user information such as a name, and is continuously used. - However, a user account handled by the
information processing apparatus 2000 is not limited to a user account created by registering user information in such a manner. For example, on a bulletin board and the like on a Web page, when a user posts (uploads text data, and the like) a content, an identifier is assigned to the post. The information processing apparatus 2000 may handle the identifier as a user account. In this case, for example, when a certain user posts a content on a bulletin board site and another user comments on the post, one of the former and the latter can be handled as the target account 10 and the other can be handled as the relevant account 20. - The
information processing apparatus 2000 infers, for two accounts of the target account 10-1 and the target account 10-2, whether the accounts belong to the same person. Herein, the target account 10-1 and the target account 10-2 may be user accounts for using the same service (for example, the SNS), or may be user accounts for using services different from each other. - Herein, there are various methods as a method of determining which user account among a plurality of user accounts is handled as the
target account 10. Hereinafter, a variation of the methods is illustrated. - For example, the
information processing apparatus 2000 receives a specification of a user account handled as the target account 10 from a user of the information processing apparatus 2000. The number of user accounts specified by the user may be two, or may be three or more. When three or more user accounts are specified, for example, the information processing apparatus 2000 executes, for each combination (each 2-combination of the specified accounts) of any two user accounts creatable for the specified user accounts, processing handling the two user accounts included in the combination as the target accounts 10. In other words, when user accounts of A, B, and C are specified, processing handling A and B as the target accounts 10, processing handling A and C as the target accounts 10, and processing handling B and C as the target accounts 10 are each executed. - For example, the
information processing apparatus 2000 receives, from a user, an input that specifies one user account handled as thetarget account 10. Theinformation processing apparatus 2000 handles the user account specified by a user as the target account 10-1. Furthermore, theinformation processing apparatus 2000 handles, as the target account 10-2, another user account having user information similar to user information of the target account 10-1. The similarity between pieces of user information herein refers to, for example, a part of various pieces of information (a part of a user ID, a part of a name, a part of a birth date, a part of an e-mail address, or the like) being common. When a plurality of other user accounts having user information similar to the user information of the target account 10-1 are present, theinformation processing apparatus 2000 handles each of the plurality of user accounts as the target account 10-2. - The
information processing apparatus 2000 may operate in cooperation with a monitoring system for monitoring a user account, and receive a specification of a user account from the monitoring system. For example, the monitoring system monitors a usage aspect (such as a content of an uploaded content and a frequency of uploading) of a user account, and determines a user account whose usage aspect violates common sense, a user policy of a service, law, or the like (that is, determines a user account to beware of). The monitoring system notifies the determined user account to theinformation processing apparatus 2000. Theinformation processing apparatus 2000 executes, for each combination of any two user accounts creatable for the plurality of user accounts notified from the monitoring system, processing handling the two user accounts included in the combination as the target accounts 10. Note that, when the monitoring system notifies user accounts one by one, theinformation processing apparatus 2000 executes the above-described processing on a plurality of user accounts indicated by a plurality of notifications received during a predetermined period of time, for example. - As described above, the
relevant account 20 is another account associated with the target account 10, and is an account in a friendship with the target account 10 in the SNS, for example. When the plurality of relevant accounts 20 are associated with the target account 10, the determination unit 2020 may acquire the content 30 for all of the relevant accounts 20, and may acquire the content 30 for some of the relevant accounts 20. When the content 30 is acquired for some of the relevant accounts 20, the determination unit 2020 arbitrarily (for example, randomly) selects a predetermined number of the relevant accounts 20 from the plurality of relevant accounts 20, for example. - The
determination unit 2020 acquires the content 30-1 associated with the relevant account 20-1 and the content 30-2 associated with the relevant account 20-2 (S102 and S104). For example, the determination unit 2020 automatically collects, for each of the relevant accounts 20, each of the contents 30 from Web pages on which the contents 30 of the relevant accounts 20 are opened, by successively accessing the Web pages. - Further, an application programming interface (API) for acquiring a content associated with a user account may be provided in a service such as the SNS. Thus, the
determination unit 2020 may acquire the content 30 of the relevant account 20 by using the API provided in a service used by the relevant account 20. - Note that the
determination unit 2020 may acquire all of the contents 30 associated with the relevant account 20, and may acquire only the content 30 of a predetermined type. For example, when a target of a similarity determination is only image data, the determination unit 2020 acquires image data associated with the relevant account 20 as the content 30.
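- As one illustration of acquiring only a content 30 of a predetermined type, the sketch below filters collected items down to image data. The fetch_content_urls callable is a stand-in for either a service-provided API client or a crawler over the relevant account's public pages; that name and the suffix-based filtering are assumptions made for this example, not part of the disclosure.

```python
from typing import Callable, Iterable, List

IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".gif")

def acquire_image_contents(relevant_account: str,
                           fetch_content_urls: Callable[[str], Iterable[str]]) -> List[str]:
    """Keep only image data as the content 30 of one relevant account.

    `fetch_content_urls` returns URLs (or paths) of the contents uploaded in
    association with the account; how they are fetched is left open here.
    """
    return [u for u in fetch_content_urls(relevant_account)
            if u.lower().endswith(IMAGE_SUFFIXES)]
```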
- The
determination unit 2020 compares content data of the relevant account 20-1 with content data of the relevant account 20-2, and infers that, when a similarity degree between the pieces of the content data is high, the target account 10-1 and the target account 10-2 are owned by the same person. The processing may adopt various variations in points that 1) what kind of content data is to be compared and 2) what kind of comparison is performed. Hereinafter, a comparison between pieces of content data will be described while focusing on the two points. - <<Comparison between Pieces of Image Data>>
- Image data are conceivable as a type of the content data to be compared. For example, in the SNS, image data of a picture of a person, a building, scenery, or the like are uploaded by using a user account. The
determination unit 2020 handles image data uploaded by using a user account in such a manner as a content associated with the user account. Further, a user may make a post that refers to (links) a Web page including image data, and make a post that refers to image data uploaded by another user. Thedetermination unit 2020 may also handle image data referred by a user in such a manner as content data associated with an account of the user. Note that a moving image frame constituting moving image data is also included in image data. Using image data has an advantage that similarity between the content 30-1 and the content 30-2 is easily determined even when a language used in the relevant account 20-1 is different from a language used in the relevant account 20-2. Hereinafter, a few specific comparison methods related to image data are illustrated. - The
determination unit 2020 focuses on a similarity degree between an object detected from image data associated with the relevant account 20-1 and an object detected from image data associated with the relevant account 20-2. For example, thedetermination unit 2020 calculates the similarity degree between the object detected from the image data associated with the relevant account 20-1 and the object detected from the image data associated with the relevant account 20-2. Then, when the number of groups (namely, groups of objects inferred to be the same) of objects having a similarity degree equal to or more than a predetermined value is equal to or more than a predetermined number, thedetermination unit 2020 determines that the similarity degree between the content data of the relevant account 20-1 and the content data of the relevant account 20-2 is high. On the other hand, when the number of groups of objects having a similarity degree equal to or more than the predetermined value is less than the predetermined number, thedetermination unit 2020 determines that the similarity degree between the content data of the relevant account 20-1 and the content data of the relevant account 20-2 is not high. The predetermined number described above is previously stored in a storage apparatus that can be accessed from thedetermination unit 2020. - Herein, an object detected from
image data 32 may be an object of any kind, and may be an object of a specific kind. In a case of the latter, for example, only a person among objects included in theimage data 32 is to be detected. - Note that an existing technique can be used as a technique for detecting an object from image data and a technique for determining a similarity degree between detected objects.
- The
determination unit 2020 generates, for each of the relevant account 20-1 and the relevant account 20-2, a histogram representing a distribution of a frequency of appearance of an object in image data associated thereto, and determines a similarity degree between the histograms.FIG. 5 is a diagram illustrating a histogram generated for therelevant account 20. InFIG. 5 , a plurality of pieces ofimage data 32 are associated with therelevant account 20. A histogram 40 is a distribution of a frequency of appearance of an object detected from theimage data 32. Hereinafter, theimage data 32 associated with the relevant account 20-1 are expressed as image data 32-1, and the histogram 40 generated for the image data 32-1 is expressed as a histogram 40-1. Similarly, theimage data 32 associated with the relevant account 20-2 are expressed as image data 32-2, and the histogram 40 generated for the image data 32-2 is expressed as a histogram 40-2. - The
determination unit 2020 determines a similarity degree between the histogram 40-1 and the histogram 40-2. For example, thedetermination unit 2020 calculates the similarity degree between the histogram 40-1 and the histogram 40-2, and, when the calculated similarity degree is equal to or more than a predetermined value, thedetermination unit 2020 determines that a similarity degree between the content 30-1 and the content 30-2 is high. On the other hand, when the similarity degree between the histogram 40-1 and the histogram 40-2 is less than the predetermined value, thedetermination unit 2020 determines that the similarity degree between the content 30-1 and the content 30-2 is not high. Herein, an existing technique can be used as a technique for calculating a similarity degree between two histograms. Further, the predetermined value described above is stored in a storage apparatus that can be accessed from thedetermination unit 2020. - The histogram 40-1 and the histogram 40-2 are generated as follows, for example. First, the
determination unit 2020 recognizes an object included in each piece of the image data 32-1 by performing object recognition processing on each piece of the image data 32-1 as a target. Furthermore, thedetermination unit 2020 generates the histogram 40-1 representing a distribution of a frequency of appearance of an object by counting the number of appearances of each object. - Herein, the
determination unit 2020 assigns an identifier to each object detected from the image data 32-1. At this time, for example, thedetermination unit 2020 makes each object identifiable by assigning the same identifier to the same object, and can thus count the number of appearances of the object. In order to achieve this, a determination (identification of an object) of whether each object detected from theimage data 32 is the same is needed. In other words, when thedetermination unit 2020 assigns an identifier to an object detected from theimage data 32, and the object is the same as another object being already detected, thedetermination unit 2020 assigns the same identifier as an identifier assigned to the object being already detected. On the other hand, when the object is different from any objects being already detected, thedetermination unit 2020 assigns a new identifier that is not assigned to any object. - The
determination unit 2020 generates the histogram 40-2 by also performing similar processing on the image data 32-2. At this time, for an object detected from the image data 32-2, not only identification with an object detected from the other piece of image data 32-2 but also identification with an object detected from the image data 32-1 are performed. In other words, when the same object as an object detected from the image data 32-2 is already detected from the image data 32-1, thedetermination unit 2020 also assigns, to the object detected from the image data 32-2, an identifier assigned to the object being already detected. Various types of existing techniques can be used for identification of an object. - Herein, a comparison between the histogram 40-1 and the histogram 40-2 may be performed by using only a part of the histogram 40-1 and a part of the histogram 40-2. For example, the
determination unit 2020 calculates a similarity degree between the histogram 40-1 and the histogram 40-2 by comparing a frequency of appearance of objects in top N places (N is a natural number of two or more) in the histogram 40-1 with a frequency of appearance of objects in top N places in the histogram 40-2. - A comparison related to image data may be achieved by a comparison between topics of the image data instead of a comparison between objects detected from the image data. Herein, a topic in a certain piece of data refers to a main matter or event expressed by the data. For example, a topic such as work, food, sports, traveling, games, or politics is conceivable. The
determination unit 2020 classifies each piece of theimage data 32 associated with therelevant account 20 by topic. Herein, an existing technique can be used as a technique for classifying image data by topic. - For example, the
determination unit 2020 generates a histogram of a frequency of appearance of a topic for each of the image data 32-1 and the image data 32-2.FIG. 6 is a diagram illustrating a histogram of a topic. When a similarity degree between a histogram of a topic generated from the image data 32-1 and a histogram of a topic generated from the image data 32-2 is equal to or more than a predetermined value, thedetermination unit 2020 determines that a similarity degree between the content 30-1 and the content 30-2 is high. On the other hand, when the similarity degree between the histogram of the topic generated from the image data 32-1 and the histogram of the topic generated from the image data 32-2 is less than the predetermined value, thedetermination unit 2020 determines that the similarity degree between the content 30-1 and the content 30-2 is not high. - The
determination unit 2020 may perform a comparison similar to the above-described comparison related to theimage data 32 on text data associated with therelevant account 20. For example, in the SNS, text data representing information such as a thought of a user and a recent state of a user are uploaded in association with a user account. Thedetermination unit 2020 handles, for example, text data uploaded by a user in such a manner as the content 30. - In addition, for example, a user may also make a post that refers to a Web page, a post that refers to text data uploaded by another user, a post of a comment on a content of another user, and the like. The
determination unit 2020 may also handle, as content data associated with an account of the user, the text data included in the Web page referred by the user in such a manner, the text data uploaded by the another user, and the text data representing the comment on the content of the another user. Hereinafter, a few specific comparison methods related to text data are illustrated. - For example, the
determination unit 2020 performs extraction of a keyword from text data associated with the relevant account 20-1 and text data associated with the relevant account 20-2. For example, when the number of keywords that appear commonly to both pieces of the text data is equal to or more than a predetermined number, thedetermination unit 2020 determines that a similarity degree between the content 30-1 and the content 30-2 is high. On the other hand, when the number of keywords that appear commonly to both pieces of the text data is less than the predetermined number, thedetermination unit 2020 determines that the similarity degree between the content 30-1 and the content 30-2 is not high. - Herein, a keyword extracted from text data may be any word, and may be a specific word. In a case of the latter, for example, a list of words to be adopted as a keyword is previously prepared, and only a word included in the list is extracted as a keyword. Note that an existing technique can be used as a technique for extracting a keyword from text data.
- For example, the
determination unit 2020 may perform, on a keyword extracted from text data associated with therelevant account 20, a comparison similar to the comparison related to a histogram of a frequency of appearance of an object detected from image data associated with therelevant account 20. Specifically, thedetermination unit 2020 generates, for each of the relevant account 20-1 and the relevant account 20-2, a histogram representing a distribution of a frequency of appearance of a keyword in associated text data, and determines a similarity degree between the histograms. -
FIG. 7 is a diagram illustrating a histogram of a frequency of appearance of a keyword. In FIG. 7 , a histogram 50 is generated for text data 34 associated with the relevant account 20. Hereinafter, the text data 34 associated with the relevant account 20-1 is expressed as text data 34-1, and the histogram 50 generated from the text data 34-1 is expressed as a histogram 50-1. Similarly, the text data 34 associated with the relevant account 20-2 is expressed as text data 34-2, and the histogram 50 generated from the text data 34-2 is expressed as a histogram 50-2.
determination unit 2020 calculates a similarity degree between the histogram 50-1 and the histogram 50-2, and, when the similarity degree is equal to or more than a predetermined value, thedetermination unit 2020 determines that a similarity degree between the content 30-1 and the content 30-2 is high. On the other hand, when the similarity degree between the histogram 50-1 and the histogram 50-2 is less than the predetermined value, thedetermination unit 2020 determines that the similarity degree between the content 30-1 and the content 30-2 is not high. The predetermined value described above is previously stored in a storage apparatus that can be accessed from thedetermination unit 2020. - Herein, a comparison between the histogram 50-1 and the histogram 50-2 may be performed by using only a part (for example, up to the top N place) of the histogram similarly to the comparison between the histogram 40-1 and the histogram 40-2.
- The
determination unit 2020 may determine a similarity degree between the content 30-1 and the content 30-2 by a comparison between frequencies of appearance of a topic extracted from the pieces of thetext data 34. A method of comparing frequencies of appearance of a topic extracted from the pieces of thetext data 34 is similar to the above-described comparison between frequencies of appearance of a topic extracted from pieces of image data. Note that an existing technique can be used as a technique for extracting a topic from text data. - The
determination unit 2020 may handle voice data associated with the relevant account as the content 30. The voice data herein include not only data generated by voice alone, but also data about voice included in moving image data. Hereinafter, comparison methods related to voice data are illustrated. - The
determination unit 2020 extracts a keyword from each piece of voice data associated with the relevant account 20-1 and voice data associated with the relevant account 20-2. Then, thedetermination unit 2020 determines a similarity degree between the content 30-1 and the content 30-2 by handling the keywords extracted from the pieces of the voice data similarly to the keywords extracted from the pieces of the text data described above. In other words, thedetermination unit 2020 determines a similarity degree between the content 30-1 and the content 30-2 by comparing the numbers of common keywords and histograms representing a frequency of appearance of a keyword. - The
determination unit 2020 determines a similarity degree between the content 30-1 and the content 30-2 by comparing a frequency of appearance of a topic extracted from voice data associated with the relevant account 20-1 and a frequency of appearance of a topic extracted from voice data associated with the relevant account 20-2. A method of comparing frequencies of appearance of a topic is similar to the above-described comparison between frequencies of appearance of a topic extracted from image data. Note that an existing technique can be used as a technique for extracting a topic from voice data. - The
determination unit 2020 performs extraction of a speaker from each piece of voice data associated with the relevant account 20-1 and voice data associated with the relevant account 20-2. An existing technique such as voice print identification, for example, can be used as a technique for performing extraction of a speaker from voice data. For example, there is a technique for identifying a speaker by generating sound spectrogram data representing a voice print from voice data, and using the sound spectrogram data as identification information. - For example, the
determination unit 2020 generates, for each of the relevant account 20-1 and the relevant account 20-2, a histogram of a frequency of appearance of a speaker extracted from associated voice data.FIG. 8 is a diagram illustrating a histogram of a frequency of appearance of a speaker. InFIG. 8 , a histogram 60 of a frequency of appearance of a speaker is generated forvoice data 36 associated with therelevant account 20. Hereinafter, thevoice data 36 associated with the relevant account 20-1 is expressed as voice data 36-1, and the histogram generated from the voice data 36-1 is expressed as a histogram 60-1. Similarly, thevoice data 36 associated with the relevant account 20-2 is expressed as voice data 36-2, and the histogram 60 generated from the voice data 36-2 is expressed as a histogram 60-2. - For example, the
determination unit 2020 calculates a similarity degree between the histogram 60-1 and the histogram 60-2, and, when the similarity degree is equal to or more than a predetermined value, thedetermination unit 2020 determines that a similarity degree between the content 30-1 and the content 30-2 is high. On the other hand, when the similarity degree between the histogram 60-1 and the histogram 60-2 is less than the predetermined value, thedetermination unit 2020 determines that the similarity degree between the content 30-1 and the content 30-2 is not high. The predetermined value described above is previously stored in a storage apparatus that can be accessed from thedetermination unit 2020. - Herein, a comparison between the histogram 60-1 and the histogram 60-2 may be performed by using only a part (for example, up to the top N place) of the histogram similarly to the comparison of the histogram 40 and the comparison of the
histogram 50. - A comparison based on a speaker extracted from the
voice data 36 is not limited to a comparison between histograms. For example, thedetermination unit 2020 may use a comparison method similar to the method described in “Comparison Method 1 Related to Text Data”. In other words, when the number of speakers who appear commonly in thevoice data 36 associated with the relevant account 20-1 and thevoice data 36 associated with the relevant account 20-2 is equal to or more than a predetermined number, thedetermination unit 2020 determines that a similarity degree between the content 30-1 and the content 30-2 is high. On the other hand, when the number of speakers who appear commonly to both pieces of thevoice data 36 is less than the predetermined number, thedetermination unit 2020 determines that the similarity degree between the content 30-1 and the content 30-2 is not high. - As described above, when it is determined that a similarity degree between the content data 30-1 associated with the relevant account 20-1 and the content data 30-2 associated with the relevant account 20-2 is high, there is a high probability that the target account 10-1 and the target account 10-2 are owned by the same person. Thus, when it is determined that the similarity degree between the content 30-1 and the content 30-2 is high, the
processing execution unit 2040 executes predetermined processing on the target account 10-1 and the target account 10-2. Hereinafter, a variation of the processing executed by theprocessing execution unit 2040 is illustrated. - For example, when it is determined that the similarity degree between the content 30-1 and the content 30-2 is high, the
processing execution unit 2040 outputs information representing that there is a high probability that the target account 10-1 and the target account 10-2 are owned by the same person. The information is output, and thus a user of theinformation processing apparatus 2000 who acquires the information can easily realize a group of the target accounts 10 having a high probability of being owned by the same person. - There are various methods of outputting the information described above. For example, the
processing execution unit 2040 causes a display apparatus connected to theinformation processing apparatus 2000 to display a notification representing that there is a high probability that the target account 10-1 and the target account 10-2 are owned by the same person.FIG. 9 is a diagram illustrating a notification displayed on the display apparatus. In addition, for example, theprocessing execution unit 2040 may transmit the notification described above to another computer communicably connected to theinformation processing apparatus 2000, or store the notification described above in a storage apparatus communicably connected to theinformation processing apparatus 2000. - Further, it is assumed that the
information processing apparatus 2000 performs a determination by thedetermination unit 2020 on a plurality of combinations of the target account and the target account 10-2. In this case, a plurality of combinations of the target accounts having a high probability of being owned by the same person may be found. Thus, theprocessing execution unit 2040 may generate a list indicating one or more combinations of the target accounts 10 having a high probability of being owned by the same person, and output the list by various methods described above. By outputting such a list, a user of theinformation processing apparatus 2000 can easily realize the plurality of groups of the target accounts 10 having a high probability of being owned by the same person. - In addition, for example, when it is determined that the similarity degree between the content 30-1 and the content 30-2 is high, the
processing execution unit 2040 outputs information related to the content 30-1 and the content 30-2. Hereinafter, the information is referred to as similar content information. By outputting similar content information, a user of theinformation processing apparatus 2000 can acquire, for the target account 10-1 and the target account 10-2 inferred to have a high probability of being owned by the same person, information as grounds for the inference. Hereinafter, a variation of the similar content information is illustrated. - It is assumed that the
determination unit 2020 performs a comparison between objects extracted from the pieces of theimage data 32. In this case, for example, theprocessing execution unit 2040 includes, in the similar content information, the histogram 40 (seeFIG. 5 ) representing a frequency of appearance of an object being generated for theimage data 32. Herein, an image of each object indicated in the histogram 40 may be included together with the histogram 40 in the similar content information. In addition, for example, theprocessing execution unit 2040 includes, in the similar content information, a combination of images of objects determined to be similar to each other among objects extracted from the image data 32-1 and objects extracted from the image data 32-2. Note that, when an image of an object is included in the similar content information, theentire image data 32 in which the object is included may be included in the similar content information. - Furthermore, the
processing execution unit 2040 may execute analysis processing on an image of an object to be included in the similar content information, and include a result of the analysis processing in the similar content information. For example, when there is an image of a person among object images to be included in the similar content information, theprocessing execution unit 2040 may infer an attribute (age, height, body shape, and gender) of the person of the image, and include a result of the inference in the similar content information, or may calculate a feature of an accessory object (such as glasses, clothing, and baggage) of the person of the image, and include information related to the feature in the similar content information. In addition, for example, theprocessing execution unit 2040 may extract an image of a part (such as a face, a mole, a tattoo, a nail, or a fingerprint) representing a feature of a person from the image of the person, and include the image of the part in the similar content information. - In addition, for example, when there is an image of a vehicle (such as a car, a motor cycle, and a bicycle) among object images to be included in the similar content information, the
processing execution unit 2040 determines a maker of the vehicle, a type of the vehicle, a number of a number plate, and the like, and includes the determined information in the similar content information. - In addition, for example, when there is an image of a landmark (such as a building, a marking, a mountain, a river, and the sea) usable for identifying a capturing place (a place where the
image data 32 is generated) among object images to be included in the similar content information, theprocessing execution unit 2040 includes a name of the landmark in the similar content information. Further, theprocessing execution unit 2040 may identify a location of the landmark, and include information (an address or global positioning system (GPS) coordinates) representing the location in the similar content information. Note that a location of a landmark can be identified by using map information and the like, for example. - It is assumed that the
determination unit 2020 performs a comparison between keywords extracted from text data or voice data. In this case, for example, theprocessing execution unit 2040 includes, in the similar content information, the histogram (seeFIG. 7 ) generated for a keyword. At this time, each keyword indicated in the histogram may be included in the similar content information. In addition, for example, theprocessing execution unit 2040 includes, in the similar content information, a keyword determined to coincide among keywords extracted from the content 30-1 and keywords extracted from the content 30-2. - Note that, when a keyword is extracted from text data, the
processing execution unit 2040 may include, in the similar content information, not only a keyword determined to coincide, but also a sentence and the entire text data in which the keyword is included. Further, when a keyword is extracted from voice data, theprocessing execution unit 2040 may include, in the similar content information, not only a keyword determined to coincide, but also voice data of a statement in which the keyword is included and the entire voice data from which the keyword is extracted. - It is assumed that the
determination unit 2020 performs extraction of a speaker from voice data. In this case, for example, thedetermination unit 2020 includes, in the similar content information, the histogram 60 (seeFIG. 8 ) representing a frequency of appearance of a speaker. At this time, sound spectrogram data of each speaker indicated in the histogram may be included in the similar content information. In addition, for example, thedetermination unit 2020 includes, in the similar content information, sound spectrogram data of a speaker determined to coincide among speakers extracted from the voice data 36-1 and speakers extracted from the voice data 36-2. - It is assumed that the
determination unit 2020 performs a comparison between topics extracted from the content 30. In this case, for example, theprocessing execution unit 2040 includes, in the similar content information, the histogram (seeFIG. 6 ) representing a frequency of appearance of a topic extracted from the content 30. In addition, for example, theprocessing execution unit 2040 includes, in the similar content information, information (such as a name of a topic) representing a topic determined to coincide among topics extracted from the content 30-1 and topics extracted from the content 30-2. - While the example embodiments of the present invention have been described with reference to the drawings, the example embodiments are only exemplification of the present invention, and various configurations other than the above-described example embodiments can also be employed.
- For example, when the content 30-1 and the content 30-2 are similar, the
information processing apparatus 2000 may infer that an “owner of the target account 10-1 and an owner of the target account 10-2 belong to the same group” instead of inferring that “the target account and the target account 10-2 are owned by the same person”. In this case, theprocessing execution unit 2040 outputs “information representing that there is a high probability that the owner of the target account 10-1 and the owner of the target account 10-2 belong to the same group” instead of “information representing that there is a high probability that the target account and the target account 10-2 are owned by the same person”.
Claims (20)
1. An information processing apparatus, comprising:
at least one memory configured to store instructions; and
at least one processor configured to execute the instructions to perform operations, the operations comprising:
determining, for a first relevant account associated with a first target account and a plurality of relevant accounts associated with a plurality of target accounts other than the first relevant account, whether the first content data associated with the first relevant account and a plurality of content data associated with the plurality of relevant accounts are similar, and
executing predetermined processing when it is determined that the first content data and the plurality of content data are similar.
2. The information processing apparatus according to claim 1 , wherein the operations further comprise determining whether the first content data and the plurality of content data are similar by determining whether a distribution of a frequency of appearance of an object included in image data associated with the first relevant account and distributions of frequencies of appearance of objects included in image data associated with the plurality of relevant accounts are similar.
3. The information processing apparatus according to claim 1 , wherein the operations further comprise determining whether the first content data and the plurality of content data are similar by determining whether a distribution of a frequency of appearance of a word included in text data or voice data associated with the first relevant account and distribution of frequencies of appearance of words included in text data or voice data associated with the plurality of relevant accounts are similar.
4. The information processing apparatus according to claim 1 , wherein the operations further comprise determining whether the first content data and the plurality of content data are similar by determining whether a distribution of a frequency of appearance of a speaker extracted from voice data associated with the first relevant account and distributions of frequencies of appearance of speakers extracted from voice data associated with the plurality of relevant accounts are similar.
5. The information processing apparatus according to claim 1 , wherein the operations further comprise determining whether the first content data and the plurality of content data are similar by determining whether a distribution of a frequency of appearance of a topic extracted from content data associated with the first relevant account and distributions of frequencies of appearance of topics extracted from content data associated with the plurality of relevant accounts are similar.
6. The information processing apparatus according to claim 1 , wherein the operations further comprise, as the predetermined processing, outputting information indicating that there is a high probability that the first target account and the plurality of target accounts are owned by a same person, or information indicating that there is a high probability that an owner of the first target account and an owner of the plurality of target accounts belong to a same group.
7. The information processing apparatus according to claim 2 , wherein the operations further comprise, as the predetermined processing, outputting the distributions.
8. The information processing apparatus according to claim 1 , wherein the operations further comprise, as the predetermined processing, outputting content data that coincide or are similar among the first content data and the plurality of content data.
9. The information processing apparatus according to claim 8 , wherein the operations further comprise extracting an image region representing a characteristic part of a person included in image data and outputting the extracted image region.
10. The information processing apparatus according to claim 8 , wherein the operations further comprise outputting information indicating at least one of a type, a maker, and a number of a number plate of a vehicle included in image data.
11. The information processing apparatus according to claim 8 , wherein the operations further comprise outputting a name or a location of a landmark included in image data.
12. A control method executed by a computer, comprising:
determining, for a first relevant account associated with a first target account and a plurality of relevant accounts associated with a plurality of target accounts other than the first relevant account, whether the first content data associated with the first relevant account and a plurality of content data associated with the plurality of relevant accounts are similar, and
executing predetermined processing when it is determined that the first content data and the plurality of content data are similar.
13. The control method according to claim 12 , further comprising determining whether the first content data and the plurality of content data are similar by determining whether a distribution of a frequency of appearance of an object included in image data associated with the first relevant account and distributions of frequencies of appearance of objects included in image data associated with the plurality of relevant accounts are similar.
14. The control method according to claim 12 , further comprising determining whether the first content data and the plurality of content data are similar by determining whether a distribution of a frequency of appearance of a word included in text data or voice data associated with the first relevant account and distribution of frequencies of appearance of words included in text data or voice data associated with the plurality of relevant accounts are similar.
15. The control method according to claim 12 , further comprising determining whether the first content data and the plurality of content data are similar by determining whether a distribution of a frequency of appearance of a speaker extracted from voice data associated with the first relevant account and distributions of frequencies of appearance of speakers extracted from voice data associated with the plurality of relevant accounts are similar.
16. The control method according to claim 12 , further comprising determining whether the first content data and the plurality of content data are similar by determining whether a distribution of a frequency of appearance of a topic extracted from content data associated with the first relevant account and distributions of frequencies of appearance of topics extracted from content data associated with the plurality of relevant accounts are similar.
17. A non-transitory computer-readable medium storing a program for causing a computer to perform operations, the operations comprising:
determining, for a first relevant account associated with a first target account and a plurality of relevant accounts associated with a plurality of target accounts other than the first relevant account, whether the first content data associated with the first relevant account and a plurality of content data associated with the plurality of relevant accounts are similar, and
executing predetermined processing when it is determined that the first content data and the plurality of content data are similar.
18. The non-transitory computer-readable medium according to claim 17 , wherein the operations further comprise determining whether the first content data and the plurality of content data are similar by determining whether a distribution of a frequency of appearance of an object included in image data associated with the first relevant account and distributions of frequencies of appearance of objects included in image data associated with the plurality of relevant accounts are similar.
19. The non-transitory computer-readable medium according to claim 17 , wherein the operations further comprise determining whether the first content data and the plurality of content data are similar by determining whether a distribution of a frequency of appearance of a word included in text data or voice data associated with the first relevant account and distribution of frequencies of appearance of words included in text data or voice data associated with the plurality of relevant accounts are similar.
20. The non-transitory computer-readable medium according to claim 17 , wherein the operations further comprise determining whether the first content data and the plurality of content data are similar by determining whether a distribution of a frequency of appearance of a speaker extracted from voice data associated with the first relevant account and distributions of frequencies of appearance of speakers extracted from voice data associated with the plurality of relevant accounts are similar.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/240,160 US20230410221A1 (en) | 2018-03-30 | 2023-08-30 | Information processing apparatus, control method, and program |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2018/013880 WO2019187107A1 (en) | 2018-03-30 | 2018-03-30 | Information processing device, control method, and program |
US202017043291A | 2020-09-29 | 2020-09-29 | |
US18/240,160 US20230410221A1 (en) | 2018-03-30 | 2023-08-30 | Information processing apparatus, control method, and program |
Related Parent Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2018/013880 Continuation WO2019187107A1 (en) | 2018-03-30 | 2018-03-30 | Information processing device, control method, and program |
US17/043,291 Continuation US20210019553A1 (en) | 2018-03-30 | 2018-03-30 | Information processing apparatus, control method, and program |
Publications (1)
Publication Number | Publication Date |
---|---|
US20230410221A1 true US20230410221A1 (en) | 2023-12-21 |
Family
ID=68059653
Family Applications (4)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/043,291 Abandoned US20210019553A1 (en) | 2018-03-30 | 2018-03-30 | Information processing apparatus, control method, and program |
US18/240,152 Pending US20230410220A1 (en) | 2018-03-30 | 2023-08-30 | Information processing apparatus, control method, and program |
US18/240,209 Pending US20230410222A1 (en) | 2018-03-30 | 2023-08-30 | Information processing apparatus, control method, and program |
US18/240,160 Pending US20230410221A1 (en) | 2018-03-30 | 2023-08-30 | Information processing apparatus, control method, and program |
Family Applications Before (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/043,291 Abandoned US20210019553A1 (en) | 2018-03-30 | 2018-03-30 | Information processing apparatus, control method, and program |
US18/240,152 Pending US20230410220A1 (en) | 2018-03-30 | 2023-08-30 | Information processing apparatus, control method, and program |
US18/240,209 Pending US20230410222A1 (en) | 2018-03-30 | 2023-08-30 | Information processing apparatus, control method, and program |
Country Status (3)
Country | Link |
---|---|
US (4) | US20210019553A1 (en) |
JP (1) | JP7070665B2 (en) |
WO (1) | WO2019187107A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11565698B2 (en) * | 2018-04-16 | 2023-01-31 | Mitsubishi Electric Cornoration | Obstacle detection apparatus, automatic braking apparatus using obstacle detection apparatus, obstacle detection method, and automatic braking method using obstacle detection method |
JP7110293B2 (en) * | 2020-09-28 | 2022-08-01 | 楽天グループ株式会社 | Information processing device, information processing method and program |
Family Cites Families (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5181691B2 (en) * | 2008-01-21 | 2013-04-10 | 日本電気株式会社 | Information processing apparatus, information processing method, computer program, and recording medium |
US9201863B2 (en) * | 2009-12-24 | 2015-12-01 | Woodwire, Inc. | Sentiment analysis from social media content |
US20110320560A1 (en) * | 2010-06-29 | 2011-12-29 | Microsoft Corporation | Content authoring and propagation at various fidelities |
JP5758831B2 (en) * | 2012-03-30 | 2015-08-05 | 楽天株式会社 | Information providing apparatus, information providing method, information providing program, and computer-readable recording medium for recording the program |
US8666123B2 (en) * | 2012-04-26 | 2014-03-04 | Google Inc. | Creating social network groups |
US9208171B1 (en) * | 2013-09-05 | 2015-12-08 | Google Inc. | Geographically locating and posing images in a large-scale image repository and processing framework |
US20150120583A1 (en) * | 2013-10-25 | 2015-04-30 | The Mitre Corporation | Process and mechanism for identifying large scale misuse of social media networks |
DE102014219407A1 (en) * | 2014-09-25 | 2016-03-31 | Volkswagen Aktiengesellschaft | Diagnostic procedures and survey methods for vehicles |
KR20160120604A (en) * | 2015-04-08 | 2016-10-18 | 김근제 | Apparatus for providing code using light source device or color information and code identification system |
JP6557592B2 (en) * | 2015-12-15 | 2019-08-07 | 日本放送協会 | Video scene division apparatus and video scene division program |
US20170235726A1 (en) * | 2016-02-12 | 2017-08-17 | Fujitsu Limited | Information identification and extraction |
JP2018037076A (en) * | 2016-08-25 | 2018-03-08 | 株式会社ピープルコミュニケーションズ | SNS portal system |
US20180129929A1 (en) * | 2016-11-09 | 2018-05-10 | Fuji Xerox Co., Ltd. | Method and system for inferring user visit behavior of a user based on social media content posted online |
US10866633B2 (en) * | 2017-02-28 | 2020-12-15 | Microsoft Technology Licensing, Llc | Signing with your eyes |
CN107609461A (en) * | 2017-07-19 | 2018-01-19 | 阿里巴巴集团控股有限公司 | The training method of model, the determination method, apparatus of data similarity and equipment |
2018
- 2018-03-30 US US17/043,291 patent/US20210019553A1/en not_active Abandoned
- 2018-03-30 WO PCT/JP2018/013880 patent/WO2019187107A1/en active Application Filing
- 2018-03-30 JP JP2020508875A patent/JP7070665B2/en active Active
2023
- 2023-08-30 US US18/240,152 patent/US20230410220A1/en active Pending
- 2023-08-30 US US18/240,209 patent/US20230410222A1/en active Pending
- 2023-08-30 US US18/240,160 patent/US20230410221A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
WO2019187107A1 (en) | 2019-10-03 |
JPWO2019187107A1 (en) | 2021-02-25 |
US20210019553A1 (en) | 2021-01-21 |
JP7070665B2 (en) | 2022-05-18 |
US20230410220A1 (en) | 2023-12-21 |
US20230410222A1 (en) | 2023-12-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11610394B2 (en) | Neural network model training method and apparatus, living body detecting method and apparatus, device and storage medium | |
US20230410221A1 (en) | Information processing apparatus, control method, and program | |
CN107742100B (en) | A kind of examinee's auth method and terminal device | |
WO2019200781A1 (en) | Receipt recognition method and device, and storage medium | |
CN112348117B (en) | Scene recognition method, device, computer equipment and storage medium | |
CN109800320B (en) | Image processing method, device and computer readable storage medium | |
WO2019033525A1 (en) | Au feature recognition method, device and storage medium | |
US9613296B1 (en) | Selecting a set of exemplar images for use in an automated image object recognition system | |
WO2019062081A1 (en) | Salesman profile formation method, electronic device and computer readable storage medium | |
CN106874253A (en) | Recognize the method and device of sensitive information | |
US10997609B1 (en) | Biometric based user identity verification | |
US20180005022A1 (en) | Method and device for obtaining similar face images and face image information | |
US20200218772A1 (en) | Method and apparatus for dynamically identifying a user of an account for posting images | |
WO2022142903A1 (en) | Identity recognition method and apparatus, electronic device, and related product | |
CN107809370B (en) | User recommendation method and device | |
CN112241667A (en) | Image detection method, device, equipment and storage medium | |
CN113704623A (en) | Data recommendation method, device, equipment and storage medium | |
CN111738199B (en) | Image information verification method, device, computing device and medium | |
CN110688878A (en) | Living body identification detection method, living body identification detection device, living body identification detection medium, and electronic device | |
US9317887B2 (en) | Similarity calculating method and apparatus | |
CN115223022A (en) | Image processing method, device, storage medium and equipment | |
CN107656959B (en) | Message leaving method and device and message leaving equipment | |
CN115618415A (en) | Sensitive data identification method and device, electronic equipment and storage medium | |
CN112041847A (en) | Providing images with privacy tags | |
CN111192150B (en) | Method, device, equipment and storage medium for processing vehicle danger-giving agent service |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |