JP2024028697A5

Info

Publication number: JP2024028697A5
Application number: JP2023191415A
Authority: JP
Filing date: 2023-11-09
Publication date: 2024-05-16

Claims

1. A method for generating training data for a machine learning algorithm, the method comprising:
receiving a knowledge representation encoded as a non-transitory computer-readable data structure based on an object of interest, the knowledge representation including at least one concept and/or a relationship between two or more concepts;
receiving a first set of content items, the first set including one or more content items without a label, wherein a label associates a content item with one or more features of the knowledge representation;
determining one or more scores for each of the one or more content items of the first set, the score for each content item being based on the knowledge representation and the content of that content item; and
generating the training data for the machine learning algorithm by assigning a label to each of the one or more content items in the first set based on the score associated with that content item.

2. The method of claim 1, further comprising synthesizing the knowledge representation based on the content of the object of interest.

3. The method of claim 2, wherein the synthesizing step includes generating the at least one concept and/or relationships between two or more concepts, the concepts and/or relationships not being enumerated in the object of interest.

4. The method of claim 1, wherein the knowledge representation includes a weight associated with the at least one concept.

5. The method of claim 1, wherein the score for each of the content items is based on an intersection of at least one concept in the knowledge representation with the content of that content item.
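
For illustration only, the weighted concepts and the intersection-based score recited above could be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the dictionary layout, the single-word concepts, the tokenizer, and the names `tokenize` and `intersection_score` are all hypothetical.

```python
import re
from typing import Dict, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def intersection_score(concept_weights: Dict[str, float], content: str) -> float:
    """Sum the weights of the concepts that also occur in the content item.

    Assumption: each concept is a single word, and the "intersection" is a
    plain set intersection between concept names and content tokens.
    """
    tokens = tokenize(content)
    return sum(w for concept, w in concept_weights.items() if concept in tokens)


# Hypothetical knowledge representation: concept -> weight.
knowledge_representation = {"jaguar": 1.0, "cat": 0.5, "habitat": 0.25}
score = intersection_score(knowledge_representation, "The jaguar is a large cat.")
```

Here "jaguar" and "cat" occur in the content item while "habitat" does not, so only the first two weights contribute to the score.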

6. The method of claim 1, wherein the object of interest includes unstructured data, text, audio, video, topics, tweets, web pages, websites, documents, collections of documents, document titles, messages, advertisements, and/or search queries.

7. The method of claim 1, further comprising training an algorithm to predict labels of one or more unassociated content items based on the labels assigned to the first set of content items and one or more features associated with the first set of content items.

8. The method of claim 7, further comprising:
receiving a second set of content items, the second set including one or more unlabeled content items; and
assigning, by the algorithm, a label to one or more of the second set of content items based on one or more features associated with each of the one or more content items in the second set.
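
As a rough illustration of training on the labeled first set and then labeling a second set, the sketch below uses a deliberately crude word-frequency profile per label. The claims do not specify a particular learning algorithm; this classifier, the example data, and the names `features`, `train`, and `predict` are assumptions made only for the example.

```python
from collections import Counter
from typing import Dict, List, Tuple


def features(text: str) -> Counter:
    """Word-frequency features for one content item."""
    return Counter(text.lower().split())


def train(labeled: List[Tuple[str, str]]) -> Dict[str, Counter]:
    """Aggregate word frequencies per label (a crude profile model)."""
    model: Dict[str, Counter] = {}
    for text, label in labeled:
        model.setdefault(label, Counter()).update(features(text))
    return model


def predict(model: Dict[str, Counter], text: str) -> str:
    """Return the label whose aggregate profile overlaps the item most."""
    item = features(text)

    def overlap(profile: Counter) -> float:
        total = sum(profile.values()) or 1
        return sum(n * profile[word] for word, n in item.items()) / total

    return max(model, key=lambda label: overlap(model[label]))


# First set: content items with labels already assigned from their scores.
first_set = [
    ("jaguar cat habitat rainforest", "relevant"),
    ("jaguar cat prey hunting", "relevant"),
    ("stock market prices fell", "not_relevant"),
    ("market report on bond prices", "not_relevant"),
]
model = train(first_set)

# Second set: unlabeled content items, labeled by the trained model.
second_set = ["the cat stalks its prey", "bond market update"]
predicted = [predict(model, text) for text in second_set]
```

Any supervised learner could stand in for the profile model here; the point is only that the labels generated for the first set become the training signal for labeling the second.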

9. The method of claim 1, wherein assigning a label to each of the one or more content items in the first set includes assigning the label based on a score of each content item in the first set exceeding a predetermined threshold.
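
Threshold-based labeling of this kind could look like the sketch below. The binary labels 1 and 0, the treatment of items that do not exceed the threshold, and the name `generate_training_data` are assumptions for illustration; the claim itself only requires that the label depend on the score exceeding a predetermined threshold.

```python
from typing import Dict, List, Tuple


def generate_training_data(
    scores: Dict[str, float], threshold: float
) -> List[Tuple[str, int]]:
    """Label each content item 1 if its score exceeds the threshold, else 0."""
    return [(item, 1 if score > threshold else 0) for item, score in scores.items()]


# Hypothetical per-item scores for the first set of content items.
scores = {"doc_a": 1.5, "doc_b": 0.25, "doc_c": 0.75}
training_data = generate_training_data(scores, threshold=0.75)
```

Because the comparison is strict, an item whose score equals the threshold (here `doc_c`) is not treated as exceeding it.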

10. The method of claim 1, wherein the machine learning algorithm uses supervised learning to infer one or more functions from labeled training data.

11. The method of claim 7, wherein the one or more features associated with the first set of content items include at least one of title, length, author, word frequency, inverse document frequency, and/or attributes of the knowledge representation.

12. A system for generating training data for a machine learning algorithm, the system including at least one processor configured to execute a method, the method comprising:
receiving a knowledge representation encoded as a non-transitory computer-readable data structure based on an object of interest, the knowledge representation including at least one concept and/or a relationship between two or more concepts;
receiving a first set of content items, the first set including one or more content items without a label, wherein a label associates a content item with one or more features of the knowledge representation;
determining one or more scores for each of the one or more content items of the first set, the score for each content item being based on the knowledge representation and the content of that content item; and
generating the training data for the machine learning algorithm by assigning a label to each of the one or more content items in the first set based on the score associated with that content item.

13. The system of claim 12, wherein the method further comprises synthesizing the knowledge representation based on the content of the object of interest.

14. The system of claim 13, wherein the synthesizing step includes generating the at least one concept and/or relationships between two or more concepts, the concepts and/or relationships not being enumerated in the object of interest.

15. The system of claim 12, wherein the knowledge representation includes a weight associated with the at least one concept.

16. The system of claim 12, wherein the score for each of the content items is based on an intersection of at least one concept in the knowledge representation with the content of that content item.

17. The system of claim 12, wherein the object of interest includes unstructured data, text, audio, video, topics, tweets, web pages, websites, documents, collections of documents, document titles, messages, advertisements, and/or search queries.

18. The system of claim 12, wherein the method further comprises training an algorithm to predict labels of one or more unassociated content items based on the labels assigned to the first set of content items and one or more features associated with the first set of content items.

19. The system of claim 18, wherein the method further comprises:
receiving a second set of content items, the second set including one or more unlabeled content items; and
assigning, by the algorithm, a label to one or more of the second set of content items based on one or more features associated with each of the one or more content items in the second set.

20. The system of claim 12, wherein assigning a label to each of the one or more content items in the first set includes assigning the label based on a score of each content item in the first set exceeding a predetermined threshold.

21. The system of claim 12, wherein the machine learning algorithm uses supervised learning to infer one or more functions from labeled training data.

22. The system of claim 18, wherein the one or more features associated with the first set of content items include at least one of title, length, author, word frequency, inverse document frequency, and/or attributes of the knowledge representation.

23. At least one non-transitory computer-readable storage medium storing processor-executable instructions that, when executed by at least one processor, cause the at least one processor to perform a method of generating training data for a machine learning algorithm, the method comprising:
receiving a knowledge representation encoded as a non-transitory computer-readable data structure based on an object of interest, the knowledge representation including at least one concept and/or a relationship between two or more concepts;
receiving a first set of content items, the first set including one or more content items without a label, wherein a label associates a content item with one or more features of the knowledge representation;
determining one or more scores for each of the one or more content items of the first set, the score for each content item being based on the knowledge representation and the content of that content item; and
generating the training data for the machine learning algorithm by assigning a label to each of the one or more content items in the first set based on the score associated with that content item.

24. The at least one non-transitory computer-readable storage medium of claim 23, wherein the method further comprises synthesizing the knowledge representation based on the content of the object of interest.

25. The at least one non-transitory computer-readable storage medium of claim 24, wherein the synthesizing step includes generating the at least one concept and/or relationships between two or more concepts, the concepts and/or relationships not being enumerated in the object of interest.

26. The at least one non-transitory computer-readable storage medium of claim 23, wherein the knowledge representation includes a weight associated with the at least one concept.

27. The at least one non-transitory computer-readable storage medium of claim 23, wherein the score for each of the content items is based on an intersection of at least one concept in the knowledge representation with the content of that content item.

28. The at least one non-transitory computer-readable storage medium of claim 23, wherein the object of interest includes unstructured data, text, audio, video, topics, tweets, web pages, websites, documents, collections of documents, document titles, messages, advertisements, and/or search queries.

29. The at least one non-transitory computer-readable storage medium of claim 23, wherein the method further comprises training an algorithm to predict labels of one or more unassociated content items based on the labels assigned to the first set of content items and one or more features associated with the first set of content items.

30. The at least one non-transitory computer-readable storage medium of claim 29, wherein the method further comprises:
receiving a second set of content items, the second set including one or more unlabeled content items; and
assigning, by the algorithm, a label to one or more of the second set of content items based on one or more features associated with each of the one or more content items in the second set.

31. The at least one non-transitory computer-readable storage medium of claim 23, wherein assigning a label to each of the one or more content items of the first set includes assigning the label based on a score of each content item of the first set exceeding a predetermined threshold.

32. The at least one non-transitory computer-readable storage medium of claim 23, wherein the machine learning algorithm uses supervised learning to infer one or more functions from labeled training data.

33. The at least one non-transitory computer-readable storage medium of claim 29, wherein the one or more features associated with the first set of content items include at least one of title, length, author, word frequency, inverse document frequency, and/or attributes of the knowledge representation.