WO2024025039A1

WO2024025039A1 - System and method for analyzing state of user in metaverse

Info

Publication number: WO2024025039A1
Application number: PCT/KR2022/016847
Authority: WO
Inventors: 김성엽
Original assignee: 주식회사 마블러스
Priority date: 2022-07-29
Filing date: 2022-10-31
Publication date: 2024-02-01
Also published as: KR20240016816A

Abstract

Embodiments relate to a system and a method for analyzing the state of a user in a metaverse. The system and method comprise: detecting, for each of a plurality of users who have accessed the metaverse, the user's text or voice corpus data in the metaverse; embedding the user's text or voice corpus data to convert it into vector data corresponding to each unit text; extracting the user's characteristics from the vector data corresponding to each unit text; calculating an index score for each characteristic of each individual user on the basis of values computed while extracting the characteristics; and calculating the state of each user on the basis of the characteristics extracted for that user and the index scores of those characteristics.

Description

System and method for analyzing the status of users in the metaverse

Embodiments of the present application relate to a system and method for analyzing a user's status by calculating the user's characteristics from the user's text captured in the metaverse.

To improve their services, metaverse service providers must continually develop the metaverse in ways that increase user satisfaction, which requires monitoring users' diverse experiences, and their satisfaction with those experiences, in detail.

To accomplish this, metaverse service providers need technology that extracts and analyzes the elements reflecting the user experience within the metaverse and provides accurate, easy-to-understand analysis results to service decision makers. Only with such technology can metaverse service providers determine the appropriate means and timing for intervening in the metaverse.

To intervene at the right time to increase the satisfaction of users experiencing the metaverse, all user behavior logs, including interactions between the extracted elements, must be monitored in real time. However, user behavior logs arise from diverse contexts and are very large in volume, which makes real-time processing difficult, and it is not easy to determine or predict which of the many behavior logs relate to users' positive experiences and satisfaction. This is especially true because most behavior logs do not directly describe the user's internal state (e.g., the user's feelings and opinions).

Therefore, there is a need for technology that analyzes users' behavior logs in detail and accurately derives the factors related to user satisfaction.

This invention was supported by the National IT Industry Promotion Agency's 2022 Metaverse Content Global Project Support Project (D0131-22-1002), titled "Strengthening overseas expansion capabilities based on the education Metaverse platform."

To solve the above-described problems, embodiments of the present application aim to provide a system and method for analyzing the status of metaverse users that calculate user characteristics from the user's text captured as part of the user's behavior log in the metaverse, analyze the user's status, and accurately derive the elements related to user satisfaction.

A system for analyzing the status of a user in the metaverse according to one aspect of the present application includes: a data collection module configured to capture, for each of a plurality of users accessing the metaverse, the user's text or voice corpus data in the metaverse, the corpus consisting of one or more unit texts; a preprocessing module configured to embed the user's text or voice corpus data and convert it into vector data corresponding to each unit text; a feature analysis module configured to extract the user's characteristics from the vector data corresponding to each unit text, wherein the characteristics include one or more items and the items include a text type; an index analysis module configured to calculate an index score for each characteristic of each individual user based on values computed during the characteristic extraction process; and a status analysis module configured to calculate the status of each user based on the characteristics extracted for that user and the index scores for those characteristics.

In one embodiment, the data collection module may be configured to capture the text or voice data generated in the metaverse by each of the plurality of users and to search for data related to the captured text or voice data.

The preprocessing module is configured to convert captured voice data into text data using a pre-stored speech-to-text unit, tokenize the captured text data or the converted text data into a plurality of tokens using a pre-stored natural language processing model, and calculate vector data by embedding the plurality of tokens.

In one embodiment, the feature analysis module may be configured to input the vector data corresponding to each unit text into a pre-trained feature recognition model, calculate the probability that the unit text is classified into each of a plurality of text types preset for the feature recognition model, and determine the text type of the unit text based on the probability calculated for each type. The text types of the characteristics include one or more of emotional expressions, behavioral expressions, hate speech, political speech, emoticon use, unrefined language use, and time expressions.

In one embodiment, the feature recognition model is trained in advance using a first training data set, and each item of training data in the first training data set is first sample vector data obtained by embedding text data acquired in an online or offline space other than the metaverse. The feature analysis module may be further configured to use a feature recognition model retrained with a second training data set to determine the type of the user's characteristic from vector data corresponding to user text captured after retraining is complete. Each item of training data in the second training data set is vector data obtained by embedding text or voice data captured in the metaverse.

In one embodiment, at least some of the plurality of text types of the feature recognition model may each have two or more subtypes. In that case, the feature recognition model includes, for each such text type, a sub-model that classifies the two or more subtypes of that text type. For a text type without subtypes, the feature recognition model determines whether the unit text is classified into that text type; for a text type with subtypes, it may be configured to classify the unit text into one of the two or more subtypes of that text type. The determined text type of the unit text then includes the classified subtype.
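The two-stage decision described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented model: the top-level model and the per-type sub-models are represented as hypothetical callables that return a `{label: probability}` dictionary, and the label names are placeholders.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical interface: a model maps an embedded vector to {label: probability}.
Model = Callable[[Sequence[float]], Dict[str, float]]


def classify_hierarchical(
    vector: Sequence[float],
    top_model: Model,
    sub_models: Dict[str, Model],
) -> Tuple[str, Optional[str], Dict[str, float]]:
    """Pick a text type, then refine it with that type's sub-model if one exists."""
    top_probs = top_model(vector)
    text_type = max(top_probs, key=top_probs.get)
    sub_model = sub_models.get(text_type)
    if sub_model is None:
        # Text type without subtypes: the top-level decision is final.
        return text_type, None, top_probs
    # Text type with subtypes: the sub-model chooses one of its subtypes.
    sub_probs = sub_model(vector)
    sub_type = max(sub_probs, key=sub_probs.get)
    return text_type, sub_type, sub_probs
```

Returning the probabilities of the deciding model keeps the values the index analysis step needs without a second forward pass.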

In one embodiment, the index analysis module may be configured to calculate an index score for a text type included in the user's characteristics based on the probability values computed while classifying the text type. The computed values include the probability values calculated during the classification operation of the feature recognition model, and these may include the probability that the unit text is classified into the determined text type and the probability that it is classified into another, undetermined text type.

In one embodiment, the index analysis module is configured to calculate the index score for a text type included in the user's characteristics based on: the total number of unit texts in the user's text or voice corpus; the sum, over all unit texts, of the probability that each unit text is classified into that text type; and the sum, over all unit texts, of the probability that each unit text is not classified into that text type.

In one embodiment, when the text type of a unit text is determined through multi-class classification, the index analysis module is configured to calculate the index score for a text type included in the user's characteristics based on: the total number of unit texts in the user's text or voice corpus; the sum, over all unit texts, of the probability that each unit text is classified by the sub-model into the text type included in the characteristic; and the sum, over all unit texts, of the probabilities that each unit text is classified into each of the remaining text types of the sub-model.
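The description names the three inputs to the index score (the unit-text count, the summed in-type probabilities, and the summed out-of-type probabilities) but not how they are combined. The sketch below assumes one plausible combination, the net probability mass per unit text, purely for illustration; the formula is hypothetical. The same function covers the binary case (a row such as `{"hate": p, "not_hate": 1 - p}`) and the multi-class sub-model case (one probability per subtype).

```python
from typing import Dict, List


def index_score(prob_rows: List[Dict[str, float]], target: str) -> float:
    """Hypothetical index score for `target` over one user's corpus.

    prob_rows holds one {text_type: probability} dict per unit text.
    """
    n = len(prob_rows)  # total number of unit texts in the corpus
    if n == 0:
        return 0.0
    # Sum over unit texts of the probability of the target type.
    in_target = sum(row.get(target, 0.0) for row in prob_rows)
    # Sum over unit texts of the probabilities of every other type.
    in_others = sum(p for row in prob_rows for t, p in row.items() if t != target)
    # Assumed combination: net evidence per unit text, in [-1, 1] when rows sum to 1.
    return (in_target - in_others) / n
```

For two unit texts with hate-speech probabilities 0.9 and 0.2, this assumed score is (1.1 - 0.9) / 2 = 0.1.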

In one embodiment, the state analysis module is configured to form a characteristic set consisting of the characteristics extracted for each of the plurality of users, form a plurality of clusters based on each user's characteristics in the characteristic set, and, within each formed cluster, extract commonalities between users based on at least one of the individual users' characteristics, index scores, and related data.
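The clustering step is not tied to a specific algorithm in this description. As an assumption for illustration, the sketch below uses plain k-means with a deterministic farthest-point initialization over per-user index-score vectors, and treats a "commonality" as a text type on which every member of a cluster scores at or above a hypothetical threshold.

```python
import math
from typing import Dict, List, Sequence


def kmeans(points: List[List[float]], k: int, iters: int = 20) -> List[int]:
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each user joins the nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def commonalities(
    points: List[List[float]],
    labels: List[int],
    text_types: Sequence[str],
    threshold: float = 0.5,  # hypothetical cut-off
) -> Dict[int, List[str]]:
    """Text types on which every member of a cluster scores at least `threshold`."""
    clusters: Dict[int, List[List[float]]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return {
        lab: [t for i, t in enumerate(text_types) if all(m[i] >= threshold for m in members)]
        for lab, members in clusters.items()
    }
```

Each point is one user's index scores in a fixed text-type order, e.g. `[emotional, emoticon]`; related data such as creation location could be appended as extra dimensions.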

In one embodiment, to calculate the similarity between users belonging to a formed cluster, the state analysis module may be further configured to select the index scores of two or more text types included in the characteristics of users in the same cluster and, for each selected index score, calculate the similarity of the users within each cluster. The similarity is calculated in the form of a similarity distribution, which gives the number (count) of users having the same or similar index score; the similarity score is read from the axis scale of the plane on which the similarity distribution is plotted.
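A similarity distribution of this kind is essentially a histogram of one text type's index scores within a cluster. The sketch below assumes that "similar" means "falls into the same fixed-width bin"; the bin width of 0.1 is a hypothetical choice.

```python
from collections import Counter
from typing import Iterable


def similarity_distribution(scores: Iterable[float], bin_width: float = 0.1) -> Counter:
    """Count users per index-score bin within one cluster (bin index * bin_width = bin centre)."""
    # round() groups scores within half a bin of each other; it also avoids
    # float artefacts such as 0.3 / 0.1 == 2.9999999999999996.
    return Counter(round(s / bin_width) for s in scores)
```

For the emotional-expression scores `[0.81, 0.79, 0.52, 0.8]`, three users land in the bin centred on 0.8 and one in the bin centred on 0.5.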

In one embodiment, the state analysis module may be further configured to select the index scores of two or more text types from among those used to calculate similarity, and to calculate the connectivity between users in each cluster based on the similarity distributions of the index scores for the selected text types.

The connectivity is calculated in the form of a connectivity distribution, which is two- or three-dimensional depending on the number of selected index scores; each axis corresponds to the number of users having the selected index score.
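One reading of this construction is that each user becomes a point whose coordinates are read off the similarity distributions of the selected text types: on each axis, the number of users in the cluster who share that user's score bin. The sketch below follows that reading under the same fixed-bin assumption used for similarity; the interpretation and the bin width are assumptions, not the patent's definition.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def connectivity_points(
    users: Dict[str, Dict[str, float]],
    text_types: Sequence[str],
    bin_width: float = 0.1,
) -> Dict[str, Tuple[int, ...]]:
    """Map each user to a 2-D or 3-D point whose axes are similarity counts."""
    def to_bin(score: float) -> int:
        return round(score / bin_width)

    # One similarity distribution (users per score bin) for each selected text type.
    dists = {
        t: Counter(to_bin(scores[t]) for scores in users.values()) for t in text_types
    }
    # Axis value = how many users in the cluster share this user's score bin for that type.
    return {
        uid: tuple(dists[t][to_bin(scores[t])] for t in text_types)
        for uid, scores in users.items()
    }
```

Users whose points sit far from the origin on every axis share common score ranges with many others, which is one way to pick out a connected target group.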

In one embodiment, the state analysis module generates the user's state, expressed by some or all of the user's extracted characteristics, the index scores for those characteristics, and the commonality, similarity, and connectivity of the users.

According to another aspect of the present application, a computer-readable recording medium may record a program including instructions that, when read by at least one processor, cause it to execute a method of analyzing a user's state in the metaverse. The method includes: capturing the user's text or voice corpus data in the metaverse, the corpus consisting of one or more unit texts; embedding the user's text or voice corpus data to convert it into vector data corresponding to each unit text; extracting the user's characteristics from the vector data corresponding to each unit text, wherein the characteristics include one or more items and the items include a text type; calculating an index score for each characteristic of each user based on values computed while extracting the characteristics; and calculating the status of each user based on at least one of the characteristics extracted for that user and the index scores for those characteristics.

The system for analyzing a user's status in the metaverse according to one aspect of the present invention can determine the user's status by analyzing it in multiple dimensions through the distribution of index scores calculated from the user's characteristics.

As a result, the system can generate a multidimensional analysis report that provides theoretical support for the results of applying deep learning technology to users' behavior logs.

Additionally, by extracting, based on user characteristics, a target group whose behavior logs share commonality or connectivity, the system can effectively provide marketing content to that target group.

Additionally, the system can provide a metaverse customized to each user's characteristics.

The effects of the present invention are not limited to the effects mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description of the claims.

In order to more clearly explain the technical solutions of the embodiments of the present invention or the prior art, drawings necessary in the description of the embodiments are briefly introduced below. It should be understood that the drawings below are for illustrative purposes only and not for limiting purposes of the present specification. Additionally, for clarity of explanation, some elements may be shown in the drawings below with various modifications, such as exaggeration or omission.

Figure 1 is a schematic diagram of a system for analyzing the status of a user of the metaverse, according to one aspect of the present application.

Figure 2 is a schematic diagram of a service server, according to an embodiment of the present application.

Figure 3 is a schematic diagram of a process for analyzing a user's state through clustering, according to an embodiment of the present application.

Figure 4 is a diagram illustrating the calculation of similarity between users belonging to some of the clusters in Figure 3.

Figure 5 is a diagram illustrating the extraction of connectivity between users within the cluster of Figure 4.

Figure 6 is a flowchart of a method for analyzing the status of a user in the metaverse according to another aspect of the present application.

Figure 7 is a schematic diagram of the relationship between users who post malicious comments and users targeted by malicious comments.

Figure 8 is a schematic diagram of an analysis report that supplements existing research on the relationship shown in Figure 7.

Figure 9 shows the output messages produced from the same input message by language models customized for each cluster, according to an embodiment of the present application.

Figure 10 is a schematic diagram of providing a customized message to each user using a language model customized for each cluster in Figure 9.

Hereinafter, embodiments of the present application will be examined in detail with reference to the drawings.

However, this disclosure is not intended to limit the disclosure to specific embodiments and should be understood to include various modifications, equivalents, and/or alternatives to the embodiments of the disclosure. In connection with the description of the drawings, similar reference numbers may be used for similar components.

In this specification, expressions such as “have,” “may have,” “include,” or “may include” indicate the presence of the corresponding features (e.g., numerical values, functions, operations, steps, parts, elements, and/or components) and do not exclude the presence or addition of further features.

When a component is said to be “connected” or “coupled” to another component, it should be understood that it may be directly connected or coupled to the other component, or that other components may exist in between. In contrast, when a component is said to be “directly connected” or “directly coupled” to another component, it should be understood that no other components exist in between.

Expressions such as “first” and “second” used in various embodiments may modify various elements regardless of order and/or importance and do not limit those elements. These expressions can be used to distinguish one component from another. For example, a first component and a second component may represent different components regardless of order or importance.

Singular expressions used in this specification also encompass plural expressions unless the context clearly indicates otherwise.

As used herein, the expression “configured to” may, depending on the situation, be used interchangeably with, for example, “suitable for,” “having the capacity to,” “designed to,” “adapted to,” “made to,” or “capable of.” The term “configured (or set) to” does not necessarily mean “specifically designed to” in hardware. Instead, in some contexts, the expression “a device configured to” may mean that the device is “capable of” operating together with other devices or components. For example, the phrase “processor configured (or set) to perform A, B, and C” may mean a processor dedicated to performing those operations (e.g., an embedded processor) or a general-purpose processor (e.g., a CPU or application processor) that can perform those operations by executing one or more software programs stored in a memory device.

The system 1 for analyzing the user's status in the metaverse according to embodiments may be entirely hardware, or may be partially hardware and partially software. For example, a system may collectively refer to hardware equipped with data processing capabilities and operating software for running it. In this specification, terms such as “unit,” “system,” and “device” are intended to refer to a combination of hardware and software driven by the hardware. For example, the hardware may be a data processing device that includes a Central Processing Unit (CPU), Graphics Processing Unit (GPU), or other processor. Additionally, software may refer to a running process, object, executable, thread of execution, program, etc.

Referring to FIG. 1, the system 1 for analyzing the user's status in the metaverse includes the user's electronic device 100 and a service server 200.

The electronic device 100 and the service server 200 are connected through a telecommunication network.

The telecommunication network provides a wired/wireless telecommunication path through which the electronic device 100 and the service server 200 can transmit and receive data with each other. A telecommunication network is not limited to a communication method according to a specific communication protocol, and an appropriate communication method may be used depending on the implementation. For example, when configured as an Internet Protocol (IP)-based system, the telecommunication network may be implemented as a wired and/or wireless Internet network. Alternatively, when the electronic device 100 and the service server 200 are implemented as mobile communication terminals, the telecommunication network may be implemented as a wireless network such as a cellular network or a wireless local area network (WLAN) network.

The electronic device 100 is a client terminal device that communicates with the service server 200 and includes at least one processor capable of processing data, a memory for storing data, and a communication unit for transmitting and receiving data. The electronic device 100 may be implemented as, for example, a laptop computer, another computing device, a tablet, a cellular phone, a smartphone, a smart watch, smart glasses, a head-mounted display (HMD), another mobile device, or another wearable device.

In certain embodiments, the electronic device 100 is configured to receive the text or voice input of a user accessing the metaverse through an input device and a microphone. Additionally, in some embodiments, the electronic device 100 may be configured to provide the user with an analysis result report or the result of the metaverse service provider's intervention in the metaverse.

The service server 200 is a plurality of computer systems or computer software implemented as network servers. Here, a network server refers to a computer system and computer software (a network server program) that is connected to lower-level devices capable of communicating with other network servers through a computer network such as a private intranet or the Internet, receives requests to perform tasks, performs those tasks, and provides the results. Beyond such network server programs, however, the term should be understood as a broad concept that includes a series of application programs running on the network server and, in some cases, various databases built within it. The service server 200 may be implemented as any type or combination of types of computing devices, such as a network server, web server, file server, supercomputer, or desktop computer. To this end, the service server 200 includes at least one processor capable of processing data, a memory for storing data, and a communication unit for transmitting and receiving data.

The service server 200 is a server that operates a service that creates and provides a metaverse. The service server 200 generates and provides metaverse data for displaying metaverse spaces and objects on the connected user's electronic device 100.

The metaverse data is data created by rendering object models in a two-dimensional or three-dimensional virtual space. The objects include objects related to places provided by the virtual space (e.g., products, installations) or avatar objects representing users.

The service server 200 receives the user's text or voice data through the user's electronic device 100 connected to the metaverse and generates a behavior log for each user. The service server 200 analyzes the user's status from text or voice data in the user's behavior log.

Referring to FIG. 2, the service server 200 includes a data collection module 210, a pre-processing module 220, a characteristic analysis module 240, an index analysis module 250, and a state analysis module 260. In some embodiments, the service server 200 may further include a natural language DB 230 and/or an analysis DB 270. Additionally, the service server 200 may further include a report generation module 280 and/or a metaverse management module 290.

The components in FIG. 2 are software-level components functionally implemented through the operation of the hardware (processor, memory, etc.) of the service server 200. For example, the data collection module 210 may be implemented by a communication unit or a processor. Additionally, the analysis modules 240, 250, and 260 may be implemented by a processor, and the natural language DB 230 and analysis DB 270 may be implemented by memory.

The data collection module 210 is configured to capture the natural language data that each user generates in a metaverse accessed by a plurality of users. The natural language data may be text or voice data and forms a corpus containing one or more unit texts or utterances.

The data collection module 210 may capture a corpus including one sentence or phrase as natural language data in a single capture operation.

In some embodiments, the data collection module 210 may capture natural language data in the metaverse by retrieving natural language data included in the user's behavior log.

The data collection module 210 may store captured text or voice data of each of a plurality of users as raw natural language data in the natural language DB 230.

Additionally, the data collection module 210 may search for related data of natural language data. The data collection module 210 may search some or all of the data other than the natural language data in the behavior log as related data to the captured natural language data.

The searched related data may be added to the captured natural language data and stored together in the natural language DB 230. The related data may include, for example, personal information of the user who created the natural language data, creation time, creation location, etc. The creation location refers to a location within the metaverse.
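A captured record that carries its related data alongside the natural language payload might look like the sketch below. The log-entry keys (`user_id`, `text`, `voice`) and the rule "every other key is related data" are hypothetical; the description does not define a behavior-log schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Union

NL_KEYS = ("text", "voice")  # hypothetical log keys holding natural language


@dataclass
class CapturedUtterance:
    user_id: str
    modality: str               # "text" or "voice"
    payload: Union[str, bytes]  # raw text, or raw audio bytes for voice
    related: Dict[str, Any] = field(default_factory=dict)


def capture_from_log(entry: Dict[str, Any]) -> CapturedUtterance:
    """Split one behavior-log entry into natural language data and related data."""
    modality = "text" if "text" in entry else "voice"
    related = {
        k: v for k, v in entry.items() if k not in NL_KEYS and k != "user_id"
    }
    return CapturedUtterance(entry["user_id"], modality, entry[modality], related)
```

Keeping creation time and in-metaverse location on the same record lets the later commonality step group users by related data without another lookup.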

The preprocessing module 220 is configured to convert the captured raw natural language data into data in a predetermined format for state analysis.

If the raw data is voice data, the preprocessing module 220 may be configured to convert the voice data into text data and then convert it into data in the predetermined format. For example, the pre-processing module 220 may include a speech-to-text (STT) unit that converts input speech into text.

The STT unit is configured to extract voice features from voice data and calculate the text corresponding to the voice data based on the extracted features. The STT unit may be implemented as an artificial neural network whose parameters have values, learned by a machine-learning training method, for converting the speech in a training data set into text.

The preprocessing module 220 may convert text data of raw natural language data or text data primarily converted from voice data into data in a pre-designated format using a pre-stored natural language processing model.

In certain embodiments, the predefined format may be a vector data format. The natural language processing model may be a machine learning model configured to embed input text and output numerical data in vector form. The machine learning model may include a neural network structure for natural language processing.

In some embodiments, the natural language processing model may be configured to tokenize input text into a plurality of tokens and perform embedding processing on each of the plurality of tokens to calculate a text vector. Here, tokens may be tokenized on a word-by-word basis or a context-by-context basis. Here, the context unit is a text unit larger than a word and may be, for example, a phrase or sentence.
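The tokenize-then-embed flow can be illustrated without a trained model. In the sketch below, a regex word tokenizer and a hash-derived token vector stand in for a real tokenizer and embedding table such as BERT's; both substitutes are assumptions, and the mean-pooled "text vector" is only one simple way to combine token vectors.

```python
import hashlib
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Word-level tokenization; \\w also matches Hangul in Python 3."""
    return re.findall(r"\w+", text.lower())


def embed_token(token: str, dim: int = 8) -> List[float]:
    """Stand-in embedding: map the first `dim` SHA-256 bytes to floats in [-1, 1]."""
    if not 0 < dim <= 32:  # a SHA-256 digest has 32 bytes
        raise ValueError("dim must be between 1 and 32")
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [b / 127.5 - 1.0 for b in digest[:dim]]


def embed_text(text: str, dim: int = 8) -> List[float]:
    """Text vector = mean of the token vectors (zero vector for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return [0.0] * dim
    vectors = [embed_token(t, dim) for t in tokens]
    return [sum(col) / len(vectors) for col in zip(*vectors)]
```

Unlike a learned embedding, these vectors carry no meaning; they only show the data shapes that flow from the preprocessing module to the feature analysis module.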

The natural language processing model may be, for example, BERT (Bidirectional Encoder Representations from Transformers) or ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately), and may be configured to tokenize and embed the input text word by word, or to segment it into text units larger than words (e.g., phrases, sentences, paragraphs) and embed those, and/or to encode the tokens in order. However, the natural language processing model is not limited to the above-described models and may have any of various structures capable of calculating text vectors from input text.

The parameters of the natural language processing model may be designed in advance to have values for processing the target language. For example, if the system 1 is designed to process Korean language data, the preprocessing module 220 may include KoBERT, KcELECTRA, or other natural language processing model designed to process Korean language.

In some embodiments, before tokenization, the preprocessing module 220 may be configured to extract and remove preset special symbols and stopwords from the text corpus to be preprocessed and/or to perform normalization processing.

The preprocessing module 220 may use reference texts pre-registered as special symbols and stop words to check whether the text corpus to be pre-processed includes texts corresponding to special symbols and stop words. The preprocessing module 220 may remove special symbols and stop words from the text corpus and perform a tokenization operation.

The normalization process includes a cleaning process and/or a regular expression process to remove noise data from the existing corpus.
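The symbol and stopword removal and the regex-based normalization described above can be sketched as follows. The symbol list, the stopword list, and the specific cleaning rules (URL removal, collapsing runs of three or more repeated characters such as ㅋㅋㅋㅋ to two) are hypothetical examples, not rules taken from this description.

```python
import re

# Hypothetical pre-registered lists; a real system would load them from the natural language DB.
SPECIAL_SYMBOLS = {"#", "@", "*", "~"}
STOPWORDS = {"the", "a", "an", "to", "for"}


def normalize(text: str) -> str:
    """Regex-based cleaning of noise before tokenization."""
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)  # "ㅋㅋㅋㅋ" -> "ㅋㅋ", "!!!!" -> "!!"
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace


def remove_symbols_and_stopwords(text: str) -> str:
    for sym in SPECIAL_SYMBOLS:
        text = text.replace(sym, " ")
    kept = [w for w in text.split() if w.lower() not in STOPWORDS]
    return " ".join(kept)


def preprocess(text: str) -> str:
    return remove_symbols_and_stopwords(normalize(text))
```

For example, `preprocess("Go to the plaza!!!! ~~~ https://example.com")` yields `"Go plaza!!"`.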

In addition, the preprocessing module 220 may be further configured to extract stems and/or lemmas before embedding and embed them together with at least some of the plurality of tokens, and/or to register unregistered text, for which no vector is defined, as new registered text.

The preprocessing module 220 may extract lemmas and stems from the plurality of tokens using a lemmatization or stemming tool, for example a lemmatizer such as WordNetLemmatizer or a stemmer based on the Porter algorithm. The preprocessing module 220 may replace each text from which a lemma or stem was extracted with that lemma or stem and embed it together with the remaining text, or may process the extracted lemmas and stems together with the plurality of tokens.

The preprocessing module 220 may check the plurality of tokens for tokens with no defined vector. The vector definitions, i.e., the registered texts, may be stored in advance in the natural language DB 230. Text for which no vector is defined is unregistered text.

The preprocessing module 220 may register the text with no defined vector as new registered text.
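Registering unregistered tokens can be sketched as a vocabulary update. How a new token's vector is initialized is not stated in the description; the sketch assumes, for illustration only, the mean of the existing registered vectors (or a zero vector when none exist).

```python
from typing import Dict, List


def register_unknown(
    tokens: List[str], vectors: Dict[str, List[float]], dim: int
) -> List[str]:
    """Register tokens that have no defined vector; return the newly registered tokens."""
    newly_registered = []
    for tok in tokens:
        if tok in vectors:
            continue
        if vectors:
            # Assumed initialisation: mean of the existing registered vectors.
            vectors[tok] = [sum(col) / len(vectors) for col in zip(*vectors.values())]
        else:
            vectors[tok] = [0.0] * dim
        newly_registered.append(tok)
    return newly_registered
```

In practice the new vector would be refined later, for example when the feature recognition model is retrained on metaverse data.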

Additionally, in some embodiments, the preprocessing module 220 may attach data characteristics and importance tags to the data converted into the pre-designated format and store it in the natural language DB 230.

The data characteristics are characteristics extracted during the preprocessing process and are text characteristics that are different from the user characteristics described below.

The importance tag indicates that an expression carries little weight in interpreting the meaning of the actual sentence. For example, the importance tag may be assigned to function words such as prepositions and articles (for, to, the) and to particles.

The preprocessing module 220 may associate the data characteristics and importance tags calculated during the preprocessing process with the corresponding vector data and store them in the natural language DB 230.
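Attaching importance tags can be as simple as a lookup against a function-word list. The word list and the tag values `"low"` and `"normal"` below are hypothetical; for Korean text, the list would hold particles instead.

```python
from typing import List, Tuple

# Hypothetical function-word list (prepositions and articles).
FUNCTION_WORDS = {"for", "to", "the", "a", "an", "of"}


def tag_importance(tokens: List[str]) -> List[Tuple[str, str]]:
    """Pair each token with a low-importance tag if it is a function word."""
    return [(tok, "low" if tok.lower() in FUNCTION_WORDS else "normal") for tok in tokens]
```

Stored next to each token's vector, such a tag lets later stages down-weight tokens that contribute little to sentence meaning.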

The preprocessing module 220 supplies the text vector calculated through preprocessing to the feature analysis module 240.

The characteristic analysis module 240 is configured to extract characteristics of each user from a text vector converted from captured text or converted text.

The characteristics describe the user's own activity or the user's interactions with other users in the metaverse.

The characteristics may include one or more of the following: the type of text from which the characteristic is extracted, the frequency with which text of that type is used, its usage ratio, the style in which the text is expressed, and whether it is a key word. For example, when each sentence in a corpus is embedded as a unit text, the text type may be a sentence type.

In some embodiments, the text type may include one or more of emotional expressions, behavioral expressions, hate speech, political speech, emoticon expressions, unrefined language expressions, and time expressions. Unrefined language expressions include, for example, ㅋㅋㅋ (Korean internet slang for laughter), abbreviations, and the like.

The text type of a specific expression or utterance may be classified and extracted as one of one or more pre-designated types. For example, if a specific text contains an emoticon, the characteristic may have emoticon expression as its type item, regardless of what the emoticon means.

The feature analysis module 240 may extract features including text type using a pre-learned feature recognition model.

Specifically, the feature analysis module 240 inputs the vector data corresponding to each unit text into the feature recognition model, calculates the probability that the unit text is classified into each of the plurality of text types preset for the model, and determines the text type of the unit text based on the probability calculated for each type. User characteristics, including the determined text type, are thereby extracted from the unit text.
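Turning per-type model outputs into probabilities and a single text type is commonly done with a softmax followed by an argmax; the description does not name the output layer, so the softmax here is an assumption. The text-type labels are taken from this description, but their order is arbitrary.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Text types listed in this description; the order is an assumption.
TEXT_TYPES = ["emotional", "behavioral", "hate", "political", "emoticon", "unrefined", "time"]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def determine_text_type(
    logits: Sequence[float], types: Sequence[str] = TEXT_TYPES
) -> Tuple[str, Dict[str, float]]:
    """Turn per-type scores into probabilities and pick the most probable type."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return types[best], dict(zip(types, probs))
```

Returning the full probability dictionary, not just the winning label, preserves the values the index analysis module uses.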

The feature recognition model is a machine learning model with a neural network structure, configured to perform a classification operation that determines the text type by inferring the correlation between the input vector data of a text and the feature types.

The neural network structure is configured to determine, based on the correlation between the vector data obtained by embedding text data and the feature types, the feature type into which the text data is most probably classified. For example, the neural network structure may consist of fully connected layers or another neural network structure, such as a multi-layer perceptron (MLP).
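The classification step can be illustrated with a minimal sketch, assuming a single hidden layer with ReLU activation and a softmax output. The layer sizes, the hand-set weights, the type labels, and the function names (`dense`, `relu`, `softmax`, `classify_unit_text`) are hypothetical and not taken from the embodiments; the sketch only shows how per-type probabilities are produced from an embedded vector and how the most probable type is chosen.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

def dense(x: Sequence[float], weights: Matrix, bias: Sequence[float]) -> List[float]:
    # Fully connected layer: y_k = sum_j W[k][j] * x[j] + b[k]
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, z) for z in v]

def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # shift by the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_unit_text(x, w1, b1, w2, b2, labels) -> Tuple[str, List[float]]:
    """One-hidden-layer MLP forward pass; returns (most probable label, per-type probabilities)."""
    hidden = relu(dense(x, w1, b1))
    probs = softmax(dense(hidden, w2, b2))
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs

# Toy usage with hand-set (untrained) weights, purely illustrative.
label, probs = classify_unit_text(
    x=[1.0, 0.0],
    w1=[[1.0, 0.0], [0.0, 1.0]], b1=[0.0, 0.0],
    w2=[[2.0, 0.0], [0.0, 0.0], [-1.0, 0.0]], b2=[0.0, 0.0, 0.0],
    labels=["emotional_expression", "hate_speech", "time_expression"],
)
```

In a real system the weights would come from training and the input would be the embedded unit-text vector.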

The feature recognition model may be trained in advance using a first training data set consisting of a plurality of first training samples, each having first training data and label data. The first training data may be first sample vector data obtained by embedding text data collected in an online or offline space other than the metaverse. The label data indicates the actual feature type of the text.

When the first training data of each first training sample is input, the feature recognition model processes it with its internal parameters to calculate a result value (i.e., the predicted feature type). The parameters of the model are updated so that the error between this result value and the value of the label data (the actual feature type) decreases; the trained feature recognition model has the parameters that minimize this error.
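The training step can be sketched under simplifying assumptions: the model is reduced to a single softmax layer (softmax regression rather than a full MLP), the error is measured as cross-entropy, and the parameters are updated by plain stochastic gradient descent. The names `train`, `predict`, and `mean_cross_entropy`, the toy data, and all hyperparameters are illustrative only.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (embedded vector, index of the labeled type)

def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def _probs(weights, x):
    xb = list(x) + [1.0]  # constant 1 so the last weight column acts as a bias
    return softmax([sum(w * v for w, v in zip(row, xb)) for row in weights]), xb

def mean_cross_entropy(weights, samples: List[Sample]) -> float:
    return sum(-math.log(_probs(weights, x)[0][y]) for x, y in samples) / len(samples)

def train(samples: List[Sample], n_types: int, dim: int, epochs: int = 200, lr: float = 0.5,
          weights: Optional[List[List[float]]] = None, seed: int = 0) -> List[List[float]]:
    """SGD on softmax cross-entropy; passing `weights` continues from existing parameters."""
    rng = random.Random(seed)
    if weights is None:
        weights = [[0.0] * (dim + 1) for _ in range(n_types)]
    else:
        weights = [row[:] for row in weights]  # leave the pre-trained parameters untouched
    order = list(samples)
    for _ in range(epochs):
        rng.shuffle(order)
        for x, y in order:
            probs, xb = _probs(weights, x)
            for k in range(n_types):
                g = probs[k] - (1.0 if k == y else 0.0)  # d(cross-entropy)/d(logit_k)
                for j in range(dim + 1):
                    weights[k][j] -= lr * g * xb[j]
    return weights

def predict(weights, x) -> int:
    probs, _ = _probs(weights, x)
    return max(range(len(probs)), key=probs.__getitem__)

# Toy "first training data set": two types, separable in 2-D.
first_set = [([1.0, 0.0], 0), ([0.9, 0.1], 0), ([0.0, 1.0], 1), ([0.1, 0.9], 1)]
w = train(first_set, n_types=2, dim=2)
# Continuing from `w` on another small set mimics re-training / fine-tuning.
second_set = [([0.8, 0.3], 0), ([0.2, 0.7], 1)]
w_ft = train(second_set, n_types=2, dim=2, epochs=50, weights=w)
```

Starting from zero weights the mean cross-entropy for two types is ln 2; training drives it below that value, which is the "error is further reduced" behavior described above.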

The feature analysis module 240 inputs the vector data corresponding to the user's captured text into the feature recognition model trained in advance with the first training data set. Using its learned parameters, the model processes each item of vector data and calculates, for each of the plurality of types, the probability that the corresponding unit text is classified into that type. The type with the highest probability may be determined as the type of the user characteristic.

Additionally, in some embodiments, the pre-trained feature recognition model may be re-trained using a second training data set. The feature analysis module 240 may then be further configured to determine the text type from the vector data of each unit text using the re-trained model once re-training is complete. Because the re-training process is similar to the training process described above, mainly the differences are described.

Each item of second training data in the second training data set may be second sample vector data obtained by embedding text or voice data captured in the metaverse. In some of these embodiments, the system 1 for analyzing the user's status in the metaverse extracts user characteristics using an already published Korean data set as the first training data set before the metaverse service is provided, and then fine-tunes the feature recognition model using a second training data set obtained by embedding the text data of service users captured while the metaverse service is provided.

Additionally, in some embodiments, at least some of the text types may have two or more sub-types. For example, a type of emotional expression may have subtypes of positive, neutral, and negative. Additionally, types of behavioral expressions may have various behaviors as sub-types.

The characteristic analysis module 240 is configured to perform a binary classification operation for a text type without sub-types, determining whether the text is classified into that type, and a multi-classification operation for a text type with sub-types, determining into which of its two or more sub-types the text is classified. For example, the characteristic analysis module 240 may perform a binary classification operation for the hate speech type, recognizing only whether a text is hate speech, while performing a multi-classification operation for the emotional expression type to recognize a more detailed emotional state.
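The choice between binary and multi-classification can be sketched as a lookup in a type hierarchy. The hierarchy below and the names `TYPE_HIERARCHY`, `classification_mode`, and `decide` are hypothetical; the sketch assumes binary outputs arrive as `[P(type), P(not type)]` and multi-class outputs as one probability per sub-type.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical hierarchy: an empty list means "no sub-types" (binary classification).
TYPE_HIERARCHY: Dict[str, List[str]] = {
    "hate_speech": [],
    "political_speech": [],
    "emotional_expression": ["positive", "neutral", "negative"],
}

def classification_mode(text_type: str) -> str:
    return "multi" if TYPE_HIERARCHY[text_type] else "binary"

def decide(text_type: str, probs: Sequence[float]) -> Optional[str]:
    """Binary: probs = [P(type), P(not type)] -> the type, or None if rejected.
    Multi: probs = one probability per sub-type -> the most probable sub-type."""
    subtypes = TYPE_HIERARCHY[text_type]
    if not subtypes:
        return text_type if probs[0] > probs[1] else None
    best = max(range(len(subtypes)), key=lambda k: probs[k])
    return subtypes[best]
```

For example, `decide("emotional_expression", [0.2, 0.75, 0.05])` picks the neutral sub-type.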

In some embodiments, the feature recognition model may include, for each of at least some text types, a sub-model that classifies text into the two or more sub-types of that text type.

In one example, the characteristic analysis module 240 may include a sub-model that recognizes whether the user's emotional expression is positive, negative, or neutral. The module inputs the user's unit text into the sub-model, which calculates the probability that the unit text is classified into each of the positive, negative, and neutral sub-types; based on these probabilities, one of the three sub-types is determined as the type of the unit text. Because the structure and training process of the sub-model are similar to those of the feature recognition model described above, a detailed description is omitted.

The usage ratio is the proportion of unit texts with the same determined type relative to the entire corpus.
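The frequency and usage ratio follow directly from the list of types determined for a user's unit texts. A minimal sketch, with the function name `usage_stats` chosen for illustration:

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

def usage_stats(determined_types: Sequence[str]) -> Dict[str, Tuple[int, float]]:
    """Map each determined type to (frequency, usage ratio over the whole corpus)."""
    total = len(determined_types)
    counts = Counter(determined_types)
    # An empty corpus yields an empty Counter, so no division by zero occurs.
    return {t: (c, c / total) for t, c in counts.items()}
```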

Whether a unit text contains a key word may be determined by checking whether it includes a key word representing a proper noun, a location, a time, or an age.
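Key-word detection could be sketched as a lexicon lookup. The tiny English lexicons and the names `KEY_WORD_LEXICON`, `key_word_categories`, and `has_key_word` are hypothetical; a production system would more likely rely on named-entity recognition or curated Korean dictionaries.

```python
import re
from typing import Dict, Set

# Hypothetical mini-lexicons, one per key-word category named in the text.
KEY_WORD_LEXICON: Dict[str, Set[str]] = {
    "proper_noun": {"seoul", "busan"},
    "location": {"school", "park", "home"},
    "time": {"morning", "tonight", "yesterday"},
    "age": {"teenager", "adult", "elderly"},
}

def key_word_categories(unit_text: str) -> Set[str]:
    tokens = set(re.findall(r"\w+", unit_text.lower()))
    return {cat for cat, words in KEY_WORD_LEXICON.items() if tokens & words}

def has_key_word(unit_text: str) -> bool:
    return bool(key_word_categories(unit_text))
```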

The characteristic analysis module 240 may extract user characteristics in vector form. The user's characteristics in the form of a vector may also be referred to as a user's characteristic vector.

Each dimension of the user's feature vector corresponds to one item. For example, when features including the user's text type are extracted, the value of the dimension corresponding to the text type indicates the determined text type.

In addition, when the extracted characteristics further include the user's frequency of use of each text type, the user's feature vector may further include values indicating the frequency of emotional expressions, behavioral expressions, hate speech, political speech, emoticon use, unrefined language, and time expressions.
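A per-user feature vector can be sketched as a fixed-order list of per-type frequencies followed by per-type usage ratios. The dimension order in `TYPE_ORDER` and the name `user_feature_vector` are assumptions made for illustration; the embodiments do not fix a layout.

```python
from collections import Counter
from typing import List, Sequence

# Assumed fixed dimension order: one dimension per text type named in the description.
TYPE_ORDER = [
    "emotional_expression", "behavioral_expression", "hate_speech",
    "political_speech", "emoticon", "unrefined_language", "time_expression",
]

def user_feature_vector(determined_types: Sequence[str]) -> List[float]:
    """Frequencies per type, then usage ratios per type (2 * len(TYPE_ORDER) dimensions)."""
    counts = Counter(determined_types)
    total = len(determined_types) or 1  # avoid division by zero for an empty corpus
    freqs = [float(counts[t]) for t in TYPE_ORDER]
    ratios = [counts[t] / total for t in TYPE_ORDER]
    return freqs + ratios
```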

The index analysis module 250 is configured to calculate an index score for each characteristic of each of the plurality of users, based on the computation values calculated while the characteristics are extracted.

In certain embodiments, when the characteristic analysis module 240 extracts user characteristics including the text type of each unit text using the feature recognition model, the index analysis module 250 may calculate an index score for the text type included in the user's characteristics based on the probability values produced while classifying the text type. The index score quantifies the degree to which the unit texts match the determined text type across the entire corpus of unit texts. Even unit texts determined to be of the same type can match that type to different degrees, and the index score quantifies this deviation.

The computation value includes the probability values calculated during the classification operation of the feature recognition model: the probability that the unit text is classified into the determined text type and the probabilities that it is classified into the other, undetermined text types. The index analysis module 250 is configured to calculate an index score for each text type included in the user's characteristics based on the computation values calculated while determining the type of each of the user's unit texts.

The index analysis module 250 may calculate the index score for a text type included in the user's characteristics differently depending on the operation by which that text type was determined.

When the text type of the unit text is determined through binary classification, the index analysis module 250 may be configured to calculate the index score for the text type included in the user's characteristics based on the total number of unit texts in the user's text or voice corpus, the probability that each unit text is classified into that text type (among the plurality of text types of the feature recognition model), and the probability that the same unit text is not classified into that text type. In some embodiments, a text type is treated as determined by binary classification if it is an upper-level type with no sub-types.

To calculate the index score for the text type of the unit text, the index analysis module 250 obtains from the feature analysis module 240 the probabilities that the unit text is classified into each of the text types of the feature recognition model. From these, it takes the probability of being classified into the text type included in the characteristic, and it may calculate the probability that the unit text is not classified into that text type as the sum of the probabilities for the remaining text types. These probability values may be obtained for each unit text in the same corpus.

In some embodiments, the index analysis module 250 may calculate the index score for the text type included in the user's characteristics based on the total number of unit texts in the user's text or voice corpus, the sum over all unit texts of the probability of being classified into that text type, and the sum over all unit texts of the probability of not being classified into that text type.

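For example, if a binary classification operation has been performed, the index analysis module 250 may calculate the index score for the text type (c) included in the user's characteristics using the following equation (Equation 1):

$$\mathrm{Index}_c = \frac{\sum_{i=1}^{T_{total}} P_{c,i} - \sum_{i=1}^{T_{total}} P_{c',i}}{T_{total}} \times \mathrm{Constant} + \mathrm{Constant}$$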

By adjusting the constant, the output of the equation can be mapped to a scale range that the user can interpret more easily. Constant is a value designated by the user to set the scale range of the index score; for example, it may be 0 or 100, but is not limited thereto.

Here, T_total represents the total number of unit texts in the text or voice corpus; as described above, a unit text may be a sentence. Pc represents the probability that each unit text is classified into the text type c included in the characteristic, and Pc' represents the probability that the same unit text is not classified into that text type. In a binary classification operation, the two probabilities sum to 1 (i.e., Pc + Pc' = 1).

In one example, two sentences are captured: "I hate old people for no reason. They only harm society." Each sentence is treated as a unit text and, after preprocessing, input into a feature recognition model that performs a binary classification operation to decide whether the text is hate speech. Both sentences are determined to be hate speech. Assume that, in making this decision, the probability of being classified as hate speech (Pc) and the probability of not being classified as hate speech (Pc') are calculated as (0.924, 0.076) for the first sentence and (0.742, 0.258) for the second.

With the Constant value set to 100, so that the scale range is between 0 and 200, applying these probabilities to Equation 1 gives an index score of 166.6 for the text type included in the user's characteristics (i.e., hate speech): [((0.924 + 0.742) - (0.076 + 0.258)) / 2] × 100 + 100 = 166.6.
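The binary-case calculation can be checked with a short sketch that follows the form implied by the worked example: the difference of the summed Pc and Pc' values, divided by T_total, scaled by Constant and shifted by Constant. The function name `binary_index_score` is illustrative.

```python
from typing import Sequence, Tuple

def binary_index_score(probs: Sequence[Tuple[float, float]], constant: float = 100.0) -> float:
    """probs holds (Pc, Pc') for every unit text in the corpus; len(probs) is T_total."""
    t_total = len(probs)
    if t_total == 0:
        raise ValueError("corpus has no unit texts")
    sum_pc = sum(pc for pc, _ in probs)
    sum_not = sum(pc_not for _, pc_not in probs)
    # With Pc + Pc' = 1, (sum_pc - sum_not) / t_total lies in [-1, 1],
    # so the score lies in [0, 2 * constant].
    return (sum_pc - sum_not) / t_total * constant + constant

# The hate-speech example: two sentences with (Pc, Pc') = (0.924, 0.076) and (0.742, 0.258).
score = binary_index_score([(0.924, 0.076), (0.742, 0.258)])
```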

In addition, when the text type of the unit text is determined through multi-classification, the index analysis module 250 may be configured to calculate the index score for the text type included in the user's characteristics based on the total number of unit texts in the user's text or voice corpus, the probability that each unit text is classified into that text type among the text types of a sub-model within the feature recognition model, and the probabilities that the same unit text is classified into each of the remaining text types of the sub-model. A text type is treated as determined by multi-classification if it is a sub-type. The text types of the sub-model are the sub-types sharing the same upper-level type, and the index score is a score for the upper-level type that encompasses them.

If there are n sub-types sharing the same upper-level type, the probability of being classified into the text type included in the feature (i.e., a specific sub-type) and the n-1 probabilities of being classified into each of the remaining n-1 sub-types are calculated. These n probability values sum to 1 (100%).

Since the index score calculation process performed after multiple classification is similar to the index score calculation process performed after binary classification, the differences are mainly described.

In some embodiments, the index analysis module 250 may calculate the index score for the text type included in the user's characteristics based on the total number of unit texts in the user's text or voice corpus, the sum over all unit texts of the probability of being classified into the text type included in the characteristics among the text types of the sub-model, and, for each remaining text type of the sub-model, the sum over all unit texts of the probability of being classified into that text type.

For example, if a multi-classification operation has been previously performed, the index analysis module 250 may calculate an index score for the text type (c) included in the user's characteristics using the following equation.

Here, n represents the number of multiple sub-types having the same upper-level type.

Additionally, in some embodiments, when the text type of the unit text is determined through multi-classification, the index analysis module 250 may calculate the index score for the text type included in the user's characteristics further based on a preset type weight for each of the sub-types sharing the same upper-level type.

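For example, the index analysis module 250 may calculate the index score for the text type (c) included in the user's characteristics through the following equation (Equation 3), in which type weights are further applied to Equation 2:

$$\mathrm{Index}_c = \frac{W_0 \sum_{i=1}^{T_{total}} P_{c,i} + W_1 \sum_{i=1}^{T_{total}} P^{1}_{c,i} + \cdots + W_{n-1} \sum_{i=1}^{T_{total}} P^{n-1}_{c,i}}{T_{total}} \times \mathrm{Constant} + \mathrm{Constant}$$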

Here, W_0 represents the weight of the sub-type included in the characteristic, and W_1 to W_{n-1} represent the weights of the remaining sub-types among the n text types (i.e., sub-types) of the sub-model.

In one example, two sentences are captured: "It rained this morning and I was in a good mood. How about you?" Each sentence is treated as a unit text and, after preprocessing, input into a sub-model of the feature recognition model that performs a multi-classification operation to classify it into one of three sub-types (positive, neutral, negative). The first sentence is determined to be positive and the second neutral. Assume that, for this decision, the probabilities of each sentence being classified into the sub-types (P_c, P^1_c, ..., P^{n-1}_c) are calculated as (0.820, 0.090, 0.090) and (0.200, 0.750, 0.050), respectively.

With W_0 set to 0.3, W_1 to 0.2, W_2 to 0.1, and the Constant value set to 100 (a scale range between 0 and 200), applying each sentence's sub-type probabilities to Equation 3 gives an index score of 124.4 for the text type included in the user's characteristics (i.e., emotional expression, which encompasses positive and neutral): [((0.820 + 0.200) × 0.3 + (0.090 + 0.750) × 0.2 + (0.090 + 0.050) × 0.1) / 2] × 100 + 100 = 124.4.
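The weighted multi-class calculation can be sketched in the same way, following the arithmetic of the worked example: sum each sub-type's probability over all unit texts, weight the sums by W_0 to W_{n-1}, divide by T_total, then scale and shift by Constant. The function name `weighted_index_score` is illustrative, and the sketch assumes each probability row lists the sub-types in the same order as the weights.

```python
from typing import Sequence

def weighted_index_score(prob_rows: Sequence[Sequence[float]],
                         weights: Sequence[float],
                         constant: float = 100.0) -> float:
    """prob_rows[i][k]: sub-model probability of unit text i for sub-type k,
    in the same order as weights (W_0 .. W_{n-1})."""
    t_total = len(prob_rows)
    if t_total == 0:
        raise ValueError("corpus has no unit texts")
    n = len(weights)
    if any(len(row) != n for row in prob_rows):
        raise ValueError("each row needs one probability per sub-type")
    column_sums = [sum(row[k] for row in prob_rows) for k in range(n)]
    weighted = sum(w * s for w, s in zip(weights, column_sums))
    return weighted / t_total * constant + constant

# The emotional-expression example with W = (0.3, 0.2, 0.1) and Constant = 100.
score = weighted_index_score([(0.820, 0.090, 0.090), (0.200, 0.750, 0.050)], [0.3, 0.2, 0.1])
```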

The state analysis module 260 is configured to calculate the state of each user based on at least one of the user's extracted characteristics and the index scores for those characteristics. The characteristics include a text type, and the index scores include the index score for that text type, which may also be a sub-type.

The state analysis module 260 may obtain a characteristic set consisting of the characteristics extracted for each of the plurality of users. Feature vectors with different items may be extracted for each user, and the characteristic set is obtained based on these feature vectors.

Additionally, the state analysis module 260 may obtain an index set consisting of index scores for characteristics calculated for each of a plurality of users.

In some embodiments, the state analysis module 260 may be configured to form a plurality of clusters based on the characteristics of each user in the characteristic set.

The state analysis module 260 may select some characteristics from the characteristic set and form the clusters based on the selected characteristics. In some embodiments, two or three characteristics are selected.

As described above, user characteristics can be expressed in vector form. Each user's characteristics in the characteristic set may be expressed as a vector whose dimensions cover all items in the set. Selecting two features then amounts to reducing this multi-dimensional vector to a two-dimensional vector.

The state analysis module 260 may select some (e.g., two) characteristics through PCA (Principal Component Analysis) or another dimensionality reduction algorithm, or according to user input.
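The reduction to two dimensions can be sketched with PCA computed by power iteration, so that no third-party package is needed; in practice a library implementation such as scikit-learn's `PCA` would more likely be used. The name `pca_project_2d` and the iteration settings are illustrative, and the sketch assumes at least two users and at least two feature dimensions. Note that PCA produces two combined components rather than picking two of the original items.

```python
import math
import random
from typing import List, Optional, Sequence

def pca_project_2d(vectors: Sequence[Sequence[float]], iters: int = 500,
                   seed: int = 0) -> List[List[float]]:
    """Project feature vectors onto their top-2 principal components (power iteration)."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components: List[List[float]] = []

    def orthonormalize(w: List[float]) -> Optional[List[float]]:
        # Remove projections onto components already found, then normalize.
        for c in components:
            dot = sum(a * b for a, b in zip(w, c))
            w = [a - dot * b for a, b in zip(w, c)]
        norm = math.sqrt(sum(a * a for a in w))
        return [a / norm for a in w] if norm > 1e-12 else None

    for _ in range(2):
        vec = orthonormalize([rng.random() + 0.1 for _ in range(d)])
        for _ in range(iters):
            nxt = orthonormalize([sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)])
            if nxt is None:  # no variance left in the remaining directions
                break
            vec = nxt
        components.append(vec)
    return [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in centered]

# Toy data: most variance along the first item, some along the second, none along the third.
data = [[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
proj = pca_project_2d(data)
```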

The state analysis module 260 may cluster the plurality of users into a plurality of user clusters based on the selected characteristics (e.g., two).

For example, the state analysis module 260 may form clusters using DBSCAN (Density-Based Spatial Clustering of Applications with Noise) or another cluster formation algorithm. DBSCAN is a density-based clustering method that forms clusters based on a reference radius (epsilon) and the minimum number of vectors within that radius.

In some embodiments, the state analysis module 260 forms two or more clusters through the cluster formation algorithm and, if some users do not belong to any formed cluster, may additionally form another cluster to which those remaining users belong.
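Density-based clustering followed by an extra cluster for leftover users can be sketched as below. This is a compact pure-Python DBSCAN written for illustration (scikit-learn's `DBSCAN`, with its `eps` and `min_samples` parameters, is a common library alternative); the names `dbscan` and `assign_leftovers` and the toy points are hypothetical.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1

def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels: List[Optional[int]] = [None] * len(points)

    def region(i: int) -> List[int]:
        # All points within the reference radius eps, including the point itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins the cluster but is not expanded
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
        cluster_id += 1
    return [NOISE if lab is None else lab for lab in labels]

def assign_leftovers(labels: List[int]) -> List[int]:
    """Put every user left as NOISE into one additional 'other' cluster."""
    next_id = max((lab for lab in labels if lab != NOISE), default=-1) + 1
    return [next_id if lab == NOISE else lab for lab in labels]

points = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 0)]
labels = dbscan(points, eps=0.5, min_pts=2)
final = assign_leftovers(labels)
```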

The state analysis module 260 may extract commonalities among the users in a cluster based on at least some of the characteristics, index scores, and related data of the individual users in the cluster. A commonality is an attribute shared by the individual users in the cluster; users in the same cluster share the attribute even though their values for it may differ.

The commonality may correspond to user characteristics, or may correspond to items in related data.

Referring to FIG. 3, the analysis data for each user may be expressed as a two-dimensional vector corresponding to two characteristics selected through PCA. Based on the related data of the users in each formed cluster, the state analysis module 260 may extract age information items in the related data, such as "adult," "high school student," and "middle school student or younger," as the commonality of users in the same cluster.

In addition, the state analysis module 260 may form another cluster consisting of users who do not belong to the three formed clusters. Unlike the other clusters, the users in this additional cluster show no defined trend in terms of age information, and this may also be extracted as the commonality of that cluster.
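Extracting a commonality such as an age group from related data could be sketched as a majority vote per cluster, returning no value when nothing dominates (the "no defined trend" case). The `min_share` threshold, the record layout, and the name `cluster_commonality` are assumptions for illustration only.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Mapping, Optional, Sequence

def cluster_commonality(labels: Sequence[int], related: Sequence[Mapping[str, str]],
                        attribute: str, min_share: float = 0.6) -> Dict[int, Optional[str]]:
    """For each cluster, the dominant value of `attribute`, or None when no single
    value reaches `min_share` of the cluster's users (no defined trend)."""
    members: Dict[int, List[str]] = defaultdict(list)
    for label, record in zip(labels, related):
        members[label].append(record[attribute])
    result: Dict[int, Optional[str]] = {}
    for label, values in members.items():
        value, count = Counter(values).most_common(1)[0]
        result[label] = value if count / len(values) >= min_share else None
    return result

ages = ["adult", "adult", "adult", "high_school", "high_school",
        "adult", "high_school", "middle_or_younger"]
common = cluster_commonality([0, 0, 0, 1, 1, 2, 2, 2], [{"age": a} for a in ages], "age")
```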

The state analysis module 260 may be further configured to calculate the similarity or connectivity between users in a formed cluster, based on the index scores for the text types included in the users' characteristics.

In some embodiments, the state analysis module 260 may apply a network analysis algorithm, treating users as nodes, to calculate the similarity or connectivity between users in the same cluster. The similarity or connectivity is calculated for at least some of the formed clusters.

Figure 4 is a diagram showing the similarity calculated between users in some of the clusters of Figure 3.

In Figure 4, the solid line inside the area marks the boundary of the cluster area for middle school students and younger, and the dotted line marks the boundary of the other cluster area.

Referring to FIG. 4, the state analysis module 260 may select two or more of the index scores for the text types included in the characteristics of users in the same cluster and calculate, for each selected index score, the similarity of users within each cluster, expressed in terms of that index score.

In some embodiments, the similarity may be calculated as a similarity distribution, which contains the number (count) of users with the same or similar index scores, as shown in FIG. 4. Whether index scores are similar is judged by the unit scale of the axis of the plane on which the distribution is plotted (e.g., the x-axis unit in FIG. 4).

For example, for each cluster formed in FIG. 3 except the adult cluster (that is, the high school student cluster, the middle school student or younger cluster, and the other cluster), the state analysis module 260 may calculate the similarity of the users within the cluster for each of the following index scores for text types included in their characteristics: the index score for emoticon expressions (index_emo), for time expressions (index_time), for political remarks (index_pol), and for hate speech (index_hate).
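A similarity distribution of this kind can be sketched as a histogram: users whose index scores fall into the same axis-unit bin are counted as similar. The bin width of 10, the nested dictionary layout, and the names `similarity_distribution` and `cluster_similarity` are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Dict, Mapping, Sequence

def similarity_distribution(scores: Sequence[float], bin_width: float = 10.0) -> Dict[float, int]:
    """Count users per index-score bin; scores in the same bin count as 'similar'."""
    bins = Counter(math.floor(s / bin_width) * bin_width for s in scores)
    return dict(sorted(bins.items()))

def cluster_similarity(cluster_scores: Mapping[str, Mapping[str, Sequence[float]]],
                       bin_width: float = 10.0) -> Dict[str, Dict[str, Dict[float, int]]]:
    """cluster -> index name (e.g. 'index_emo', 'index_hate') -> similarity distribution."""
    return {c: {name: similarity_distribution(s, bin_width) for name, s in per_index.items()}
            for c, per_index in cluster_scores.items()}
```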

Referring to FIG. 5, the state analysis module 260 may select the index scores for two or more text types from among those used to calculate the similarity, and calculate the connectivity between users within each cluster from the similarity distributions of the selected index scores.

The connectivity may be calculated as a connectivity distribution, which is two- or three-dimensional depending on the number of selected index scores, with each axis corresponding to the number of users for one selected index score. Each coordinate value in the connectivity distribution is the number (count) of users with the same or similar index score on the corresponding axis, taken from the similarity distribution; these coordinate values may be obtained from the similarity distribution in Figure 4.

For example, for each cluster shown in FIG. 4, the state analysis module 260 may select any two of the index scores for emoticon expressions (index_emo), time expressions (index_time), political remarks (index_pol), and hate speech (index_hate), and calculate the connectivity distribution from the y-axis coordinates (i.e., the numbers of users) of the similarity distributions for the selected index scores.

The state analysis module 260 may generate a representation of the user's state using some or all of the user's extracted characteristics, the index scores for those characteristics, and the commonality, similarity, and connectivity of the users. This representation may be output as the result of analyzing the user's state.

Some or all of the analysis results of the characteristic analysis module 240, index analysis module 250, and state analysis module 260 may be stored in the analysis DB 270.

The report generation module 280 may generate an analysis report including one or more of the extracted characteristics, the calculated index scores, and the user's state, and transmit it to the user's electronic device 100 to provide the analysis results to the user.

In some embodiments, the report generation module 280 may generate an analysis report including multidimensional analysis results that provide theoretical support for the deep learning analysis results.

For example, the report generation module 280 may generate and include results analyzing the relationship between the user's state and analysis purposes such as the user's style, behavior, interests, cognitive processes, social processes, and emotional state.

This will be described in more detail with reference to Figures 7 and 8 below.

The metaverse management module 290 may, based on the analysis results, intensively manage a specific target user group or perform metaverse management operations that provide a customized service to a specific target user.

Based on the calculated user state, the metaverse management module 290 may intervene in the user's state in the metaverse by performing management operations suited to the emotions the user is currently feeling and the user's current situation.

In some embodiments, when a cluster is formed based on user characteristics, the metaverse management module 290 may perform a matching operation to introduce users belonging to the same cluster. Through the matching operation, users can more easily form a sense of belonging within the metaverse.

Additionally, in some embodiments, when clusters are formed based on user characteristics, the metaverse management module 290 may create a dedicated space in the metaverse for the users of a cluster and grant those users permission to visit it.

The metaverse management module 290 can also install marketing content in a dedicated space. The marketing content may be marketing content customized to the cluster of the dedicated space.

The marketing content includes descriptive information about the marketing target and advertisements about the marketing target.

In this way, the system 1 for analyzing the status of users in the metaverse may provide differentiated marketing to specific target users divided into clusters through a dedicated space.

As a result, the system 1 for analyzing the status of users in the metaverse can easily determine which target group is currently more active in the metaverse, which group has greater influence on others, and how much each group spends, ultimately enabling effective marketing.

Additionally, in some embodiments, based on the analysis result of one or more of the user's extracted characteristics, calculated index scores, and state, the metaverse management module 290 may provide the user with a message in the metaverse, or control the user's surroundings in the metaverse, so as to shift the user's characteristics, index scores, or state toward the opposite side.

For example, to a user who expresses a lot of depression, or who uses a lot of self-destructive language or uses it at an increasing rate, the metaverse management module 290 may send a message intended to change the user's mood or a message encouraging the user to share their state with other users. In this way, characteristics corresponding to depression (e.g., negative emotions) or to self-destructive language (e.g., hate speech) may be shifted toward relatively opposite characteristics (e.g., neutral or positive emotions, or non-hate speech).

Additionally, the metaverse management module 290 may provide messages in the metaverse to users belonging to the cluster using a language model corresponding to the cluster.

The language model may be BART, a Seq2seq model, or another machine learning model for chatbots. The language model may be applied to an NPC avatar in the metaverse, which uses it to converse with the user.

The language model corresponding to the cluster may be learned using training data obtained by embedding captured text or voice data of users belonging to the cluster in the metaverse.

As with the feature recognition model, the language model corresponding to a cluster may be trained using a cluster-specific second training data set generated from the users in that cluster.

The provision of customized messages to users in the metaverse using the language model corresponding to their cluster is described in more detail with reference to FIGS. 9 and 10 below.

The metaverse management module 290 provides messages in an initially set style to users whose state has not yet been analyzed by the state analysis module 260.

In addition, for users whose state has been analyzed by the state analysis module 260, the metaverse management module 290 may have the NPC avatar converse with the user using a language model trained on a training data set generated from the cluster to which the user belongs.
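The choice between the initial style and a cluster-specific language model can be sketched as a simple routing function. Here each model is stood in for by a plain callable from message to reply; the names `Responder` and `pick_responder` are hypothetical, and a real deployment would wrap a trained BART or seq2seq model behind such a callable.

```python
from typing import Callable, Dict, Optional

Responder = Callable[[str], str]  # stand-in for a trained chatbot language model

def pick_responder(user_cluster: Optional[int],
                   cluster_models: Dict[int, Responder],
                   default: Responder) -> Responder:
    """Users without an analyzed state (or without a cluster model) get the default style."""
    if user_cluster is None:
        return default
    return cluster_models.get(user_cluster, default)

default_model: Responder = lambda m: "default:" + m
models: Dict[int, Responder] = {0: lambda m: "teen:" + m}
```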

It will be clear to those skilled in the art that the system 1 for analyzing the user's status in the metaverse may include other components. For example, a server may include other hardware elements necessary for the operations described herein, including input devices for data entry and output devices for printing or other data presentation. Additionally, the system may further include a network, network interface, and protocol connecting the server and external devices (e.g., user terminals, external databases, etc.).

The method of analyzing the user's status in the metaverse of FIG. 6 may be performed by one or more computing devices, such as the service server of FIG. 2.

Referring to FIG. 6, the method of analyzing the status of a user in the metaverse includes: capturing, by a computing device, the text or voice corpus data in the metaverse of each of a plurality of users accessing the metaverse (S610); and embedding the user's text or voice corpus data to convert it into vector data corresponding to each unit text (S620). Here, the corpus consists of one or more unit texts.

In some embodiments, for voice corpus data, step S620 may include converting the captured voice data into text data, before the conversion to vector data, using a voice conversion unit pre-stored in the computing device.

In addition, step S620 may include tokenizing the captured or converted text data into a plurality of tokens using a natural language processing model pre-stored in the computing device, and embedding the tokens to calculate vector data.

Step S620 may further include: extracting and removing preset special symbols and stopword texts from the corpus before tokenization; normalizing the corpus; extracting stems and/or lemmas before embedding and embedding them together with at least some of the tokens; and, after embedding, extracting unregistered text for which no vector is defined and newly registering it as registered text.
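A reduced version of these preprocessing steps can be sketched on English text. The stopword list, the regular expression standing in for "preset special symbols", whitespace tokenization, and the name `preprocess` are all assumptions; stemming/lemmatization and the actual embedding lookup are omitted, with integer ids standing in for vectors. Unlike the listed order, this sketch normalizes first so that full-width symbols are also caught.

```python
import re
import unicodedata
from typing import Dict, List, Tuple

STOPWORDS = {"the", "a", "an", "is", "and"}   # hypothetical stopword list
SPECIAL_SYMBOLS = re.compile(r"[^\w\s]")      # assumption: any non-word, non-space character

def preprocess(corpus: str, vocab: Dict[str, int]) -> Tuple[List[str], List[int]]:
    text = unicodedata.normalize("NFKC", corpus).lower()       # normalize the corpus
    text = SPECIAL_SYMBOLS.sub(" ", text)                       # remove special symbols
    tokens = [t for t in text.split() if t not in STOPWORDS]    # tokenize, drop stopwords
    ids = []
    for tok in tokens:
        if tok not in vocab:          # unregistered text: register it under a new id
            vocab[tok] = len(vocab)
        ids.append(vocab[tok])
    return tokens, ids

vocab: Dict[str, int] = {}
tokens1, ids1 = preprocess("The park is GREAT!!", vocab)
tokens2, ids2 = preprocess("Great park, again", vocab)
```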

Step S620 is as described above for the preprocessing module 220, so a detailed description is omitted.

The method of analyzing the user's status in the metaverse further includes extracting the user's characteristics from the vector data corresponding to each unit text (S640).

Step S640 includes: inputting the vector data corresponding to each unit text into the feature recognition model; calculating, for each of a plurality of types preset for the feature recognition model, the probability that the unit text is classified into that type; and determining the text type of the unit text based on the probability calculated for each type, so as to extract characteristics that include the determined text type.
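A minimal sketch of the scoring and decision in step S640, assuming the feature recognition model ends in a linear layer followed by a softmax (the source does not state the model architecture). The type names and the `weights` matrix are hypothetical; the per-type probabilities are returned because step S650 reuses them.

```python
import math

# Hypothetical preset text types (claim 3 lists examples such as these).
TEXT_TYPES = ["emotional_expression", "behavioral_expression", "hate_speech", "time_expression"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_unit_text(vector, weights):
    """Return the most probable text type and the probability for every type.

    `weights` is a hypothetical linear layer with one weight row per text type.
    """
    logits = [sum(w * x for w, x in zip(row, vector)) for row in weights]
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return TEXT_TYPES[best], dict(zip(TEXT_TYPES, probs))
```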

In some embodiments, in step S640, the text type of the unit text may be determined using different feature recognition models depending on whether natural language data has already been captured in the metaverse.

Before natural language data has been captured in the metaverse, the text type of the unit text may be determined (S640) using a feature recognition model trained on the first training data set.

After natural language data has been captured in the metaverse, by contrast, the text type of the unit text may be determined (S640) using a feature recognition model trained on the second training data set.
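The choice between the two models can be expressed as a small selection rule. In this sketch, the `FeatureModels` wrapper and its fallback to the first model when no retrained model is available yet are assumptions added for illustration; the source only states which training data set applies before and after metaverse data is captured.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FeatureModels:
    """Holds the two feature recognition models described for step S640 (hypothetical wrapper)."""

    first: Callable  # trained on the first training data set
    second: Optional[Callable] = None  # retrained on metaverse data (second training data set)

    def select(self, metaverse_data_captured: bool) -> Callable:
        """Use the metaverse-trained model once it exists and metaverse data has been captured."""
        if metaverse_data_captured and self.second is not None:
            return self.second
        return self.first
```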

In some embodiments, step S640 includes: for a text type that has no sub-type, performing a binary classification to determine whether the unit text is classified into that text type; and, for a text type that has sub-types, performing a multi-class classification to determine which of the two or more sub-types of that text type the unit text is classified into.
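The split between binary and multi-class decisions might look like the sketch below. It assumes a sigmoid output for types without sub-types and a softmax sub-model for types with sub-types; the `SUBTYPES` table, the 0.5 threshold, and the logit inputs are hypothetical.

```python
import math

# Hypothetical hierarchy: only types listed here have sub-types (and a sub-model).
SUBTYPES = {"emotional_expression": ["joy", "anger", "sadness"]}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(text_type, model_output, threshold=0.5):
    """Binary decision for types without sub-types; multi-class decision otherwise.

    model_output is a single logit for a binary type, or one logit per sub-type.
    Returns (decision, probabilities); decision is None if the binary test fails.
    """
    if text_type in SUBTYPES:
        names = SUBTYPES[text_type]
        probs = softmax(model_output)
        best = max(range(len(names)), key=lambda i: probs[i])
        return names[best], dict(zip(names, probs))
    p = sigmoid(model_output)
    decision = text_type if p >= threshold else None
    return decision, {text_type: p, "not_" + text_type: 1.0 - p}
```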

In step S640, the characteristic may include one or more items, and the characteristics may be extracted as a characteristic vector expressed in vector form.

Step S640 is as described above for the characteristic analysis module 240, so a detailed description is omitted.

The method of analyzing the status of a user in the metaverse further includes calculating, for each individual user, an index score for each characteristic based on the computed values obtained in the process of extracting the characteristic (S650).

In step S650, index scores may be calculated for each text type included in the user's characteristics, based on the computed values obtained while determining the type of each of the user's unit texts. The computed values include the probability values calculated during the classification performed by the feature recognition model: the probability that the unit text is classified as the determined text type and the probability that it is classified as another, undetermined text type.

In some embodiments, when the text type of a unit text is determined through binary classification, step S650 may calculate the index score for a text type included in the user's characteristics based on the total number of unit texts in that user's text or voice corpus, the probability that each unit text is classified, among the plurality of text types of the feature recognition model, as the text type included in the characteristic, and the probability that the same unit text is not classified as that text type. Binary classification is used when the determined text type is a higher-level type that has no lower-level types.

Also, in some embodiments, step S650 may calculate the index score for the text type included in the user's characteristics based on the total number of unit texts in the user's text or voice corpus, the sum over all unit texts of the probability that each unit text is classified as the text type included in the characteristic, and the sum over all unit texts of the probability that each unit text is not classified as that text type.
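The source names the inputs of the binary-case index score (the unit-text count N, the summed probabilities of belonging to the text type, and the summed probabilities of not belonging) but not the formula that combines them. The sketch below uses one plausible combination, a normalized difference, purely as an assumption for illustration.

```python
def binary_index_score(p_in, p_out):
    """Index score of one text type for one user (binary-classification case).

    p_in[i]:  probability that unit text i is classified as the text type
    p_out[i]: probability that unit text i is not classified as it
    Assumed combination (not given in the source): (sum(p_in) - sum(p_out)) / N,
    where N is the user's total number of unit texts; the result lies in [-1, 1].
    """
    n = len(p_in)
    if n == 0 or n != len(p_out):
        raise ValueError("need one p_in and one p_out value per unit text")
    return (sum(p_in) - sum(p_out)) / n
```

With two unit texts scored 0.9/0.1 and 0.7/0.3, this assumed score is (1.6 - 0.4) / 2 = 0.6.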

In some other embodiments, when the text type of a unit text is determined through multi-class classification, step S650 may calculate the index score for the text type included in the user's characteristics based on the total number of unit texts in the user's text or voice corpus, the probability that each unit text is classified, among the plurality of text types of the sub-model in the feature recognition model, as the text type included in the characteristic, and the probabilities that the same unit text is classified as each of the remaining text types of the sub-model. Multi-class classification is used when the determined text type is a sub-type.

Also, in some other embodiments, step S650 may calculate the index score for the text type included in the user's characteristics based on the total number of unit texts in the user's text or voice corpus, the sum over all unit texts of the probability that each unit text is classified, among the text types of the sub-model, as the text type included in the characteristic, and, for each of the remaining text types of the sub-model, the sum over all unit texts of the probability of being classified as that text type.
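For the multi-class case, the inputs are the unit-text count, the summed probabilities of the target sub-type, and one summed probability per remaining sub-type of the sub-model. How they are combined is not stated, so the sketch below assumes the target sum minus the mean of the other sub-types' sums, normalized by the unit-text count.

```python
def multiclass_index_score(prob_rows, target):
    """Index score of one sub-type for one user (multi-class case).

    prob_rows[i] maps every sub-type of the sub-model to unit text i's probability.
    Assumed combination (not given in the source):
        (sum of target probabilities - mean of the per-sub-type sums of the others) / N
    """
    n = len(prob_rows)
    if n == 0:
        raise ValueError("the user has no unit texts")
    sum_target = sum(row[target] for row in prob_rows)
    others = [k for k in prob_rows[0] if k != target]
    # One sum per remaining sub-type, accumulated over all unit texts.
    sums_other = {k: sum(row[k] for row in prob_rows) for k in others}
    mean_other = sum(sums_other.values()) / len(others) if others else 0.0
    return (sum_target - mean_other) / n
```

For two unit texts with joy probabilities 0.7 and 0.5, the per-sub-type sums are joy 1.2, anger 0.5, and sadness 0.3, giving (1.2 - 0.4) / 2 = 0.4 under this assumption.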

Step S650 is as described above for the index analysis module 250, so a detailed description is omitted.

The method of analyzing the status of a user in the metaverse further includes calculating the status of each user based on at least one of the user characteristics extracted for each user and the index scores for those characteristics (S660). The characteristic includes a text type, and the index score for the characteristic includes an index score for the text type included in the characteristic. The text type may also be a sub-type.

Step S660 includes: forming a characteristic set consisting of the characteristics extracted for each of the plurality of users; forming a plurality of clusters based on each user's characteristics in the characteristic set; and extracting commonalities among the users in each formed cluster based on at least one of the characteristics, index scores, and related data of the individual users in that cluster.
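Step S660 does not name a clustering algorithm. The sketch below assumes k-means over each user's characteristic vector (initialized deterministically from the first k users, only to keep the example reproducible) and treats a characteristic as a commonality of a cluster when every member's value reaches a hypothetical threshold. Related data is not modeled here.

```python
import math


def kmeans(points, k, iters=20):
    """Plain k-means; returns a cluster label for each point."""
    centroids = [list(p) for p in points[:k]]  # deterministic init: the first k users
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every user to the nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_commonalities(feature_set, names, k, threshold=0.5):
    """feature_set maps user -> characteristic vector whose positions follow `names`."""
    users = list(feature_set)
    labels = kmeans([feature_set[u] for u in users], k)
    result = {}
    for c in range(k):
        members = [u for u, lab in zip(users, labels) if lab == c]
        shared = [
            name
            for j, name in enumerate(names)
            if members and all(feature_set[u][j] >= threshold for u in members)
        ]
        result[c] = {"members": members, "shared": shared}
    return result
```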

In some embodiments, forming the clusters may include selecting some of the characteristics from the characteristic set and forming the plurality of clusters based on the selected characteristics.

In some embodiments, step S660 may further include calculating the similarity between users belonging to a formed cluster and, after the similarity is calculated, calculating the connectivity between those users.

Here, the similarity may be calculated in the form of a similarity distribution. Calculating the similarity in step S660 may involve selecting two or more index scores among those for the text types included in the characteristics of users belonging to the same cluster, and calculating, for each selected index score, the similarity of the users within each cluster expressed in terms of the index score for the corresponding text type.
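Claim 10 describes the similarity distribution as a count of users with the same or similar index scores, where "similar" is judged against the axis scale of the plot. A minimal sketch, assuming that scale is a fixed bin width (a hypothetical parameter), is a per-type histogram of the cluster's index scores.

```python
import math
from collections import Counter


def similarity_distribution(cluster_scores, text_type, bin_width):
    """Count users of one cluster per bin of their index score for `text_type`.

    cluster_scores maps user -> {text_type: index score}. Users in the same bin
    [b * bin_width, (b + 1) * bin_width) are treated as having similar scores.
    """
    return Counter(math.floor(s[text_type] / bin_width) for s in cluster_scores.values())


def cluster_similarity(cluster_scores, selected_types, bin_width):
    """One similarity distribution per selected text type (two or more are selected)."""
    if len(selected_types) < 2:
        raise ValueError("select two or more text types")
    return {t: similarity_distribution(cluster_scores, t, bin_width) for t in selected_types}
```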

Likewise, the connectivity may be calculated in the form of a connectivity distribution. Calculating the connectivity in step S660 may involve selecting the index scores for two or three of the text types used to calculate the similarity, and calculating the connectivity between users within each cluster based on the similarity distribution of the index scores for the selected text types.
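Reading the connectivity distribution as the joint (two- or three-dimensional) counterpart of the similarity distribution, one way to sketch it is to bin each user's selected index scores together and group users that land in the same cell. Treating users in the same cell as connected, and the bin width itself, are assumptions for illustration.

```python
import math
from collections import defaultdict


def connectivity_distribution(cluster_scores, selected_types, bin_width):
    """Group a cluster's users into 2-D or 3-D cells of binned index scores.

    cluster_scores maps user -> {text_type: index score}. Users sharing a cell are
    treated as connected; the number of users in a cell is its count.
    """
    if len(selected_types) not in (2, 3):
        raise ValueError("select the index scores of two or three text types")
    cells = defaultdict(list)
    for user, scores in cluster_scores.items():
        cell = tuple(math.floor(scores[t] / bin_width) for t in selected_types)
        cells[cell].append(user)
    return dict(cells)
```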

In addition, step S660 may generate a user status expressed by some or all of the user's extracted characteristics, the index scores for those characteristics, and the commonality, similarity, and connectivity of the users.

Step S660 is as described above for the state analysis module 260, so a detailed description is omitted.

In some embodiments, the method of analyzing the status of the user in the metaverse may further include generating an analysis report based on one or more analysis results among the characteristics of the plurality of users from step S640, the index scores of the plurality of users from step S650, and the user statuses from step S660 (S680), and/or providing customized information to a specific target user group based on the analysis results, or providing the metaverse service through a customized delivery method (S690).

The report in step S680 may be based on the characteristic set and index set formed in step S670.

In step S680, the natural language patterns that users generate in the metaverse can be analyzed using deep learning, and a multidimensional analysis report that provides theoretical support for the analyzed results can be generated.

The above analysis report can be used as data to support new research or as data to supplement existing research.

Figure 7 is a schematic diagram of the relationship between malicious commenters and the users targeted by their comments, and Figure 8 is a schematic diagram of an analysis report that supplements existing research on the phenomenon of Figure 7.

In Figure 7, users are represented as nodes, and the edges connecting nodes indicate whether malicious comments were sent. As shown in Figure 7, a single malicious commenter may send malicious comments to multiple targets.
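The node-and-edge representation of Figure 7 maps naturally onto a directed graph. The sketch below illustrates that relationship and is not a step of the claimed method: it stores one (sender, target) pair per malicious comment and finds senders who reach several distinct targets. The pair format and the `min_targets` cutoff are assumptions.

```python
from collections import defaultdict


def one_to_many_senders(edges, min_targets=2):
    """Find users who sent malicious comments to at least `min_targets` distinct targets.

    edges holds one (sender, target) pair per observed malicious comment.
    """
    targets = defaultdict(set)
    for sender, target in edges:
        targets[sender].add(target)
    return {s: t for s, t in targets.items() if len(t) >= min_targets}
```

Repeated comments to the same target count once, because targets are collected in a set.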

Regarding the phenomenon shown in Figure 7, existing research already visualizes the linguistic features of sentences classified as malicious comments so that such sentences can be classified and identified.

According to the method described above, the user's status can be analyzed in two or more dimensions. As shown in Figure 8, the three-dimensional analysis of step S660 confirms that, compared with normal comments, the state of users responding to malicious comments shows more characteristics indicating anger and fewer characteristics indicating cognitive and social processes.

In some embodiments, when clusters are formed based on user characteristics, step S690 includes introducing users who belong to the same cluster to one another.

In addition, when clusters are formed based on user characteristics, step S690 may include creating a dedicated space in the metaverse for users belonging to the same cluster, granting those users permission to visit the dedicated space, and installing marketing content customized for the cluster in that space.

In addition, step S690 may include, based on the analysis results of one or more of the user's extracted characteristics, calculated index scores, and status, providing a message to the user in the metaverse or controlling the user's surrounding environment in the metaverse, so that the user's characteristics, index score, or status may be shifted relatively toward the opposite side.

Additionally, step S690 may include providing a message in the metaverse to users belonging to a cluster by using a language model corresponding to that cluster.

The language model corresponding to a cluster may be trained on training data obtained by embedding the text or voice data captured in the metaverse from users belonging to that cluster. In some embodiments, when the cluster's commonality is style, the language model may be trained to be customized for the cluster.

Figure 9 shows output messages converted from the same message by language models customized for each cluster, according to an embodiment of the present application, and Figure 10 is a schematic diagram of the steps for providing a customized message using the per-cluster language models of Figure 9.

Referring to Figure 9, the language model before customization is configured to output the message "In the metaverse, various people can gather and talk to each other and convey their emotions."

However, as shown in FIG. 9, when training is customized for various styles, the single output message of the pre-customization language model can be rendered in various ways according to style.

As a result, as shown in FIG. 10, a message tailored to the user's style can be provided individually within the metaverse using the language model corresponding to the cluster to which the user belongs.
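Delivering a message through the language model of the user's cluster reduces to a lookup from user to cluster to model. In this sketch the models are plain string functions standing in for the customized language models, and the fallback to the pre-customization model for users without a cluster-specific model is an assumption.

```python
from typing import Callable, Dict

StyleModel = Callable[[str], str]  # hypothetical: a model that restyles a message


def make_message_router(
    user_cluster: Dict[str, int],
    cluster_models: Dict[int, StyleModel],
    base_model: StyleModel,
):
    """Return a function that restyles a message with the model of the user's cluster."""

    def deliver(user: str, message: str) -> str:
        # Users without a cluster, or clusters without a customized model, use the base model.
        model = cluster_models.get(user_cluster.get(user), base_model)
        return model(message)

    return deliver
```

For example, routing with `{0: str.upper}` as the only customized model returns an upper-cased message for users in cluster 0 and the unchanged message for everyone else.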

When the embodiments of the present invention are implemented in hardware, the components of the present application may include application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), and the like, configured to perform the embodiments of the present application.

The operations of the system and method for analyzing the user's status in the metaverse according to the embodiments of the present application described above may be at least partially implemented as a computer program and recorded on a computer-readable recording medium. For example, they may be implemented as a program product comprising a computer-readable medium containing program code that can be executed by a processor to perform any or all of the steps, operations, or processes described.

The computer-readable recording medium includes all types of recording devices that store data that can be read by a computer. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disk, and optical data storage devices. Additionally, computer-readable recording media may be distributed across computer systems connected to a network, and computer-readable codes may be stored and executed in a distributed manner. Additionally, functional programs, codes, and code segments for implementing this embodiment can be easily understood by those skilled in the art to which this embodiment belongs.

The present invention has been described above with reference to the embodiments shown in the drawings, but these are merely illustrative, and those skilled in the art will understand that various modifications and variations of the embodiments are possible. Such modifications should be considered to fall within the technical scope of protection of the present invention. Therefore, the true technical scope of protection of the present invention should be determined by the technical spirit of the appended claims.

The present invention is likely to be used in the big data field, which makes use of user behavior logs.

Claims

A system for analyzing the status of users in the metaverse, comprising:

a data collection module configured to capture, for each of a plurality of users accessing the metaverse, the user's text or voice corpus data in the metaverse, wherein the corpus consists of one or more unit texts;

a preprocessing module configured to embed the user's text or voice corpus data and convert it into vector data corresponding to each unit text;

a characteristic analysis module configured to extract characteristics of the user from the vector data corresponding to each unit text, wherein the characteristics include one or more items and the items include a text type;

an index analysis module configured to calculate, for each individual user, an index score for each characteristic based on the computed values obtained during the characteristic extraction process; and

a status analysis module configured to calculate the status of each user based on the user characteristics extracted for each user and the index scores for the characteristics.

The system of claim 1, wherein the data collection module is configured to capture the text or voice data generated in the metaverse by each of the plurality of users and to retrieve associated data from the captured text or voice data; and

the preprocessing module is configured to convert captured voice data into text data using a pre-stored voice conversion unit, to tokenize the captured text data or the converted text data into a plurality of tokens using a pre-stored natural language processing model, and to calculate vector data by embedding the plurality of tokens.

The system of claim 1, wherein the characteristic analysis module is configured to input the vector data corresponding to each unit text into a pre-trained feature recognition model to calculate the probability that the unit text is classified into each of a plurality of text types preset for the feature recognition model, and to determine the text type of the unit text based on the probability calculated for each type; and

the text types of the characteristics include one or more of emotional expressions, behavioral expressions, hate speech, political speech, use of emoticons, use of unrefined language, and time expressions.

The system of claim 3, wherein the feature recognition model is trained in advance using a first training data set, each training datum of which is first sample vector data obtained by embedding text data acquired in an online or offline space other than the metaverse;

the characteristic analysis module is further configured to determine the type of the user's characteristic, using a feature recognition model retrained with a second training data set, from vector data corresponding to user text captured after the retraining is completed; and

each training datum in the second training data set is vector data obtained by embedding text or voice data captured in the metaverse.

The system of claim 3, wherein at least some of the plurality of text types of the feature recognition model each have two or more sub-types;

the feature recognition model includes, for each of the at least some text types, a sub-model that classifies the two or more sub-types of that text type;

the feature recognition model is configured to determine, for a text type that has no sub-types, whether the unit text is classified into that text type, and to classify, for a text type that has sub-types, the unit text into one of the two or more sub-types of that text type; and

the determined text type of the unit text includes the classified sub-type.

The system of claim 5, wherein the index analysis module is configured to calculate the index score for the text type included in the user's characteristics based on the probability values obtained in the process of classifying the text type; and

the computed value includes a probability value calculated during the classification performed by the feature recognition model, the probability value including the probability that the unit text is classified into the determined text type and the probability that the unit text is classified into another, undetermined text type.

The system of claim 6, wherein the index analysis module is configured to calculate the index score for the text type included in the user's characteristics based on the total number of unit texts included in the user's text or voice corpus, the sum over the unit texts of the probability that each unit text is classified as the text type included in the characteristic, and the sum over the unit texts of the probability that each unit text is not classified as that text type.

The system of claim 6, wherein, when the text type of the unit text is determined through multi-class classification, the index analysis module is configured to calculate the index score for the text type included in the user's characteristics based on the total number of unit texts included in the user's text or voice corpus, the sum over the unit texts of the probability that each unit text is classified, among the plurality of text types of the sub-model, as the text type included in the characteristic, and, for each of the remaining text types of the sub-model, the sum over the unit texts of the probability of being classified as that text type.

The system of claim 6, wherein the status analysis module is configured to form a characteristic set consisting of the characteristics extracted for each of the plurality of users, to form a plurality of clusters based on each user's characteristics in the characteristic set, and to extract commonalities among the users in each formed cluster based on at least one of the characteristics, index scores, and related data of each user in the cluster.

The system of claim 9, wherein the status analysis module is further configured to calculate the similarity between users belonging to a formed cluster by selecting two or more index scores among those for the text types included in the characteristics of users belonging to the same cluster and calculating, for each of the selected index scores, the similarity of users within each cluster expressed in terms of the index score for the corresponding text type;

the similarity is calculated in the form of a similarity distribution; and

the similarity distribution includes a count of users having the same or similar index scores, where the similarity of scores is judged relative to the axis scale of the plane on which the similarity distribution is expressed.

The system of claim 10, wherein the status analysis module is further configured to select the index scores for two or more text types among the index scores for the text types used to calculate the similarity, and to calculate the connectivity between users within each cluster based on the similarity distribution of the index scores for the selected text types;

the connectivity is calculated in the form of a connectivity distribution; and

the connectivity distribution is implemented as a two-dimensional or three-dimensional distribution depending on the number of selected index scores, each axis corresponding to the number of users for the selected index score.

The system of claim 11, wherein the status analysis module is configured to generate the user's status expressed by some or all of the user's extracted characteristics, the index scores for the characteristics, and the commonality, similarity, and connectivity of the users.

A computer-readable recording medium recording a program containing instructions that, when read by at least one processor, cause the processor to execute a method of analyzing the status of a user in the metaverse, the method comprising:

capturing the user's text or voice corpus data in the metaverse, wherein the corpus consists of one or more unit texts;

embedding the user's text or voice corpus data and converting it into vector data corresponding to each unit text;

extracting characteristics of the user from the vector data corresponding to each unit text, wherein the characteristics include one or more items and the items include a text type;

calculating, for each user, an index score for each characteristic based on the computed values obtained in the process of extracting the characteristic; and

calculating the status of each user based on at least one of the user characteristics extracted for each user and the index scores for the characteristics.