CN117093718B - Knowledge graph mass unstructured integration method based on cloud computing power and big data technology - Google Patents


Info

Publication number: CN117093718B
Application number: CN202311365109.0A
Authority: CN (China)
Prior art keywords: music, data, emotion, lyrics, labels
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN117093718A
Inventor
陈泽宇
李韩
胡磊明
林金怡
吴伟华
胡高生
余武
于善龙
Current Assignee: China Unicom WO Music and Culture Co Ltd
Original Assignee: China Unicom WO Music and Culture Co Ltd
Application filed by China Unicom WO Music and Culture Co Ltd
Priority to CN202311365109.0A
Publication of CN117093718A
Application granted
Publication of CN117093718B


Classifications

    • G06F 16/35 — Information retrieval; database structures therefor; file system structures therefor: of unstructured textual data; clustering; classification
    • G06F 16/367 — Information retrieval of unstructured textual data: creation of semantic tools, e.g. ontology or thesauri; ontology
    • G06F 40/30 — Handling natural language data: semantic analysis
    • G06N 3/042 — Computing arrangements based on biological models; neural networks: knowledge-based neural networks; logical representations of neural networks
    • G06N 5/02 — Computing arrangements using knowledge-based models: knowledge representation; symbolic representation
    • Y02D 10/00 — Climate change mitigation technologies in information and communication technologies: energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The application provides a knowledge graph mass unstructured integration method based on cloud computing power and big data technology, which comprises the following steps: screening out texts containing lyrics or comments and scoring their emotion with an emotion analysis module; identifying the acquired data at the cloud server and screening out lyrics or music elements related to emotion; forming emotion label-element pairs and merging and unifying similar or related labels to reduce the complexity of the knowledge graph; integrating the unified label-element pairs into a simplified data frame using a data integration engine; triggering the knowledge graph update flow once data integration is completed and adding the simplified data frame to the music knowledge graph; analyzing the melodies or lyrics that music producers or lyricists focus on with a decision tree algorithm over the music knowledge graph; and generating a modification scheme for melody or lyrics from the decision tree predictions combined with the music knowledge graph.

Description

Knowledge graph mass unstructured integration method based on cloud computing power and big data technology
Technical Field
The invention relates to the technical field of information, in particular to a knowledge graph mass unstructured integration method based on cloud computing power and big data technology.
Background
With the rapid development of cloud computing and big data technology, acquiring and processing large amounts of unstructured text is becoming increasingly important. In the music field, obtaining and analyzing large volumes of unstructured text such as lyrics or comments, and applying them to music production and lyric writing, remains a challenging problem. Currently, many music producers and lyricists still rely on personal experience and intuition to produce music and lyrics. They need to sift through a large amount of music data for information related to the music elements or emotions they care about. However, because music data is huge and loosely structured, this process is time consuming and error prone. It therefore becomes necessary to build a music knowledge graph to support music composition. Building a music knowledge graph requires a large amount of supporting data, including musical pieces of various types, musician information, music reviews, music trends, and the like. Such data may be distributed across different sources and platforms, and collecting, sorting, and cleaning it requires significant time and effort. Meanwhile, lyrics and comments are unstructured data that require information extraction and annotation, a labor-intensive and delicate process demanding considerable time and funding. Converting unstructured music data into a knowledge graph involves complex tasks such as entity recognition and relation extraction, and how to accomplish these tasks and reflect them in the knowledge graph is a challenging problem. Because constructing a knowledge graph produces a large number of nodes and sub-nodes, the graph is difficult for creators to understand and use easily, and reducing the complexity of the knowledge graph so that it remains practical is likewise an unresolved problem. Moreover, the music domain changes rapidly, so the music knowledge graph must be updated periodically to reflect the latest trends and changes, and it must also be maintained to ensure its stability and usability. However, frequent rapid updates can make the music knowledge graph bulky and difficult to apply. In addition, music producers and creators often use multiple names, which existing knowledge graphs have difficulty merging; the resulting graphs become too bulky, and data may be missed when analyzing the preferences of singers and creators. Furthermore, knowledge in the music knowledge graph is static, making it difficult to provide personalized modification suggestions tailored to the personal style of a music producer or lyricist. The application of music knowledge graphs to music creation and lyric modification is therefore limited.
Disclosure of Invention
The invention provides a knowledge graph mass unstructured integration method based on cloud computing power and big data technology, which mainly comprises the following steps:
obtaining unstructured text, including comments and lyrics, from the China Unicom Wo Music platform using cloud computing power, and performing preliminary classification with a naive Bayes algorithm; screening out texts containing lyrics or comments and scoring their emotion with an emotion analysis module; identifying the acquired data at the cloud server and screening out lyrics or music elements related to emotion; forming emotion label-element pairs and merging and unifying similar or related labels to reduce the complexity of the knowledge graph; integrating the unified label-element pairs into a simplified data frame using a data integration engine; triggering the knowledge graph update flow once data integration is completed and adding the simplified data frame to the music knowledge graph; synchronously updating newly added elements or labels in the data integration framework through a real-time detection mechanism, and automatically triggering the merging and simplification of labels when complexity exceeds a predetermined level; analyzing the melodies or lyrics that music producers or lyricists focus on with a decision tree algorithm over the music knowledge graph; and generating a modification scheme for melody or lyrics from the decision tree predictions combined with the music knowledge graph.
In one embodiment, obtaining unstructured text, including comments and lyrics, from China Unicom Wo Music using cloud computing power and performing preliminary classification with a naive Bayes algorithm includes:
acquiring relevant unstructured text data, including comments and lyrics, from China Unicom Wo Music through its public data interface using a cloud server; preprocessing the obtained unstructured text data with jieba, including deduplication, cleaning, word segmentation, and part-of-speech tagging; converting the text via TF-IDF into feature vectors usable by a naive Bayes algorithm, capturing text content and semantic information; training the naive Bayes algorithm on the text feature vectors, learning the feature distribution of each category from the training data; and inputting the preprocessed feature vectors into the trained naive Bayes algorithm to output the classification result of each text, i.e., its predicted category, classifying texts as needed into emotionally positive, emotionally negative, music-related topic, and non-music-related topic.
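A minimal sketch of this preliminary classification step, assuming a small hand-labelled corpus; the category names and sample texts are illustrative, not taken from the actual platform data:

```python
# Preliminary classification sketch: jieba segmentation, TF-IDF features,
# and a naive Bayes classifier. Labels and samples are illustrative.
import jieba
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

def tokenize(text):
    # segment Chinese comments/lyrics into space-separated words
    return " ".join(jieba.cut(text))

train_texts = ["这首歌的旋律太动人了", "歌词毫无感情，很失望", "今天天气不错"]
train_labels = ["positive", "negative", "not_music_related"]   # assumed categories

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(tokenize(t) for t in train_texts)

clf = MultinomialNB().fit(X, train_labels)

new_text = "副歌部分的和声让人很感动"
print(clf.predict(vectorizer.transform([tokenize(new_text)]))[0])
```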
In one embodiment, screening out texts containing lyrics or comments and performing emotion scoring with an emotion analysis module includes:
determining keywords or key phrases in advance, including song names, album names, artist names, and lyric fragments, and performing keyword matching detection on the text using a keyword matching method; if the text contains any keyword or phrase, treating it as text containing lyrics or comments; performing exact or fuzzy matching on the text with regular expressions or string matching; retaining and recording the matched texts containing lyrics or comments as input for emotion scoring by the subsequent Bayesian classification algorithm; acquiring a training data set containing emotion labels, where the data set contains text samples and their corresponding positive, negative, or neutral emotion labels; preprocessing the text samples, including text cleaning, word segmentation, and stop-word removal; converting the preprocessed text into feature vectors for the naive Bayes algorithm using TF-IDF; inputting the feature vectors and corresponding emotion labels into the naive Bayes classification algorithm to train the model; the naive Bayes algorithm learns the feature distribution of the categories from the training data, calculates the conditional probability of each category, and estimates the conditional probability of each feature within each category; and performing emotion classification on the text to be analyzed with the trained naive Bayes emotion analysis model, interpreting the emotional tendency of the text from the probability values of the prediction result, and quantizing that tendency into an emotion score.
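For illustration, the keyword filter and the score quantisation can be sketched as follows; the keyword list is hypothetical, and the 0.7 threshold and the 100/50/70 scores follow the example given later in the detailed description:

```python
# Keyword/regex screening plus emotion-score quantisation (illustrative values).
import re

KEYWORDS = ["歌词", "专辑", "歌手", "主打歌"]          # assumed song/album/artist terms
pattern = re.compile("|".join(map(re.escape, KEYWORDS)))

def contains_lyrics_or_comment(text: str) -> bool:
    # treat the text as lyrics/comment material if any keyword matches
    return bool(pattern.search(text))

def emotion_score(p_positive: float, p_negative: float, threshold: float = 0.7) -> int:
    if p_positive > threshold:
        return 100    # positive emotion score
    if p_negative > threshold:
        return 50     # negative emotion score
    return 70         # neutral emotion score

print(contains_lyrics_or_comment("这张专辑的歌词写得很好"), emotion_score(0.82, 0.1))
```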
In one embodiment, identifying the acquired data at the cloud server and screening out lyrics or music elements related to emotion includes:
acquiring a data set containing lyrics and music elements, including song, album, and artist information with emotion or theme labels, where the data set contains emotion words and lyrics expressing emotion, and the emotion-related elements of a song include rhythm, tempo, and key; constructing an emotion lexicon from open-source resources, containing words that describe emotional states or are related to emotion; preprocessing the obtained lyrics and music element texts, including text cleaning, word segmentation, and stop-word removal, matching the texts against the emotion lexicon, and extracting emotion-related words or phrases as features; training an emotion analysis model with a support vector machine on a training data set with emotion labels; inputting the feature vectors and corresponding emotion labels into the model for training to learn the association between emotions and features; performing emotion recognition and screening on the obtained lyrics and music elements with the trained emotion analysis model, converting the preprocessed text into feature vectors, classifying them with the emotion analysis model, and judging from the prediction result whether the text is related to a specific emotion; and screening out and outputting the lyrics or music elements related to the specific emotion according to the emotion analysis result.
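A small sketch of the lexicon-matching and support-vector-machine step, assuming pre-segmented tokens and a tiny hand-labelled set; the lexicon entries and labels are placeholders:

```python
# Emotion-lexicon feature extraction followed by an SVM emotion classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

EMOTION_LEXICON = {"悲伤", "忧郁", "快乐", "愉悦"}       # assumed emotion word list

def emotion_features(tokens):
    # keep only the tokens that match the emotion lexicon
    return " ".join(t for t in tokens if t in EMOTION_LEXICON)

train_tokens = [["旋律", "悲伤", "低沉"], ["歌词", "忧郁"],
                ["节奏", "快乐", "明快"], ["旋律", "愉悦"]]
train_labels = ["sad", "sad", "happy", "happy"]

vec = TfidfVectorizer(token_pattern=r"(?u)\S+")
X = vec.fit_transform(emotion_features(t) for t in train_tokens)
model = LinearSVC().fit(X, train_labels)

print(model.predict(vec.transform([emotion_features(["副歌", "悲伤", "忧郁"])])))
```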
In one embodiment, forming emotion label-element pairs and merging and unifying similar or related labels to reduce the complexity of the knowledge graph includes:
obtaining song comments from China Unicom Wo Music, or obtaining user evaluations of songs from a social platform, and obtaining song element information, including rhythm and key, from publicly available music data sets; extracting emotion words from song comments with spaCy and matching the emotion words to emotion labels with an emotion dictionary; simultaneously extracting features of song elements, including key and rhythm features, and matching them with emotion labels; calculating the similarity of emotion labels using cosine similarity, merging and unifying emotion labels whose similarity exceeds a preset threshold, and categorizing and normalizing the elements; combining the matched and normalized emotion labels with the elements to construct emotion label and element pairs; and storing each emotion label and its corresponding element as a data row in a table or data set.
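The label-merging step can be illustrated with cosine similarity over label vectors; the embeddings and the 0.8 threshold below are assumed values:

```python
# Merge near-duplicate emotion labels whose cosine similarity exceeds a threshold.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

label_vectors = {                       # hypothetical label feature vectors
    "感伤": np.array([0.90, 0.10, 0.00]),
    "悲伤": np.array([0.85, 0.20, 0.05]),
    "欢快": np.array([0.05, 0.10, 0.95]),
}

canonical = {}                          # merged label -> canonical label
labels = list(label_vectors)
for i, a in enumerate(labels):
    for b in labels[i + 1:]:
        if cosine(label_vectors[a], label_vectors[b]) > 0.8:
            canonical[b] = canonical.get(a, a)   # unify b under a's canonical label

print(canonical)   # {'悲伤': '感伤'}
```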
In one embodiment, integrating the unified label-element pairs into a simplified data frame using a data integration engine includes:
selecting the Talend data integration engine according to user and data requirements; storing the unified music emotion labels and corresponding music element data in different data sources, including a database, CSV files, and an API, to improve data-source accessibility and data quality; connecting to the various data sources through database or API connections in Talend; defining data conversion and mapping rules in the engine's operation interface, specifying the join fields, data format conversions, and data cleaning, to merge the data corresponding to music emotion labels and music elements into a simplified data frame; running the data integration flow according to the defined rules, with Talend automatically executing the conversion and integration operations and integrating the data corresponding to music emotion labels and music elements into a simplified data frame; and, after integration is complete, checking and verifying the generated simplified data frame to ensure the accuracy and integrity of the data.
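The Talend flow described above is configured through its graphical interface; purely for illustration, the same join-and-clean step is sketched below with pandas, with the file, database, and column names being assumptions:

```python
# Equivalent of the label/element join into a simplified data frame (sketch).
import sqlite3
import pandas as pd

tags = pd.read_csv("emotion_tags.csv")                # columns: music_id, emotion_tag
conn = sqlite3.connect("music.db")
elements = pd.read_sql("SELECT music_id, element FROM music_elements", conn)

frame = (tags.merge(elements, on="music_id", how="inner")   # associate on music ID
             .drop_duplicates()
             .dropna())
frame.to_csv("simplified_frame.csv", index=False)            # the simplified data frame
```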
In one embodiment, triggering the knowledge graph update procedure after data integration is completed and adding the simplified data frame to the music knowledge graph includes:
acquiring the current music knowledge graph structure and update flow, and determining how new data is added and how it corresponds to the entities and relations in the existing knowledge graph; extracting the music ID, title, artist, emotion label, and music element information from the simplified data frame as values of new entities or attributes, according to the structural requirements of the music knowledge graph; loading the existing music knowledge graph with Neo4j and editing it; adding new music entities to the knowledge graph with an editing tool according to the information extracted from the simplified data frame, using the unique music ID identifier to distinguish each new entity from existing ones; establishing associations between the newly added music entities and existing entities according to the information in the simplified data frame; using the emotion label and music element attributes to establish relations with other entities, establishing emotion relations with emotion entities and similarity or relatedness relations with music element entities; after adding the new entities and relations to the knowledge graph, verifying and proofreading to ensure the accuracy and integrity of the data and checking whether the newly added entities and relations are presented correctly; and, if verification passes, saving and updating the latest knowledge graph, integrating the newly added entities and relations with the existing data. The method further comprises: performing entity recognition and disambiguation with a recurrent neural network model to determine whether music entities in different data frames refer to the same entity, obtaining accurate entity information; and building a relation model with a graph neural network to extract relations between music entities from the data frame, obtaining accurate entity association information.
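A hedged sketch of the graph-update step using the Neo4j Python driver (5.x API); the node labels, relationship types, and credentials are assumptions rather than a fixed schema of the method:

```python
# Add a new music entity and its relations to the knowledge graph via Cypher MERGE.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def add_music(tx, music_id, title, artist, emotion, element):
    tx.run(
        """
        MERGE (m:Music {musicId: $music_id})
        SET m.title = $title
        MERGE (a:Artist {name: $artist})
        MERGE (e:Emotion {name: $emotion})
        MERGE (el:Element {name: $element})
        MERGE (m)-[:PERFORMED_BY]->(a)
        MERGE (m)-[:HAS_EMOTION]->(e)
        MERGE (m)-[:HAS_ELEMENT]->(el)
        """,
        music_id=music_id, title=title, artist=artist,
        emotion=emotion, element=element,
    )

with driver.session() as session:
    session.execute_write(add_music, 4, "Song 4", "Artist 4", "happy", "pop")
driver.close()
```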
Performing entity recognition and disambiguation with the recurrent neural network model to determine whether music entities in different data frames refer to the same entity and obtain accurate entity information specifically includes:
determining the entity types that need to be recognized and disambiguated, including artists and songs among the music entities, and the data frames and data sources to be processed; using existing music databases and music-related articles and comments crawled from the web to obtain a data set of sample entities and their contexts as training data, ensuring the diversity and representativeness of the training data; annotating the entities and contexts in the training data and assigning each entity a unique identifier; building an entity recognition and disambiguation model and training a recurrent neural network to recognize the music entities appearing in the input text, including artists and song names, and to output whether music entities in different data frames refer to the same entity; training the entity recognition and disambiguation model on the training data and tuning it according to its performance on validation data, using cross-validation to improve accuracy and robustness; deploying the trained entity recognition and disambiguation model into a production environment and calling it through an API or other interface; inputting the entity to be disambiguated into the model to obtain its unique identifier and accurate entity information; and periodically monitoring the performance of the entity recognition and disambiguation model, continuously maintaining and improving it, and collecting user feedback and labels for iterative training and optimization.
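One way to realise the recurrent disambiguation model is to encode each mention together with its context through a shared LSTM and treat two mentions as the same entity when their embeddings are close; the PyTorch sketch below uses toy dimensions and random token ids, and the 0.8 threshold is an assumption:

```python
# LSTM mention encoder + cosine similarity for "same entity?" decisions (sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MentionEncoder(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, token_ids):
        out, _ = self.lstm(self.emb(token_ids))
        return out.mean(dim=1)                 # mean-pool over the context window

def same_entity(encoder, ids_a, ids_b, threshold=0.8):
    with torch.no_grad():
        va, vb = encoder(ids_a), encoder(ids_b)
    return F.cosine_similarity(va, vb).item() > threshold

encoder = MentionEncoder()                     # would be trained on labelled mention pairs
a = torch.randint(0, 5000, (1, 12))            # token ids for "Avril Lavigne" + context
b = torch.randint(0, 5000, (1, 12))            # token ids for an alias + context
print(same_entity(encoder, a, b))
```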
Building a relation model with a graph neural network to extract relations between music entities from the data frame and obtain accurate entity association information specifically includes:
retrieving music data from various data sources, including music databases, China Unicom Wo Music, and music reviews, covering songs, artists, album entities, and the relations between them; cleaning and preprocessing the data, including noise removal and normalization; extracting features from the descriptions of songs and artists with TF-IDF, including style, genre, and mood, and extracting features from the audio characteristics of the music itself with music fingerprints; selecting a graph neural network model to process the relations between music entities according to the collected data and extracted features, treating songs and artists as nodes in the graph and the relations between them as edges, and training the graph neural network model to learn the interactions between nodes and edges; inputting new music entities and relations into the graph neural network model to obtain association information between music entities, e.g., inputting a new song and its artist into the model so that it outputs the degree of association between them; and evaluating the accuracy and robustness of the model with cross-validation, adjusting and optimizing it according to the evaluation results.
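The graph-neural-network relation model can be illustrated with a single hand-rolled graph-convolution layer that aggregates neighbour features and scores a song-artist pair; the node features, adjacency matrix, and dimensions are toy values, not the trained model of the method:

```python
# One-layer graph convolution and an edge score between a song and an artist (sketch).
import torch
import torch.nn as nn

class TinyGCN(nn.Module):
    def __init__(self, in_dim=8, out_dim=16):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        return torch.relu(self.lin(adj @ x / deg))      # mean-aggregate neighbours

# 4 nodes: songs 0-1 and artists 2-3; edges connect each song to its artist
x = torch.rand(4, 8)                                    # node features (e.g. TF-IDF)
adj = torch.tensor([[1., 0., 1., 0.],
                    [0., 1., 0., 1.],
                    [1., 0., 1., 0.],
                    [0., 1., 0., 1.]])

h = TinyGCN()(x, adj)
score = torch.sigmoid((h[0] * h[2]).sum())              # association of song 0 and artist 2
print(float(score))
```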
In one embodiment, synchronously updating newly added elements or labels in the data integration framework through a real-time detection mechanism and automatically triggering the merging and simplification of labels when a predetermined complexity is exceeded includes:
monitoring the data integration framework in real time to obtain newly added elements and labels; merging and simplifying the labels of newly added elements and incorporating the merged and simplified labels into the data integration framework; evaluating the complexity of the data integration framework from the number of labels and the number of label levels, and, if the complexity exceeds the preset level, analyzing the labels in the framework with K-means to determine which labels should be merged further; acquiring the label data in the data integration framework, including label names, attributes, and relations, converting it into a similarity matrix, calculating the similarity between labels with cosine similarity based on the extracted features, clustering the labels with K-means, and assigning labels whose similarity exceeds a preset threshold to the same cluster according to the similarity results; formulating a merging strategy for labels in the same cluster according to the characteristics of the cluster and the similarity between its labels; evaluating the merged labels, assessing the degree of simplification and the impact on the data integration framework; and detecting the state of the data integration framework, merging and simplifying again if the complexity still exceeds the preset level. The method further comprises: calculating the complexity of the data integration framework from the number of labels and the number of label levels.
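The K-means analysis that proposes merge candidates can be sketched as below; the tag feature vectors and cluster count are illustrative:

```python
# Cluster tag vectors; tags falling into the same cluster become merge candidates.
import numpy as np
from sklearn.cluster import KMeans

tag_names = ["感伤", "悲伤", "欢快", "愉悦", "平静"]
tag_vectors = np.array([[0.90, 0.10, 0.00], [0.85, 0.20, 0.05],
                        [0.05, 0.10, 0.95], [0.10, 0.15, 0.90],
                        [0.40, 0.50, 0.40]])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(tag_vectors)
clusters = {}
for name, label in zip(tag_names, km.labels_):
    clusters.setdefault(int(label), []).append(name)
print(clusters)   # e.g. {0: ['感伤', '悲伤'], ...} -> candidates for merging
```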
Calculating the complexity of the data integration framework from the number of labels and the number of label levels specifically includes:
acquiring music-related data, including song, album, and artist information, and integrating music data from different sources into one data integration framework; designing a label system for classifying and annotating the music data, with labels at different levels defined as required, including music style, music genre, and artist type; counting the number of labels in the data integration framework, including labels at all levels, to see the label hierarchy; for each label, calculating its depth and width in the hierarchy, where depth denotes its number of levels in the hierarchy and width denotes the number of sibling labels sharing the same parent; obtaining the association relations between labels, including parent-child and sibling relations; and normalizing each indicator, assigning it a weight, multiplying each normalized value by its weight, and summing the weighted values to obtain the complexity score.
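A minimal sketch of the weighted complexity score; the choice of indicators (tag count, maximum depth, average width), the normalisation bounds, and the weights are assumptions for illustration:

```python
# Weighted complexity score over tag count, hierarchy depth, and sibling width.
def complexity_score(tags, weights=(0.4, 0.3, 0.3)):
    """tags: list of dicts with 'depth' (level in the hierarchy) and
    'width' (number of sibling tags under the same parent)."""
    count = len(tags)
    max_depth = max(t["depth"] for t in tags)
    avg_width = sum(t["width"] for t in tags) / count

    # normalise each indicator against an assumed upper bound, then weight and sum
    norm = (min(count / 100, 1.0), min(max_depth / 10, 1.0), min(avg_width / 20, 1.0))
    return sum(w * n for w, n in zip(weights, norm))

tags = [{"depth": 1, "width": 3}, {"depth": 2, "width": 5}, {"depth": 3, "width": 8}]
print(complexity_score(tags))   # compare against the preset complexity threshold
```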
In one embodiment, analyzing the melodies or lyrics that a music producer or lyricist focuses on with a decision tree algorithm over the music knowledge graph includes:
obtaining information from the music knowledge graph, including song styles, genres, and artist types, and training a decision tree model; analyzing the melody aspects a music producer is most likely to focus on from the types or styles of music they frequently produce, e.g., a producer who frequently makes rock music focuses more on guitar sound and rhythm and on the strength and dynamics of the music; analyzing the lyric aspects a lyricist focuses on from the themes and emotions of the lyrics they write, e.g., a lyricist whose songs mostly concern love and heartbreak focuses on words and sentences that express emotion and describe scenes; treating the music producer and the lyricist as two independent classification tasks when training the decision tree model; for each task, extracting corresponding features from the music knowledge graph, extracting music type or style labels for the producer task and theme and emotion labels for the lyricist task; and, after training, using the decision tree model to predict the melodies or lyrics the music producer and lyricist will focus on when creating a new work.
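A minimal sketch of the producer-side classification task (the lyricist task is built the same way from theme and emotion labels); the feature encoding and label names are illustrative:

```python
# Decision tree predicting the melody aspect a producer focuses on from style tags.
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

producer_X = [["rock", "energetic"], ["ballad", "soft"], ["rock", "dark"]]
producer_y = ["guitar_and_rhythm", "melody_line", "guitar_and_rhythm"]

enc = OneHotEncoder(handle_unknown="ignore")
clf = DecisionTreeClassifier(max_depth=3).fit(enc.fit_transform(producer_X), producer_y)

print(clf.predict(enc.transform([["rock", "energetic"]])))   # -> ['guitar_and_rhythm']
```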
In one embodiment, generating a modification scheme for melody or lyrics from the decision tree predictions combined with the music knowledge graph includes:
obtaining the melodies or lyrics that the music producer or lyricist is most likely to focus on, as predicted by the decision tree model; analyzing, from the prediction result and the related information in the music knowledge graph, what modifications or improvements should be made, e.g., if the decision tree predicts that the producer or lyricist focuses on melody, looking up theory or techniques for that music type or style in the music knowledge graph and obtaining the instruments and performance styles used in the producer's or lyricist's catalogue as well as the artists and music styles they have worked with before; for melody modification, adjusting note choices, changing the melodic structure, and strengthening the sense of rhythm; for lyric modification, adding emotional expression, improving word choice, or refining descriptive technique; sending the modification scheme to the relevant music producer or lyricist; obtaining their feedback and re-analyzing and adjusting according to practice and the feedback results; and, if further modification and improvement are needed, continuing to optimize the modification scheme.
The technical scheme provided by the embodiment of the invention can have the following beneficial effects:
The invention discloses a method that uses cloud computing power to obtain unstructured text from a music platform and applies a naive Bayes algorithm for preliminary classification. Texts containing lyrics or comments are selected by screening, and emotion scores are produced by an emotion analysis module. Lyrics or music elements associated with emotion are then identified and screened out on the cloud server. Emotion label-element pairs are formed, and similar or related labels are merged and unified to reduce the complexity of the knowledge graph. The unified label-element pairs are integrated into a simplified data framework with a data integration engine. After data integration is completed, the knowledge graph update flow is triggered and the simplified data frame is added to the music knowledge graph. Meanwhile, newly added elements or labels in the data integration framework are synchronously updated through a real-time detection mechanism, and label merging and simplification are triggered automatically when the framework becomes too complex. The melodies or lyrics that music producers or lyricists focus on are analyzed with a decision tree algorithm over the music knowledge graph. Finally, a scheme is generated from the decision tree predictions for modifying melodies or lyrics, achieving more efficient music creation and lyric modification.
Drawings
Fig. 1 is a flow chart of a knowledge graph mass unstructured integration method based on cloud computing power and big data technology.
Fig. 2 is a schematic diagram of a knowledge graph mass unstructured integration method based on cloud computing power and big data technology.
Fig. 3 is a schematic diagram of a knowledge graph mass unstructured integration method based on cloud computing power and big data technology.
Detailed Description
In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
The knowledge graph mass unstructured integration method based on cloud computing power and big data technology in the embodiment specifically comprises the following steps:
s101, obtaining unstructured text of the connected Voicer music, including comments and lyrics, by utilizing cloud computing power, and carrying out preliminary classification by using a naive Bayesian algorithm.
Related unstructured text data, including comments and lyrics, is obtained from China Unicom Wo Music through its public data interface using a cloud server. The obtained unstructured text data is preprocessed with jieba, including deduplication, cleaning, word segmentation, and part-of-speech tagging. The text is converted via TF-IDF into feature vectors usable by a naive Bayes algorithm, capturing text content and semantic information. The naive Bayes algorithm is trained on the text feature vectors, learning the feature distribution of each category from the training data. The preprocessed feature vectors are input into the trained naive Bayes algorithm, which outputs the classification result of each text, i.e., its predicted category; texts are classified as needed into emotionally positive, emotionally negative, music-related topic, and non-music-related topic. For example, 100 comments and 100 lyrics are obtained from China Unicom Wo Music as unstructured text data. First, jieba is used for preprocessing, including deduplication, cleaning, word segmentation, and part-of-speech tagging. Duplicate comments and lyrics are removed so that each text appears only once. Irrelevant information such as special characters, punctuation marks, and numbers is removed from the text. Each text is segmented with jieba into individual words, and the segmentation result is tagged with parts of speech, including nouns, verbs, and adjectives. Preprocessing yields the segmented words and part-of-speech tags for each text. Next, the text is converted into feature vectors with TF-IDF for training and prediction by the naive Bayes algorithm. Each word is taken as a feature, and the number of occurrences of each word in each text is counted to form a matrix representing the feature vectors of the texts; the number of occurrences of each part of speech in each text is also counted from the part-of-speech tags, forming another matrix of text feature vectors. After the feature vectors are obtained, the naive Bayes algorithm is trained and used for prediction. The feature distribution of each category is learned from the annotated text data, the categories being emotionally positive, emotionally negative, music-related topic, and non-music-related topic. The preprocessed feature vectors are input into the trained naive Bayes algorithm, which outputs the classification results of the texts, i.e., their predicted categories. Taking emotion classification as an example, suppose 50 comments have been labeled as positive and 50 as negative. After training, the naive Bayes algorithm has learned the feature distribution of each category. An emotion classification prediction is now performed for a new comment whose feature vector is, say, [0, 1, 0, 1], meaning it contains one emotionally positive word and one emotionally negative word. The feature vector is input into the trained naive Bayes algorithm, which, based on the feature distribution learned during training, predicts that the comment is emotionally negative. Similarly, classification predictions can be made for music-related and non-music-related topics.
S102, screening out texts containing lyrics or comments, and carrying out emotion scoring by using an emotion analysis module.
Keywords or key phrases, including song names, album names, artist names, and lyric fragments, are determined in advance, and keyword matching detection is performed on the text using a keyword matching method. If the text contains any of the keywords or phrases, it is treated as text containing lyrics or comments. The text is matched exactly or fuzzily using regular expressions or string matching. The matched texts containing lyrics or comments are retained and recorded as input for emotion scoring by the subsequent Bayesian classification algorithm. A training data set containing emotion labels is acquired, containing text samples and their corresponding positive, negative, or neutral emotion labels. The text samples are preprocessed, including text cleaning, word segmentation, and stop-word removal. The preprocessed text is converted with TF-IDF into the feature vectors taken as input by the naive Bayes algorithm. The feature vectors and corresponding emotion labels are fed into the naive Bayes classification algorithm to train the model. The naive Bayes algorithm learns the feature distribution of the categories from the training data, calculates the conditional probability of each category, and estimates the conditional probability of each feature within each category. The trained naive Bayes emotion analysis model then classifies the emotion of the text to be analyzed, the emotional tendency of the text is interpreted from the probability values of the prediction result, and that tendency is quantized into an emotion score. For example, suppose the task is to find, among a large number of texts, those containing song names, album names, artist names, or lyrics, and to classify their emotion with a naive Bayes algorithm. 1,000 text samples are taken from China Unicom Wo Music, including user comments and lyrics. Keywords or phrases such as song titles, album titles, artist names, and lyrics are determined in advance, and these texts are matched exactly or fuzzily with regular expressions and string matching. At this stage, 300 texts are found to contain keywords or phrases, such as a song title ("Sunset"), an album title ("The Photo Years"), or an artist name (Deng Ziqi). A training data set containing emotion labels is acquired, and each text sample is preprocessed, including text cleaning, word segmentation, and stop-word removal. At this stage, each text is labeled positive, negative, or neutral according to its emotional tendency. The matched texts containing lyrics or comments are retained and recorded as input for emotion scoring by the subsequent Bayesian classification algorithm. The preprocessed text is converted with TF-IDF into feature vectors, which, together with the corresponding emotion labels, are fed into the naive Bayes classification algorithm for model training; the algorithm learns the feature distribution of the categories from the training data, calculates the conditional probability of each category, and estimates the conditional probability of each feature within each category. The trained naive Bayes emotion analysis model is then used to classify the emotion of the texts to be analyzed.
For the probability values of the prediction result, a threshold of 0.7 is set; if the positive emotion probability of a text exceeds the threshold, its emotional tendency is quantized into a positive emotion score of 100. If the negative emotion probability of a text exceeds the threshold, its emotional tendency is quantized into a negative emotion score of 50. Otherwise, its emotional tendency is quantized into a neutral emotion score of 70. In this way the emotional tendency of each text is quantized into a specific score according to the model's prediction.
S103, identifying the acquired data at the cloud server, and screening lyrics or music elements related to emotion.
Illustratively, a data set containing song information, lyrics, and emotion labels is obtained from China Unicom Wo Music. Each data point should include the song's title, album, and artist, while for lyrics the complex emotion labels they express should be annotated, such as melancholy or sadness and happiness or pleasure; this is done at acquisition time or in a later labeling pass. Using the NLTK library in Python, an emotion lexicon is constructed containing emotion words and related words that express complex emotions, for example words expressing happiness or pleasure and words expressing sadness or melancholy. The lyrics and music element texts are cleaned, segmented, and stripped of stop words. In addition, stemming and part-of-speech tagging can be performed to help understand the meaning of the text more accurately. In this step, each word or phrase in the lyrics and music element texts is traversed and matched against the emotion lexicon. If "sadness" occurs in the text, the matching "melancholy" and "sadness" entries are found in the emotion lexicon, and the matched items are extracted as features. A portion of the data set is selected and each lyric or music element is manually assigned labels, including complex emotion labels such as melancholy or sadness and happiness or pleasure, and these labeled data are then used to train a support vector machine model. During training, the preprocessed text is converted into feature vectors, and the feature vectors and corresponding emotion labels are fed into the model. Once the model is trained, it can be used to recognize and screen the emotion of new lyrics and music elements: the preprocessed text is converted into feature vectors, which are fed into the model for prediction, and the model's output determines whether the text is related to a specific complex emotion. Lyrics or music elements related to the specific complex emotion are then screened out according to the model's predictions; for example, to find lyrics or music elements expressing melancholy or sadness, the songs the model predicts as melancholy or sad are screened out, thereby finding the lyrics or music elements related to that complex emotion.
S104, forming emotion label-element pairs, and combining and unifying similar or related labels to reduce complexity of the knowledge graph.
Song comments are obtained from China Unicom Wo Music, or user evaluations of songs are obtained from a social platform, and song element information, including rhythm and key, is obtained from publicly available music data sets. Emotion words are extracted from song comments with spaCy and matched to emotion labels with an emotion dictionary. Features of the song elements are extracted at the same time, including key and rhythm features, and matched with emotion labels. The similarity between emotion labels is calculated with cosine similarity, emotion labels whose similarity exceeds a preset threshold are merged and unified, and the elements are categorized and normalized. The matched and normalized emotion labels are combined with the elements to construct emotion label and element pairs. Each emotion label and its corresponding element form a data row stored in a table or data set. For example, the song "Actor" is selected from China Unicom Wo Music and its comment data acquired. One comment reads that the song's melody is very moving and the lyrics are touching, drawing the listener into recollection and reflection. Emotion vocabulary extraction is performed on the comment with spaCy, yielding the emotion words "moving", "touching", "recollection", and "reflection". These emotion words are then matched to emotion labels with an emotion dictionary: "moving" and "touching" are matched to the sentimental emotion label, while "recollection" and "reflection" are matched to the reflective emotion label. Next, the tempo and key information of the song are obtained from a publicly available music data set: the song has a tempo feature of 120 BPM, a medium tempo, and a key feature of C major, a cheerful key. Matching and normalization are then performed on the emotion label and element features. With a preset similarity threshold of 0.6, the similarity between the sentimental and reflective emotion labels is 0.8, above the threshold, so they are merged into a single emotion label, sentimental. Finally, the matched and normalized emotion labels are combined with the elements to construct emotion label and element pairs. For this song, the emotion label and element pair of "Actor" is as follows: the emotion label is sentimental, the rhythm feature is a medium tempo, and the key feature is a cheerful key. Each emotion label and its corresponding element form a data row stored in a table or data set.
S105, integrating the unified tag-element pairs into a simplified data frame by using a data integration engine.
Illustratively, there are two data sources. Data source 1 contains music emotion label data stored in a CSV file: music ID 1, emotion label happy; music ID 2, emotion label excited; music ID 3, emotion label sentimental; music ID 4, emotion label calm. Data source 2 contains music element data stored in a database table: music ID 1, music element strong rhythm; music ID 2, music element high pitch; music ID 3, music element sad melody; music ID 4, music element soft and light. A new data integration job is created in Talend. The CSV data source component is used to configure the path and attributes for reading the CSV file, reading the music emotion label data into the Talend job. The database connection component is configured to connect to the database containing the music element data, specifying the table name and related attributes, and reading the music element data into the Talend job. The data conversion component associates the music emotion labels with the music element data, specifying the music ID as the join field. The data cleaning component performs data format conversion, duplicate-value removal, or missing-value handling as needed. An output component is selected to write the integrated data to a database table or CSV file. After the Talend job runs, a simplified data frame is generated containing the unified correspondence between music emotion labels and music elements, as follows: music ID 1, song A, artist A, happy, strong rhythm; music ID 2, song B, artist B, excited, high pitch; music ID 3, song C, artist C, sentimental, sad melody; music ID 4, song D, artist D, calm, soft and light.
S106, triggering the knowledge graph update flow after data integration is completed, and adding the simplified data frame to the music knowledge graph.
The current music knowledge graph structure and update flow are acquired, and the way new data is added and its correspondence to the entities and relations in the existing knowledge graph are determined. According to the structural requirements of the music knowledge graph, the music ID, title, artist, emotion label, and music element information are extracted from the simplified data frame as values of new entities or attributes. The existing music knowledge graph is loaded with Neo4j and edited. According to the information extracted from the simplified data frame, new music entities are added to the knowledge graph with an editing tool; when an entity is added, the unique music ID identifier distinguishes it from existing entities. Associations between the newly added music entities and existing entities are established according to the information in the simplified data frame. Using the emotion label and music element attributes, relations with other entities are established: emotion relations with emotion entities, and similarity or relatedness relations with music element entities. After the new entities and relations are added to the knowledge graph, verification and proofreading are performed to ensure the accuracy and integrity of the data and to check whether the newly added entities and relations are presented correctly. If verification passes, the latest knowledge graph is saved and updated, integrating the newly added entities and relations with the existing data. For example, given an existing music knowledge graph, information is extracted from the following simplified data frame: music ID 1, song 1, artist 1, emotion label happy, music elements pop and rock; music ID 2, song 2, artist 2, emotion label sad, music elements jazz and blues; music ID 3, song 3, artist 3, emotion label excited, music elements rap and rhythm and blues. The existing music knowledge graph is loaded in preparation for editing. Using Neo4j, a new music entity is added to the knowledge graph: a new song 4, performed by artist 4, with the emotion label happy and the music elements pop and dance. The new song is given the unique identifier 4 to distinguish it from existing entities. Then, the associations between the newly added music entity and the existing entities are established according to the information in the simplified data frame: a relation with artist 4 indicating that the song is performed by that artist; a relation with the happy emotion entity indicating that it carries the happy emotion label; and associations with the pop and dance music element entities indicating that the new song is related or similar to those elements. After the new entities and relations have been added, the data is verified and proofread to ensure accuracy and integrity, and it is checked whether the newly added entities and relations are presented correctly in the knowledge graph. If verification passes, the latest knowledge graph is saved and updated, integrating the newly added entities and relations with the existing data.
Entity recognition and disambiguation are performed with the recurrent neural network model to determine whether music entities in different data frames refer to the same entity and to obtain accurate entity information.
Illustratively, a data set containing the different name aliases and mentions of the same singer is crawled from a music database and the web, including lyrics, comments, and music articles. The text data is cleaned into normalized text sequences using a word segmentation engine. The entity recognition and disambiguation model is built with a recurrent neural network: text is input, and a softmax activation function outputs the probabilities of the entity categories. The entity recognition and disambiguation model is trained with the labeled training data set. During training, cross-validation is used to evaluate the model's performance, and the model is adjusted and optimized according to the results on the validation data. When an external application or internal service needs to disambiguate an entity, the entity to be disambiguated is sent to the model through the API. The model automatically identifies the entity's unique identifier from its context and returns accurate information for that entity; for example, given the inputs "Aigers", "Avril Lavigne", and "Avril", the model determines that they refer to the same singer and returns that singer's unique identifier. The performance of the entity recognition and disambiguation model is monitored, user feedback and labels are collected through the API for iterative model training and optimization, the accuracy and robustness of the model are evaluated periodically, and the model is adjusted and improved in response to new problems.
A relation model is established through a graph neural network, and the relations between music entities are extracted from the data frames to obtain accurate entity association information.
Music data is retrieved from various data sources, including music databases, music platforms and music reviews, and includes songs, artists, albums and the relationships between them. The data is cleaned and preprocessed, including removing noise and normalizing the data. TF-IDF is used to extract features from the descriptions of songs and artists, including style, genre and mood, and music fingerprints are used to extract features from the audio characteristics of the music itself. Based on the collected data and the extracted features, a graph neural network model is selected to process the relations between music entities: songs and artists are treated as nodes in the graph, the relations between music entities are treated as edges, and the graph neural network model is trained to learn the interactions between nodes and edges. New music entities and relations are input into the graph neural network model to obtain association information between the music entities; a new song and its artist are input into the model, and the model outputs the degree of association between them. Cross-validation is used to evaluate the accuracy and robustness of the model, which is then adjusted and optimized according to the evaluation results.
For example, data for 10,000 songs, including song title, singer, album and duration, is acquired from the music database. The data is first cleaned to remove erroneous entries and duplicates. Next, TF-IDF feature extraction is performed on the song and artist descriptions. Suppose a song is described as a pop-rock song that gives a pleasant feel; the TF-IDF algorithm yields the feature vector [2,3,4,1,0,0,0,0], where each dimension corresponds to a particular word (pop, rock, happy and so on) and the value represents the importance of that word in the description. In addition, music fingerprints can be used to extract features of the music itself: the audio characteristics of a song include tempo, tone and music style, giving, say, the feature vector [5,4,1,0,0,0,0,0]. Songs, artists and albums are then treated as nodes in the graph, and the relationships between them (song-artist, song-album) are treated as edges. The interactions between these nodes and edges are processed with a graph neural network model, which is trained to learn the associations between nodes and edges. When a new song and its artist are input, the model outputs the degree of association or similarity between them; for instance, with the song feature vector [3,2,1,0,0,0,0,0] and the artist feature vector [4,3,2,0,0,0,0,0], the model outputs an association degree of 8. Finally, cross-validation is used to evaluate the accuracy and robustness of the graph neural network model: the data set is divided into a training set and a test set, the training set is used to train the model, and the test set is used to evaluate its predictions. If the accuracy is high, the model is considered to perform well in predicting the degree of association between musical entities.
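The graph-modelling step can be sketched as below with a single hand-rolled graph-convolution layer standing in for the graph neural network. The three-node graph, the 8-dimensional feature vectors and the song-artist edges reuse the example values above; the training loop (for example on a link-prediction loss) is omitted, so the printed association degree comes from randomly initialised weights and is illustrative only.

```python
# Hedged sketch: a simple graph-convolution layer aggregates neighbour features,
# and the association degree is read off as cosine similarity of node embeddings.
import torch
import torch.nn.functional as F

# Nodes 0-1: songs, node 2: an artist; rows are the 8-dim TF-IDF/fingerprint vectors.
x = torch.tensor([[2, 3, 4, 1, 0, 0, 0, 0],
                  [3, 2, 1, 0, 0, 0, 0, 0],
                  [4, 3, 2, 0, 0, 0, 0, 0]], dtype=torch.float)

# Undirected song-artist edges plus self-loops, then simple row normalisation.
adj = torch.tensor([[1., 0., 1.],
                    [0., 1., 1.],
                    [1., 1., 1.]])
adj_norm = adj / adj.sum(dim=1, keepdim=True)

class GCNLayer(torch.nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = torch.nn.Linear(in_dim, out_dim)
    def forward(self, a, h):
        return F.relu(self.lin(a @ h))    # aggregate neighbours, then transform

layer1, layer2 = GCNLayer(8, 16), GCNLayer(16, 8)
emb = layer2(adj_norm, layer1(adj_norm, x))   # two rounds of message passing

# Association degree between the new song (node 1) and its artist (node 2).
score = F.cosine_similarity(emb[1], emb[2], dim=0)
print("association degree:", float(score))
```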
And S107, synchronously updating newly added elements or labels in the data integration framework by a real-time detection mechanism, and automatically triggering the combination and simplification of the labels when the complexity exceeds a preset level.
The data integration framework is monitored in real time to obtain newly added elements and tags. Tag merging and simplification are performed on the newly added elements and tags, and the merged, simplified tags are incorporated into the data integration framework. The complexity of the data integration framework is evaluated according to the number of tags and the number of tag hierarchy levels; if it exceeds the predetermined complexity, K-means is used to analyze the tags in the framework and determine which tags should be merged further. Tag data in the data integration framework, including tag names, attributes and relations, is obtained and converted into a similarity matrix: based on the extracted features, the similarity between tags is computed with cosine similarity, K-means cluster analysis is performed on the tags, and tags whose similarity exceeds the preset similarity are assigned to the same cluster. For tags in the same cluster, a merging strategy is formulated according to the characteristics of the cluster and the similarity between the tags. The merged tags are evaluated for their degree of simplification and their impact on the data integration framework. The state of the data integration framework is then checked again, and if the complexity still exceeds the predetermined complexity, merging and simplification are repeated.
For example, a music data integration framework contains the following tag data: tag A, named pop music, with attributes such as cheerful tempo and catchy, related to artists and albums; tag B, named rock music, with attributes such as passionate and guitar solos, related to artists and albums; tag C, named jazz, with attributes such as soft, elegant and improvisational, related to artists and albums. Real-time monitoring finds a newly added tag D, named pop, whose attributes are cheerful tempo and catchy and whose relations are to artists and albums. Tag merging and simplification can be performed on the new tag D: based on the similarity of their attributes and relations, tags A and D are merged into a single tag A named pop music, with the attributes cheerful tempo and catchy and relations to artists and albums. The complexity of the data integration framework is then evaluated. The complexity criterion is specified as no more than 5 tags and no more than 2 tag hierarchy levels; since there are now 4 tags and 1 level, the complexity does not exceed the predetermined complexity and no further merging is required. Then, the similarity between tags is computed with cosine similarity, tags whose similarity exceeds 0.8 are assigned to the same cluster by K-means, and a merging strategy is formulated according to the characteristics of each cluster and the similarity between its tags. Depending on the actual data, pop music and rock music may be found to have high similarity in features and relations and can be considered for merging into a broader tag.
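A sketch of the similarity-and-clustering step is given below using scikit-learn. The tag attribute strings and the 0.8 merge threshold mirror the example above, while the exact feature construction (TF-IDF over concatenated names, attributes and relations) is an assumption.

```python
# Sketch under assumed data: tag names/attributes are vectorised with TF-IDF,
# cosine similarity is computed, and K-means groups candidate tags for merging.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import KMeans

tags = {
    "pop music":  "cheerful tempo catchy artist album",
    "rock music": "passionate guitar solo artist album",
    "jazz":       "soft elegant improvisation artist album",
    "pop":        "cheerful tempo catchy artist album",   # newly added tag D
}
names = list(tags)
vectors = TfidfVectorizer().fit_transform(tags.values())

sim = cosine_similarity(vectors)                          # pairwise tag similarity
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(vectors)

for a in range(len(names)):
    for b in range(a + 1, len(names)):
        if sim[a, b] > 0.8 and kmeans.labels_[a] == kmeans.labels_[b]:
            print(f"merge candidate: {names[b]} -> {names[a]} (sim={sim[a, b]:.2f})")
```

With this toy data the newly added tag "pop" shares all its attribute terms with "pop music", so it is reported as a merge candidate, matching the merging of tags A and D in the example.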
The complexity of the data integration framework is calculated according to the number of tags and the number of tag hierarchy levels.
Illustratively, multiple music data sources are integrated, including online music libraries, music streaming platforms and artist databases. Music data from the different data sources is integrated into a comprehensive data integration framework containing song information, album information and artist information. Different music style tags are defined according to music style, including pop, rock, classical and jazz. The music styles are further categorized; under the pop style there may be tags such as pop rock, pop electronic and pop dance. Artists are categorized according to artist type, including individual musicians, bands and choruses.
The number and hierarchical structure of the tags at each level in the data integration framework are counted: music style has 10 tags in total, with a single-level hierarchy; music genre has 30 tags in total, with a two-level hierarchy of first-level tag (music style) and second-level tag (music genre); artist type has 5 tags in total, with a single-level hierarchy. Music style tags have depth 1, music genre tags depth 2 and artist type tags depth 1; music style tags have width 10, music genre tags width 30 and artist type tags width 5. The tag association relations are obtained: music genre tags are child tags of music style tags, artist type tags have no parent-child relation with music style tags, and artist type tags are at the same level as the music genre tags. According to specific requirements, indices such as tag depth and tag width are normalized and assigned corresponding weights. Normalized tag depth: music style 1, music genre 0.67, artist type 1. Normalized tag width: music style 1, music genre 1, artist type 0.33. Tag depth weight: music style 0.4, music genre 0.3, artist type 0.5. Tag width weight: music style 0.6, music genre 0.7, artist type 0.5. The normalized value of each index is multiplied by its weight and the weighted values are summed to obtain the complexity score: music style complexity score 1 x 0.4 + 1 x 0.6 = 1; music genre complexity score 0.67 x 0.3 + 1 x 0.7 = 0.901; artist type complexity score 1 x 0.5 + 0.33 x 0.5 = 0.665. Combining these results gives the complexity score of the whole data integration framework: complexity score = music style complexity score + music genre complexity score + artist type complexity score = 1 + 0.901 + 0.665 = 2.566.
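The complexity score of the example can be reproduced with a few lines of Python; the normalisation values and weights are taken directly from the description above.

```python
# Worked re-computation of the complexity score from the example.
levels = {
    # name: (depth_norm, width_norm, depth_weight, width_weight)
    "music_style": (1.0, 1.0, 0.4, 0.6),
    "music_genre": (0.67, 1.0, 0.3, 0.7),
    "artist_type": (1.0, 0.33, 0.5, 0.5),
}

def level_score(depth_norm, width_norm, w_depth, w_width):
    # Multiply each normalized index by its weight and sum the weighted values.
    return depth_norm * w_depth + width_norm * w_width

scores = {name: level_score(*vals) for name, vals in levels.items()}
total = sum(scores.values())
print(scores)                                    # music_style 1.0, music_genre ~0.901, artist_type ~0.665
print("framework complexity:", round(total, 3))  # 2.566
```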
S108, analyzing melodies or lyrics focused by music producers or lyrics authors through a decision tree algorithm according to the music knowledge graph.
Information in the music knowledge graph is acquired, including the style, genre and artist type of each song, and a decision tree model is trained. The melody aspects that a music producer is most likely to focus on are analyzed from the type or style of music they often produce; if someone often produces rock music, they pay more attention to the sound and rhythm of the guitar and to the intensity and dynamics of the music. The lyric aspects of interest are analyzed from the themes and emotions of the lyrics a lyrics author writes; if most of the songs written by a lyrics author concern love and emotion, that author pays more attention to words and sentences that express emotion and describe scenes. When training the decision tree model, the music producer and the lyrics author are treated as two independent classification tasks. For each task, the corresponding features are extracted from the music knowledge graph: for the music producer, the music type or style tags; for the lyrics author, the theme and emotion tags. After training is completed, the decision tree model is used to predict the melody or lyric aspects that the music producer and the lyrics author will pay attention to when creating a new work.
For example, consider a music producer named Zhang San. The music type or style tags for Zhang San are obtained from the music knowledge graph, showing that the music he produces mainly belongs to the rock genre. Rock producers pay more attention to guitar sounds and rhythms and to the intensity and dynamics of the music, so these features are used as inputs for training the decision tree model. During training, Zhang San's rock music samples are used as training data; features such as guitar sound, rhythm characteristics and music intensity and dynamics are extracted and associated with melody-related labels. After training, the decision tree model is used to predict the melody aspects he is most likely to focus on when creating a new work: if he starts a new rock song, the model's prediction indicates that he is likely to focus on the guitar's performance and sound effects and on the intensity and dynamic expression. Similarly, consider a lyrics author named Li Si. Her creative themes and emotion tags are obtained from the music knowledge graph, showing that the songs she writes often involve love and heartbreak. Lyrics authors of love and heartbreak songs pay more attention to words and sentences that express emotion and describe scenes, so these features are used as inputs for training the decision tree model. During training, Li Si's love and heartbreak song samples are used as training data, and features such as emotion-expressing vocabulary and scene-describing sentences are extracted and associated with lyric-related labels. After training, the decision tree model is used to predict the lyric aspects she is most likely to focus on when creating a new work: if she starts a new love song, the model's prediction indicates that she is likely to focus on emotional expression and on words and sentences describing love scenes.
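A minimal decision-tree sketch of the prediction step is given below with scikit-learn; the feature names and the tiny training set are invented stand-ins for the tags extracted from the music knowledge graph, not real creator data.

```python
# Toy example: predict whether a creator's focus for a new work is melody or lyrics.
from sklearn.tree import DecisionTreeClassifier

# Features: [guitar_prominence, rhythm_intensity, dynamics, lyrical_emotion]
X = [
    [0.9, 0.8, 0.9, 0.2],   # rock-style works -> melody-focused
    [0.8, 0.9, 0.8, 0.1],
    [0.1, 0.3, 0.2, 0.9],   # love/heartbreak works -> lyrics-focused
    [0.2, 0.2, 0.3, 0.8],
]
y = ["melody", "melody", "lyrics", "lyrics"]

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Predict the likely focus for a new rock composition (Zhang San style features).
new_work = [[0.85, 0.7, 0.8, 0.3]]
print("predicted focus:", clf.predict(new_work)[0])   # expected: "melody"
```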
S109, generating a lyric modification scheme by combining the music knowledge graph according to the decision tree algorithm prediction result so as to be used for modifying melody or lyrics.
The melody or lyric aspects that the decision tree model predicts the music producer or lyrics author is most likely to focus on are obtained. Based on the prediction result, the modifications or improvements required are analyzed in combination with the related information in the music knowledge graph: if the decision tree predicts that the music producer or lyrics author focuses on the melody, the theory or techniques related to the relevant music type or style are looked up in the music knowledge graph, and the instruments and sound effects used in the producer's or author's music library, as well as the artists and music styles they have cooperated with in the past, are obtained. Melody modifications include adjusting the choice of notes, changing the structure of the melody and strengthening the sense of rhythm; lyric modifications include adding emotional expression, improving vocabulary choices or refining descriptive techniques. The lyric modification scheme is sent to the relevant music producer or lyrics author. Feedback from the music producer or lyrics author is obtained, and the scheme is re-analyzed and adjusted according to their practice and feedback; if further modification and improvement are needed, the lyric modification scheme continues to be optimized.
For example, consider a music producer named Wang Wu. Based on the prediction of the decision tree model, he is most likely to focus on the melody. According to this prediction, the music knowledge graph is queried for the theory and techniques related to the melody and for the artists and music styles with which Wang Wu often cooperates. The music knowledge graph shows that Wang Wu often produces pop music and works with popular instruments such as guitar, piano and drums, and that his music usually emphasizes a sense of movement and catchy melodies. Based on this information, the following modifications are proposed: when creating a melody, try to choose notes and scales commonly used with popular instruments so as to match the pop style Wang Wu produces; deliberately design the melody structure to be memorable and accessible, adding repeated melodic fragments or using common melodic patterns to strengthen the listener's memory of the song; and pay attention to the sense of rhythm in the melody, for instance by using drum accents or emphasizing the rhythm of certain notes to enhance the overall feeling of movement. These modification suggestions are sent to Wang Wu and his feedback is awaited. After receiving Wang Wu's feedback, the scheme is re-analyzed and adjusted according to his practice and feedback results; if further modification and improvement are needed, the modification scheme continues to be optimized for the specific problems so as to achieve a better effect and meet Wang Wu's creative requirements.
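The knowledge-graph lookup that feeds the modification scheme can be sketched as below. The Producer and Instrument labels, the USES_INSTRUMENT relationship and the canned suggestion texts are assumptions used only to illustrate how graph query results are turned into concrete melody suggestions; they are not the fixed schema of the method.

```python
# Hedged sketch: query the (assumed) graph schema for a producer's usual instruments
# and map each instrument to a plain-text melody suggestion.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

SUGGESTIONS = {
    "guitar": "favour note choices and scales idiomatic to guitar-driven pop",
    "drums":  "add drum accents or emphasised beats to strengthen the groove",
    "piano":  "use repeated, easy-to-remember melodic phrases over piano chords",
}

def melody_suggestions(tx, producer):
    result = tx.run(
        "MATCH (p:Producer {name: $name})-[:USES_INSTRUMENT]->(i:Instrument) "
        "RETURN i.name AS instrument",
        name=producer,
    )
    return [SUGGESTIONS.get(r["instrument"], f"consider the role of {r['instrument']}")
            for r in result]

with driver.session() as session:
    for tip in session.execute_read(melody_suggestions, "Wang Wu"):
        print("-", tip)
driver.close()
```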
The foregoing is merely illustrative of some preferred embodiments of the present invention, but the invention is not limited thereto and many modifications and variations are possible. Any modifications or variations which are based on the basic principles of the present invention should be considered as falling within the scope of the present invention.

Claims (7)

1. The knowledge graph mass unstructured integration method based on cloud computing power and big data technology is characterized by comprising the following steps of:
the method comprises the steps of obtaining unstructured text of a music platform, including comments and lyrics, by utilizing cloud computing power, and carrying out preliminary classification by using a naive Bayesian algorithm; screening out texts containing lyrics or comments, and carrying out emotion scoring by using an emotion analysis module; identifying the acquired data at the cloud server, and screening lyrics or music elements related to emotion; forming emotion label-element pairs, and combining and unifying similar or related labels to reduce the complexity of the knowledge graph; integrating the unified tag-element pairs into a simplified data frame using a data integration engine; triggering the updating flow of the knowledge graph after the data integration is completed, and adding the simplified data frame into the music knowledge graph; the real-time detection mechanism synchronously updates newly added elements or labels in the data integration framework, and automatically triggers the combination and simplification of the labels when the complexity exceeds a preset level; analyzing melodies or lyrics focused by music producers or lyrics authors through a decision tree algorithm according to the music knowledge graph; generating a lyric modification scheme by combining a music knowledge graph according to a decision tree algorithm prediction result, wherein the lyric modification scheme is used for modifying melody or lyrics;
The cloud server identifies the acquired data, screens out lyrics or music elements related to emotion, and comprises the following steps:
acquiring a data set containing song information, lyrics and emotion labels through a music platform, wherein each data point included in the data set comprises a title, an album and an artist of a song, and simultaneously marking expressed emotion labels for the lyrics;
constructing an emotion word library through an NLTK library in Python, wherein the emotion word library comprises emotion words and related words for expressing emotion;
cleaning, word segmentation and removal of stop words, word stem extraction and part of speech tagging are carried out on the song words and the music element texts, and the method specifically comprises the steps of traversing each word or phrase in the lyrics and the music element texts, and searching for matching items in an emotion word stock;
selecting a part of data set, distributing emotion labels for each lyric or music element, then training a support vector machine model by using the data with the emotion labels, converting the preprocessed text into feature vectors in the training process, and inputting the feature vectors and the corresponding emotion labels into the support vector machine model for training;
Converting the preprocessed text into feature vectors, inputting the feature vectors and corresponding emotion labels into a support vector machine model for prediction, and judging whether the text is related to a certain emotion or not according to the output of the support vector machine model;
screening lyrics or music elements related to a certain emotion according to a prediction result of the support vector machine model;
the forming of emotion label-element pairs and merging and unifying similar or related labels includes:
obtaining song comments on a music platform or obtaining the evaluation of a user on songs by using a social platform and obtaining song element information in a music data set, wherein the song element information at least comprises rhythms and tones;
extracting emotion words in song comments by using space, and matching the emotion words with emotion labels by using an emotion dictionary;
simultaneously extracting features of song elements, including extracting tone and rhythm features, and matching with emotion labels;
calculating the similarity of emotion labels by using cosine similarity, merging and unifying emotion labels with similarity higher than preset similarity, and classifying and normalizing elements;
Combining the matched and normalized emotion labels with elements to construct emotion label and element pairs, forming a data row by each emotion label and the corresponding element, and storing the data row in a table or a data set;
after the data integration is completed, triggering an updating flow of the knowledge graph, and adding the simplified data frame into the music knowledge graph, wherein the method comprises the following steps:
acquiring a current music knowledge graph structure, and determining an adding mode and a corresponding relation of new data; extracting necessary music information from the simplified data frame as a value of a new entity or attribute; editing an existing music knowledge graph by using Neo4j, and adding a new music entity and an entity association relation related to the new music entity; verifying and checking newly added entities and relations, and storing and updating after confirming the accuracy and the integrity of the newly added entities and relations; further comprises: entity identification and disambiguation are carried out through the cyclic neural network model, whether music entities in different data frames refer to the same entity is determined, and accurate entity information is obtained; establishing a relation model through a graph neural network, and extracting the relation between music entities from a data frame to obtain accurate entity association information;
The entity identification and disambiguation are performed through the cyclic neural network model, and whether the music entities in different data frames refer to the same entity is determined, so that accurate entity information is obtained, and the method specifically comprises the following steps: determining the entity types needing to be identified and disambiguated, including artists and songs in a music entity, and determining a data frame and a data source needing to be processed; the existing music database, the music related articles and comments crawled by the network are used for acquiring a data set containing sample entities and corresponding contexts as training data, so that the diversity and the representativeness of the training data are ensured; labeling entities and contexts in training data, and assigning a unique identifier to each entity; building an entity identification and disambiguation model by using entity identification, training a cyclic neural network model, identifying music entities appearing in an input text, including artists and song names, and outputting whether the music entities in different data frames refer to the same entity; training the entity identification and disambiguation model by using training data, and performing model tuning according to the effect of the verification data; the accuracy and the robustness of the model are improved by using cross verification; deploying the trained entity recognition and disambiguation model into a production environment, and calling the model through an API or other modes; inputting the entity to be disambiguated into a model, acquiring a unique identifier of the entity, and acquiring accurate entity information; periodically monitoring the performance of the entity recognition and disambiguation model, and continuously maintaining and improving to obtain user feedback and labels for iterative model training and optimization;
The building of a relation model through a graph neural network extracts the relation between music entities from a data frame to obtain accurate entity association information, and the method specifically comprises the following steps: obtaining music data including songs, artists, album entities and relationships between them from various data sources including music databases, music platforms, music reviews; cleaning and preprocessing the data, including removing noise and normalizing the data; performing feature extraction on the descriptions of songs and artists by using TF-IDF, wherein the descriptions comprise styles, genres and moods, and extracting features according to the audio characteristics of music by using music fingerprints; selecting a graph neural network model to process the relation between music entities according to the collected data and the extracted characteristics, regarding songs and artists of the music entities as nodes in the graph, regarding the relation between the music entities as edges in the graph, and training the graph neural network model to learn the interaction between the nodes and the edges; inputting the new music entity and the relation into a graph neural network model to obtain the association information between the music entities, inputting a new song and an artist thereof into the model, and outputting the association degree between the music entities by the graph neural network model; using cross verification to evaluate the accuracy and the robustness of the model, and adjusting and optimizing the model according to the evaluation result;
The method for generating a lyric modification scheme for modifying melody or lyrics by combining a music knowledge graph according to a predicted result of a decision tree algorithm comprises the following steps:
obtaining attention points of music producers or lyrics authors predicted by a decision tree model, wherein the attention points are melodies or lyrics;
according to the focus, the related information in the music knowledge graph is combined to analyze the modification or improvement needed to be carried out, if the decision tree prediction result is that a music producer or a lyric author pays attention to the melody, the theory or skill related to the music type or style is searched according to the music knowledge graph, the musical instrument and the music voice used in a music library of the music producer or the lyric author and the artist and the music style of the music producer or the lyric author cooperating in the past are obtained, the melody modification comprises the selection of adjusting notes, the structure of changing melodies and the enhancement of rhythm sense, and the lyric modification comprises the addition of emotion expression, the improvement of vocabulary selection or the optimization of description skills;
transmitting the lyric modification scheme to a related music producer or lyric author;
and acquiring feedback of a music producer or a lyric author, analyzing and adjusting again according to practice and feedback results, and if further modification and improvement are needed, continuing to optimize the lyric modification scheme.
2. The method of claim 1, wherein the obtaining unstructured text of the music platform using cloud computing power, including comments, lyrics, and preliminary classification using a naive bayes algorithm, comprises:
the cloud server acquires unstructured text data comprising comments and lyrics from a music platform public data interface; the text data is preprocessed by jieba, and the preprocessing comprises duplication removal, cleaning, word segmentation and part-of-speech tagging; after preprocessing, converting the data into feature vectors through TF-IDF for training of naive Bayes algorithm; after training is completed, feature vectors are input, and predicted categories are output and divided into topics with positive emotion, negative emotion, related music and unrelated music.
3. The method of claim 1, wherein the screening out text containing lyrics or comments and using an emotion analysis module for emotion scoring comprises:
text matches predetermined keywords or key phrases including song title, album, artist name and lyrics; the successfully matched text is regarded as containing lyrics or comments, and regular expression or character string matching is further carried out; recording the successfully matched text and inputting the successfully matched text for a subsequent naive Bayesian algorithm; acquiring a training data set containing emotion labels, preprocessing and converting the training data set into feature vectors; and after training the naive Bayes model, carrying out emotion classification and quantization to obtain emotion scores.
4. The method of claim 1, wherein the forming emotion tag-element pairs and merging and unifying similar or related tags to reduce complexity of knowledge graph comprises:
obtaining song comments on the music platform or obtaining the evaluation of the user on the song by using the social platform; extracting emotion words in the comments by using space, and matching with the emotion dictionary; extracting tone and rhythm characteristics in song elements and matching with the emotion labels; calculating emotion label similarity by using cosine similarity, and merging labels with similarity higher than a preset standard; performing element classification and normalization processing to construct a matched and normalized emotion label and element pair; each emotion tag and the corresponding element form a data line and are stored in a table or data set.
5. The method of claim 1, wherein integrating the unified tag-element pairs into a simplified data frame using the data integration engine comprises:
selecting a Talend data integration engine, and storing music emotion labels and music element data in a preset number of types of data sources, wherein the data sources comprise a database and CSV files; connecting to these data sources through an API or database using Talend; defining data conversion and mapping rules, and combining data into a simplified data frame in an engine operation interface; and automatically executing a data integration flow, and checking and verifying the data quality after integration is completed.
6. The method of claim 1, wherein the real-time detection mechanism synchronously updates newly added elements or tags in the data integration framework and automatically triggers merging and simplification of tags when a predetermined complexity is exceeded, comprising:
monitoring and merging newly added elements and labels in the data integration framework; evaluating the complexity of the data integration framework based on the number of tags and the number of tag layer levels; if the complexity exceeds the standard, K-means is applied to analyze the labels, and the labels with high similarity are divided into the same cluster; according to the characteristics of the cluster, a label merging strategy is formulated, and effect evaluation is carried out; if the complexity is still beyond the preset range, label merging and simplification are carried out again; further comprises: calculating the complexity of the data integration framework according to the number of the labels and the number of the label layer grades;
the calculating the complexity of the data integration framework according to the number of the labels and the label layer level number specifically comprises the following steps: acquiring music related data including song information, album information and artist information, and integrating music data of different data sources into a data integration frame; designing a label system for classifying and marking music data, and defining labels with different levels according to requirements, wherein the labels comprise music styles, music genres and artist types; counting the number of tags in the data integration framework, including all levels of tags, and checking the hierarchical structure of the tags in the data integration framework; for each tag, calculating the depth and width of the tag in the hierarchical structure, wherein the depth of the tag represents the number of layers of the tag in the hierarchical structure, and the width of the tag represents the number of peer tags with the same father level; acquiring an association relationship between labels, wherein the association relationship comprises a father-son relationship and a peer relationship; and carrying out normalization processing on each index, giving weight, multiplying the normalization value of each index by the corresponding weight, and adding the weighted values to obtain the complexity score.
7. The method of claim 1, wherein the analyzing the melody or lyrics of interest to the music producer or lyrics author through the decision tree algorithm according to the music knowledge graph comprises:
acquiring music knowledge graph information to train a decision tree model; analyzing melody or lyric aspects focused by music producers and lyrics authors, based on respective music types or styles, subjects and emotions; respectively carrying out feature extraction and model training on a music producer and a lyric author as independent classification tasks; and after training, predicting by using the decision tree model.
CN202311365109.0A 2023-10-20 2023-10-20 Knowledge graph mass unstructured integration method based on cloud computing power and big data technology Active CN117093718B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311365109.0A CN117093718B (en) 2023-10-20 2023-10-20 Knowledge graph mass unstructured integration method based on cloud computing power and big data technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311365109.0A CN117093718B (en) 2023-10-20 2023-10-20 Knowledge graph mass unstructured integration method based on cloud computing power and big data technology

Publications (2)

Publication Number Publication Date
CN117093718A CN117093718A (en) 2023-11-21
CN117093718B true CN117093718B (en) 2024-04-09

Family

ID=88781601

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311365109.0A Active CN117093718B (en) 2023-10-20 2023-10-20 Knowledge graph mass unstructured integration method based on cloud computing power and big data technology

Country Status (1)

Country Link
CN (1) CN117093718B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108475289A (en) * 2016-04-15 2018-08-31 谷歌有限责任公司 Suggest the system and method for content to writer for the content based on document
CN110852047A (en) * 2019-11-08 2020-02-28 腾讯科技(深圳)有限公司 Text score method, device and computer storage medium
CN111488465A (en) * 2020-04-14 2020-08-04 税友软件集团股份有限公司 Knowledge graph construction method and related device
CN113010701A (en) * 2021-02-25 2021-06-22 北京四达时代软件技术股份有限公司 Video-centered fused media content recommendation method and device
TW202223684A (en) * 2020-12-10 2022-06-16 中華電信股份有限公司 Music generation system and method based on music knowledge graph and intention recognition and computer-readable medium
KR20220112948A (en) * 2021-02-05 2022-08-12 이지은 Method for servicing musical contents based on user information

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108806656B (en) * 2017-04-26 2022-01-28 微软技术许可有限责任公司 Automatic generation of songs
US11157693B2 (en) * 2020-02-25 2021-10-26 Adobe Inc. Stylistic text rewriting for a target author

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108475289A (en) * 2016-04-15 2018-08-31 谷歌有限责任公司 Suggest the system and method for content to writer for the content based on document
CN110852047A (en) * 2019-11-08 2020-02-28 腾讯科技(深圳)有限公司 Text score method, device and computer storage medium
CN111488465A (en) * 2020-04-14 2020-08-04 税友软件集团股份有限公司 Knowledge graph construction method and related device
TW202223684A (en) * 2020-12-10 2022-06-16 中華電信股份有限公司 Music generation system and method based on music knowledge graph and intention recognition and computer-readable medium
KR20220112948A (en) * 2021-02-05 2022-08-12 이지은 Method for servicing musical contents based on user information
CN113010701A (en) * 2021-02-25 2021-06-22 北京四达时代软件技术股份有限公司 Video-centered fused media content recommendation method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Artificial intelligence promotes the transformation of production and operation in the cultural industry; Li Jingping; Zhang Shan; Qilu Yiyuan (Issue 06); pp. 116-122 *

Also Published As

Publication number Publication date
CN117093718A (en) 2023-11-21

Similar Documents

Publication Publication Date Title
Ras et al. Advances in music information retrieval
Pérez-Sancho et al. Genre classification using chords and stochastic language models
Mesaros et al. Datasets and evaluation
Zheng et al. Music genre classification: A n-gram based musicological approach
Kolozali et al. Automatic ontology generation for musical instruments based on audio analysis
Farajzadeh et al. PMG-Net: Persian music genre classification using deep neural networks
Ünal et al. A hierarchical approach to makam classification of Turkish makam music, using symbolic data
Van Kranenburg A computational approach to content-based retrieval of folk song melodies
Knees et al. Towards semantic music information extraction from the web using rule patterns and supervised learning
CN115422947A (en) Ancient poetry assignment method and system based on deep learning
Dixon et al. Probabilistic and logic-based modelling of harmony
Nagavi et al. Overview of automatic Indian music information recognition, classification and retrieval systems
Krause et al. Hierarchical classification for instrument activity detection in orchestral music recordings
Van Balen Audio description and corpus analysis of popular music
Sarkar et al. Raga identification from Hindustani classical music signal using compositional properties
CN110134823B (en) MIDI music genre classification method based on normalized note display Markov model
CN117093718B (en) Knowledge graph mass unstructured integration method based on cloud computing power and big data technology
Kher Music Composer Recognition from MIDI Representation using Deep Learning and N-gram Based Methods
Silva et al. Real-time pattern recognition of symbolic monophonic music
Chu [Retracted] Feature Extraction and Intelligent Text Generation of Digital Music
Klügel et al. Towards Mapping Timbre to Emotional Affect.
Gopalakrishnan Search Engine and Recommendation System for the Music Industry built with JinaAI
Moon et al. How to Retrieve Music using Mood Tags in a Folksonomy
Wani et al. Music Suggestion Via Sentimental Analysis of User-Inputted Texts
Rico et al. Chord progressions selection based on song audio features

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant