TW202320551A - Music recommendation system and method based on image recognition and situational state

Info

Publication number: TW202320551A
Application number: TW110141339A
Authority: TW
Inventors: 廖奕雯; 蘇家輝; 曾弘宇; 陳育祺; 吳巧婷
Original assignee: 正修學校財團法人正修科技大學
Priority date: 2021-11-05
Filing date: 2021-11-05
Publication date: 2023-05-16

Abstract

A music recommendation system based on image recognition and situational state includes: a training music database module; a situational image database module; an audio-visual training module; a scoring record module; a cloud database module that includes a user database and a music file database; an image recognition module; an integrated analysis module; and a music recommendation module. A music recommendation method based on image recognition and situational state includes: an audio-visual data archiving step; an audio-visual training step; a preference scoring step; an image recognition and classification step; a music genre screening step; and a feature value comparison and music recommendation step. With the system and method of the present invention, the user first watches each situational image and listens to each training track through the audio-visual training module, then scores them, and the scores are recorded. Later, when the user is driving, the integrated analysis module and the music recommendation module process the scoring results, and music tracks suited to the user's current situational state are selected from a music file database and played.

Description

Music recommendation system and method based on image recognition and situational state

The present invention relates to a music recommendation system and method, and in particular to a music recommendation system and method based on image recognition and situational state that determines the driver's current situational state by image recognition and recommends, from a music file database, music suited to that state.

The rapid growth in music streaming users in recent years shows that demand for listening to music keeps increasing. Current music streaming services, however, lack a suitable method for suggesting music that fits the listener; earlier algorithmic recommendation could only estimate the types of music a user might like.

In the prior art, Republic of China (Taiwan) Patent No. M523146 discloses a smart music and digital content player, device, and system using brainwave feedback. Based on the current brainwave signal, it intelligently selects the most suitable music or digital content from a music or digital database and plays it. Music or digital content suited to the user's current state of mind can thus be selected and played according to the user's brainwave signal, for the purpose of relieving stress or promoting activity.

However, the above prior art does not disclose analysis of a driver's situational state, nor the use of prior training and scoring results to improve the accuracy of recommending music tracks for the driver to listen to at that moment, so that when traffic conditions and weather affect the driver's mood, soothing music can be provided to keep the driver alert and improve driving safety. Moreover, the prior art offers no good music recommendation technique that addresses the situational state a driver encounters while driving and meets the driver's emotional needs at that moment. Apart from playing a fixed set of tracks in the car or listening to the radio, nothing actually improves recommendation accuracy through prior simulation training. The prior art therefore needs improvement.

In addition, Patent No. I410811 relates to music recommendation but provides no corresponding recognition or recommendation for driving a vehicle. Patents such as I699186 and I723853 relate to emotions and brainwaves but likewise provide no recognition of driving conditions and no corresponding recommendation. Patent No. I71710 concerns an Internet of Vehicles based on image recognition but does not involve music recommendation or human emotion.

In view of this, the present invention collects the scores a user gives after watching situational videos paired with music, and combines them with similarity comparison of music feature values, to recommend music that better suits and matches what the driver wants at that moment. A scoring mechanism for the recommended music is also added to optimize the recommendation results.

One object of the present invention is to provide a music recommendation system and method based on image recognition and situational state that, based on the results of the driver's prior audio-visual training and on the current driving situation determined by image recognition, provides music tracks suitable for the driver to listen to while driving.

Another object of the present invention is to provide a music recommendation system and method based on image recognition and situational state that lets the driver record favorite music tracks for the current driving situation, so that they can later be used directly as playback tracks.

A further object of the present invention is to provide a music recommendation system and method based on image recognition and situational state that provides a mechanism for the driver to re-score recommended music, optimizing the recommendation results so that tracks are recommended more accurately.

To achieve the above and other objects, the music recommendation system based on image recognition and situational state of the present invention includes: a training music database module storing a plurality of training tracks classified by music type; a situational image database module storing a plurality of situational images classified by situation type; an audio-visual training module, connected to the training music database module and the situational image database module, which accesses and plays the training tracks in the training music database module and the situational images in the situational image database module, each situational image being played together with at least one training track; a scoring record module, connected to the audio-visual training module, which lets the user give and record a preference score for the training track and situational image played by the audio-visual training module; a cloud database module, connected to the scoring record module, which includes a user database and a music file database, the user database storing user data and the user's scoring data and the music file database storing music tracks of a plurality of music genres; an image recognition module, which recognizes and classifies the situational state from the weather and road-condition features present while the user is driving and which includes an image deep learning unit to improve recognition accuracy; an integrated analysis module, connected to the cloud database module and the image recognition module, which analyzes and compares the situational state classification currently recognized by the image recognition module against the user database of the cloud database module and generates a screened music track data table; and a music recommendation module, connected to the cloud database module and the integrated analysis module, which compares the music features of the training tracks in the screened music track data table against the music file database of the cloud database module and generates from the music file database a playlist of recommended music tracks suited to the user's current driving situation.

To achieve the above and other objects, the music recommendation method based on image recognition and situational state of the present invention includes: an audio-visual data archiving step, in which a plurality of training tracks and a plurality of situational images are stored in a training music database module and a situational image database module respectively, the training tracks being classified into a plurality of music genres; an audio-visual training step, in which an audio-visual training module is provided for the user to operate, the module retrieves training tracks and situational images from the training music database module and the situational image database module, and the user selects a situational image and a training track to be played together so that the user trains by watching and listening, each situational image being accompanied during playback by at least one of the training tracks; a preference scoring step, in which, after the user completes each audio-visual training session, a scoring record module lets the user score how much the user liked that session's situational image and its accompanying training track, and the user information and scoring record are stored in a user database for later use, the user being able to set a score threshold; an image recognition and classification step, in which an image recognition module recognizes and classifies the situational state from the weather and road-condition features present while the user is driving, assigning those features to one of the plurality of situational images; a music genre screening step, in which, according to the situational state determined in the image recognition and classification step, the user information and scoring record are retrieved from the user database and the corresponding tracks are selected from the plurality of training tracks using the set score threshold, generating a screened music track data table; and a feature value comparison and music recommendation step, in which the music features of the training tracks in the screened music track data table are analyzed in turn, a music file database is searched for music tracks whose features are similar to those of the training tracks in the table, and the tracks found are taken as recommended music tracks and placed in a playlist for the user to play.
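The feature value comparison and music recommendation step can be illustrated with a minimal sketch. The text does not name the music features or the similarity measure, so this example assumes each track is described by a numeric feature vector and uses cosine similarity; the function names `cosine_similarity` and `recommend`, the track names, and the feature values are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(screened_tracks, library, top_k=3):
    """Rank library tracks by their best similarity to any screened training track.

    Both arguments map a track name to a feature vector.
    Assumption: cosine similarity and these features are illustrative only.
    """
    if not screened_tracks:
        return []
    scored = []
    for name, features in library.items():
        best = max(
            cosine_similarity(features, reference)
            for reference in screened_tracks.values()
        )
        scored.append((best, name))
    # Highest similarity first; ties broken by track name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


# Hypothetical screened training track and music file database entries.
screened = {"training_jazz_01": [0.9, 0.1, 0.3]}
music_library = {
    "library_a": [0.8, 0.2, 0.3],
    "library_b": [0.1, 0.9, 0.1],
    "library_c": [0.9, 0.1, 0.4],
}
playlist = recommend(screened, music_library, top_k=2)
print(playlist)
```

Here the two library tracks whose vectors point in nearly the same direction as the screened training track are placed in the playlist, and the dissimilar one is left out.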

In some embodiments of the present invention, the image recognition module further includes an image data augmentation unit whose data augmentation techniques include contrast limited adaptive histogram equalization or image enlargement.

In some embodiments of the present invention, the image recognition module further includes a vehicle speed detection unit that detects and records the vehicle speed.

In some embodiments of the present invention, the music recommendation module further has a favorite track storage unit.

In some embodiments of the present invention, the music recommendation module further has a recommended track scoring unit.

In some embodiments of the present invention, the image recognition and classification step further includes an image data augmentation step that augments the image data to increase the accuracy of situational state recognition.

In some embodiments of the present invention, the image recognition and classification step further includes a vehicle speed detection step that detects the current vehicle speed while driving to improve the accuracy of situational state recognition and classification.

In some embodiments of the present invention, the method further includes a favorite track recording step that lets the user record tracks the user has heard and liked, so that they can be played again in the same situational state.

In some embodiments of the present invention, the method further includes a recommended track scoring step that lets the user score the recommended tracks to improve the accuracy of future recommendations.

FIG. 1 is an architecture diagram of one embodiment of the music recommendation system based on image recognition and situational state of the present invention. Referring to FIG. 1, the system includes: a training music database module (10), a situational image database module (20), an audio-visual training module (30), a scoring record module (40), a cloud database module (50), an image recognition module (60), an integrated analysis module (70), and a music recommendation module (80). The training music database module (10) stores a plurality of training tracks classified by music type. For example, the music types are blues, country, folk, jazz, Latin, new age, pop, rap, rock, classical, instrumental, vocal, and dance, 13 types in total, and 10 tracks of each type are collected as training tracks.

The situational image database module (20) stores a plurality of situational images classified by situation type. The situational images can be recorded and classified by weather or environmental condition as actually needed, for example day or night, sunny or rainy, smooth or congested roads, and urban or rural roads. Each type of situational image is recorded and then stored in the situational image database module (20).

Preferably, in this embodiment, the situational image database module (20) stores eight situational image files: a day-sunny-smooth situational image, a day-sunny-congested situational image, a day-rainy-smooth situational image, a day-rainy-congested situational image, a night-clear-smooth situational image, a night-clear-congested situational image, a night-rainy-smooth situational image, and a night-rainy-congested situational image. These eight situational videos, which a driver may encounter, are recorded according to day or night, clear or rainy weather, and smooth or congested roads. In general, a driver's mood tends to fluctuate differently when driving by day versus at night and in sunny versus rainy weather, because of alertness or visibility; whether the road is smooth or congested also readily makes the driver cheerful or irritable. Using these eight situational images therefore gives more realistic material for simulation training, closer to the driver's actual situational state.

The audio-visual training module (30) is connected to the training music database module (10) and the situational image database module (20), and accesses and plays the training tracks in the training music database module (10) and the situational images in the situational image database module (20), each situational image being played together with at least one training track. The audio-visual training module (30) may be a virtual reality (VR) audio-visual device or audio-visual playback equipment on which the user performs simulation training by watching and listening. For example, with M situational images and N selected training tracks, each user produces M×N audio-visual training records. The length of each playback can be set freely, for example 30 seconds per training session. During playback, the video scene and the melody let the user experience the emotion of the moment and judge how much the user likes or dislikes the combination.
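The M×N count of training records follows from pairing every situational image with every selected training track. A minimal sketch of that pairing is shown here; the image and track names are hypothetical placeholders, and the 30-second length is the example value given in the text.

```python
from itertools import product

# Hypothetical subset: M = 2 situational images, N = 3 training tracks.
situational_images = ["day-sunny-smooth", "day-rainy-congested"]
training_tracks = ["blues_01", "jazz_01", "rock_01"]
PLAYBACK_SECONDS = 30  # example session length from the text

# One training session per (image, track) pair.
sessions = [
    {"image": image, "track": track, "seconds": PLAYBACK_SECONDS}
    for image, track in product(situational_images, training_tracks)
]
print(len(sessions))  # M x N = 2 x 3 = 6
```

If all eight situational images were paired with all 13 × 10 = 130 training tracks of the embodiment, the same pairing would give 1,040 sessions per user; the text leaves the number of selected tracks N open.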

The scoring record module (40) is connected to the audio-visual training module (30) and lets the user give a preference score for the training track and situational image played by the audio-visual training module (30). The situational image simulates a situation the user may encounter while driving. In this embodiment, for example, playing the day-rainy-congested situational image simulates driving in the daytime in rain and in a traffic jam. The accompanying training track may be selected automatically by the audio-visual training module (30), chosen freely by the user, or chosen by music type.

While watching each situational image and listening to its accompanying training track, the user experiences different emotions and comes to like or dislike the music being played. After playback ends, the scoring record module (40) lets the user score each track, for example from 1 to 5, a higher score indicating stronger liking. Once the user has scored, the scoring record module (40) records the user's scores for the day-rainy-congested situational image and each of its accompanying training tracks as a reference for subsequent music recommendation. Preferably, the scoring record module (40) lets the user set a score threshold. For example, if the threshold for recommended music is set to 4, only training tracks scored 4 or higher are used as reference for recommendation; tracks scored 1 to 3 are kept in the scoring record but are not used as reference for recommendation.
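The scoring and threshold rule can be sketched as below. The 1 to 5 scale and the threshold of 4 are the example values from the text; the function names, the dictionary layout, and the track names are hypothetical.

```python
def record_score(records, situation, track, score):
    """Store a 1-5 preference score for one (situational image, track) pair."""
    if score not in range(1, 6):
        raise ValueError("score must be an integer from 1 to 5")
    records[(situation, track)] = score


def recommendation_reference(records, situation, threshold=4):
    """Return the tracks scored at or above the threshold for a situation.

    Lower-scored tracks stay in `records` but are not returned.
    """
    return sorted(
        track
        for (scene, track), score in records.items()
        if scene == situation and score >= threshold
    )


records = {}
record_score(records, "day-rainy-congested", "jazz_01", 5)
record_score(records, "day-rainy-congested", "rock_01", 3)
record_score(records, "day-rainy-congested", "classical_01", 4)
reference = recommendation_reference(records, "day-rainy-congested")
print(reference)
```

All three scores remain in the record, but only the two tracks that reach the threshold are used as reference for recommendation.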

The cloud database module (50) is connected to the scoring record module (40) and includes a user database (51) and a music file database (52). The user database (51) stores user data and the user's scoring data, and the music file database (52) stores music tracks of a plurality of music genres. The user data include the user's registered account, name, audio-visual training times, login times, and so on. The user's scoring data are the records made when the user uses the scoring record module (40), including the score for each situational image and each of its corresponding training tracks and the date and time of each score; the recorded time allows the user's degree of preference for a training track to be updated. The music tracks stored in the music file database (52) serve as the music library for recommendation, and music files can be added, deleted, or modified at any time.

The image recognition module (60) recognizes and classifies the situational state from the weather and road-condition features present while the user is driving, and includes an image deep learning unit (61) to improve recognition accuracy. While the user is driving, the image recognition module (60) analyzes the images captured of the road environment ahead, judging the current situational state from the brightness of the light, whether it is clear or raining, the distance to the vehicle ahead, the number of vehicles ahead, and so on. The image recognition module (60) can capture image frames through the camera of the vehicle's dashboard camera. The image deep learning unit (61) is the neural network model trained for image recognition and is used for deep learning training.

To keep the error rate of image recognition from being too high in actual operation, the image deep learning unit (61) of the image recognition module (60) is trained with the ResNet50 architecture to improve recognition accuracy. ResNet50 is a deep learning architecture. To avoid the problems of overfitting and training degradation, it uses residual modules, which allow deeper neural networks to be trained as the number of training iterations increases and raise recognition accuracy to 85% or more. In this embodiment, 6,000 day and night images, 20,000 sunny and rainy images, and 9,000 congested and smooth images are collected as training data. They cover day-sunny-smooth, day-sunny-congested, day-rainy-smooth, day-rainy-congested, night-clear-smooth, night-clear-congested, night-rainy-smooth, and night-rainy-congested images, eight categories of weather and road-condition situational states in all, as data for deep learning.

FIG. 2 is an architecture diagram of another embodiment of the music recommendation system based on image recognition and situational state of the present invention. Referring to FIG. 2, to further improve the recognition accuracy of the image recognition module (60), the module further includes an image data augmentation unit (62) whose data augmentation techniques include contrast limited adaptive histogram equalization or image enlargement. The image data augmentation unit (62) has two data augmentation techniques. The first is Contrast Limited Adaptive Histogram Equalization (CLAHE). When the foreground or background of an image is too bright or too dark, this method distributes brightness more evenly across the image and prevents the sky on a sunny but cloudy day from being misjudged as rainy; it is suited to distinguishing sunny from rainy conditions. In this embodiment, the originally collected image data are augmented so that 6,000 original images plus 6,000 augmented images, 12,000 or more images in total, are used for deep learning training, which raises accuracy and keeps it above 85%.
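The contrast-limiting idea behind CLAHE can be illustrated on a single tile. This is a simplified sketch, not the full algorithm and not the implementation used in the embodiment: it clips and equalizes the histogram of one tile only and omits the tiling and the interpolation between neighbouring tiles that full CLAHE performs. The function name `clipped_equalize`, the absolute `clip_limit` count, and the sample tile are assumptions made for illustration.

```python
def clipped_equalize(tile, clip_limit=4, levels=256):
    """Equalize one tile of grey levels (0..levels-1) using a clipped histogram."""
    hist = [0] * levels
    for row in tile:
        for value in row:
            hist[value] += 1

    # Clip every bin at clip_limit and spread the clipped excess over all bins.
    # Limiting the bin height limits how steep the mapping (contrast gain) can be.
    excess = sum(max(0, count - clip_limit) for count in hist)
    hist = [min(count, clip_limit) + excess / levels for count in hist]

    # The cumulative distribution becomes the grey-level lookup table.
    total = sum(hist)
    lookup, running = [], 0.0
    for count in hist:
        running += count
        lookup.append(min(levels - 1, round((levels - 1) * running / total)))

    return [[lookup[value] for value in row] for row in tile]


# A dark tile with one bright pixel (hypothetical data).
tile = [
    [10, 10, 10, 10],
    [10, 10, 12, 12],
    [10, 10, 12, 200],
]
equalized = clipped_equalize(tile)
```

The mapping preserves the order of grey levels while spreading the dark values 10 and 12 further apart, which is the brightness redistribution the text describes.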

The second data augmentation technique is image enlargement. A deep learning model may learn poorly or lack data for certain situations, so this method is used to add training data and raise recognition accuracy; it is suited to distinguishing smooth from congested road conditions. Enlargement is done by cropping the image while retaining the original image dimensions. For example, when an image 1280 pixels wide and 720 pixels high is cropped, the result is still 1280 pixels wide and 720 pixels high: the dimensions are not reduced, but the content is magnified. When the same picture is used as training data after this scaling, the image before processing and the image after processing are essentially the same picture yet are entirely different data to the machine. Increasing the amount of training data this way raises recognition accuracy and keeps it above 85%.
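The crop-and-enlarge augmentation can be sketched in plain Python. The text says only that the image is cropped and the original dimensions are kept; the centre crop, the `crop_fraction` parameter, and nearest-neighbour resampling used here are assumptions, and a real pipeline would normally use an image library instead of nested lists.

```python
def crop_and_enlarge(image, crop_fraction=0.8):
    """Centre-crop a 2D image (list of rows) and scale the crop back to the
    original width and height with nearest-neighbour sampling."""
    if not 0 < crop_fraction <= 1:
        raise ValueError("crop_fraction must be in (0, 1]")
    height, width = len(image), len(image[0])
    crop_h = max(1, int(height * crop_fraction))
    crop_w = max(1, int(width * crop_fraction))
    top = (height - crop_h) // 2
    left = (width - crop_w) // 2

    enlarged = []
    for y in range(height):
        src_y = top + (y * crop_h) // height  # source row inside the crop
        enlarged.append(
            [image[src_y][left + (x * crop_w) // width] for x in range(width)]
        )
    return enlarged


# 4 x 4 test image whose pixel values encode their position.
original = [[row * 4 + col for col in range(4)] for row in range(4)]
augmented = crop_and_enlarge(original, crop_fraction=0.5)
```

The output has the same dimensions as the input, but only the central region remains, magnified to fill the frame, so the original and the augmented image can both be used as training samples.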

In this embodiment, because these states lie at the lowest level of the learning model's tree diagram, they must satisfy the most situational conditions, and data for them are harder to collect. The originally collected image data are therefore augmented for the sunny-smooth and sunny-congested situations, the rainy-smooth and rainy-congested situations, the clear-night-smooth and clear-night-congested situations, and the rainy-night-smooth and rainy-night-congested situations. For these four groups of situational states, 6,000 original images plus 6,000 augmented images give 12,000 images per situational state, and recognition accuracy remains above 85%.

Referring again to FIG. 2, the image recognition module (60) preferably further includes a vehicle speed detection unit (63) that detects and records the vehicle speed. To recognize smooth or congested road conditions more accurately, the recorded vehicle speed can be used alongside image recognition, especially when the view ahead makes smooth versus congested hard to distinguish, for example when the view is unclear or when the congestion is caused by road construction rather than by ordinary heavy traffic. The vehicle speed detection unit (63) may be a GPS positioning system or may be electrically connected to a vehicle navigation device, and estimates the driving speed from the distance moved and the time taken. For example, if the vehicle speed stays below a certain level for a short period, such as below 40 km/h continuously for five minutes, or if within ten minutes the speed changes from low to high and back to low 5 or more times with a speed difference of 10 to 30 km/h, indicating stop-and-go driving, the situation is treated as a traffic jam. Using the vehicle speed detection unit (63) thus improves the accuracy of judging whether the road is congested or smooth.
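The speed estimate and the two congestion rules can be sketched as follows. The 40 km/h, five-minute, ten-minute, 10 km/h, and five-cycle values come from the example in the text. Everything else is an assumption made for illustration: the haversine distance between GPS fixes, the 30-second sampling interval, the function names, and the simplification that only the lower bound (10 km/h) of the speed difference is enforced.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude/longitude points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def speed_kmh(fix_a, fix_b):
    """Speed from two GPS fixes given as (seconds, latitude, longitude)."""
    elapsed_h = (fix_b[0] - fix_a[0]) / 3600
    if elapsed_h <= 0:
        raise ValueError("fixes must be in increasing time order")
    return haversine_km(fix_a[1], fix_a[2], fix_b[1], fix_b[2]) / elapsed_h


def is_congested(speeds, sample_seconds=30, slow_kmh=40, min_swing=10, min_cycles=5):
    """speeds: km/h readings taken every sample_seconds, most recent last."""
    # Rule 1: speed stayed below slow_kmh for the last five minutes.
    five_min = 300 // sample_seconds
    if len(speeds) >= five_min and all(s < slow_kmh for s in speeds[-five_min:]):
        return True

    # Rule 2: at least min_cycles low -> high -> low swings in the last ten minutes.
    window = speeds[-(600 // sample_seconds):]
    if not window:
        return False
    cycles, rising = 0, False
    extreme = window[0]  # current trough while falling, current peak while rising
    for s in window[1:]:
        if rising:
            extreme = max(extreme, s)
            if extreme - s >= min_swing:  # dropped far enough: one full cycle
                cycles += 1
                rising, extreme = False, s
        else:
            extreme = min(extreme, s)
            if s - extreme >= min_swing:  # climbed far enough: now rising
                rising, extreme = True, s
    return cycles >= min_cycles


one_degree = speed_kmh((0, 0.0, 0.0), (3600, 0.0, 1.0))  # 1 degree of longitude in 1 h
slow_traffic = is_congested([35] * 10)
stop_and_go = is_congested([45, 60] * 10)
free_flow = is_congested([80] * 20)
```

In use, the boolean result would be combined with the image-based smooth or congested classification when the camera view alone is ambiguous.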

Please continue to refer to FIG. 1. The integrated analysis module (70) is connected to the cloud database module (50) and the image recognition module (60). It compares the situational state classification currently produced by the image recognition module (60) against the user database (51) of the cloud database module (50) and generates a filtered music track data table. Once the image recognition module (60) has recognized the driver's current situation, the result is assigned to one of the situational image types in the situational image database module (20). In this embodiment there are eight situational states: day-sunny-smooth, day-sunny-congested, day-rainy-smooth, day-rainy-congested, night-clear-smooth, night-clear-congested, night-rainy-smooth, and night-rainy-congested.

For example, if the image recognition module (60) determines that the current situation is the day-rainy-smooth situational state, the integrated analysis module (70) connects over a network, such as a 4G or 5G communication system, and uses the user database (51) for comparison. Based on the identity of the driver (that is, the user) and the records of the audio-visual training and scoring previously performed with the audio-visual training module (30) and the scoring record module (40), the integrated analysis module (70) retrieves the scores that the user gave to each training track paired with the day-rainy-smooth situational image. It then applies the scoring setting of the scoring record module (40) to produce the filtered music track data table. The filtered music track data table contains the training tracks whose earlier scores reached or exceeded the scoring threshold.
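The filtering described above can be sketched as follows. This is a minimal illustration only: the dictionary layout keyed by `(user_id, state, track_id)` and the function name are hypothetical, and the default threshold of 4 follows the example value used elsewhere in this description.

```python
def filter_training_tracks(ratings, user_id, state, threshold=4):
    """Return the training tracks that this user scored at or above the threshold
    for the given situational state.

    ratings: hypothetical dict mapping (user_id, state, track_id) -> score (1 to 5).
    """
    return sorted(
        track_id
        for (uid, st, track_id), score in ratings.items()
        if uid == user_id and st == state and score >= threshold
    )


ratings = {
    ("driver-1", "day-rainy-smooth", "jazz-01"): 5,
    ("driver-1", "day-rainy-smooth", "country-02"): 2,
    ("driver-1", "day-rainy-smooth", "classical-03"): 4,
    ("driver-1", "night-clear-smooth", "jazz-01"): 3,
}
filtered_table = filter_training_tracks(ratings, "driver-1", "day-rainy-smooth")
```

Here `filtered_table` plays the role of the filtered music track data table for the recognized state.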

The music recommendation module (80) is connected to the cloud database module (50) and the integrated analysis module (70). It searches and compares the music features of the training tracks in the filtered music track data table against the music file database (52) of the cloud database module (50), and generates from the music file database (52) a playlist of recommended music tracks suited to the user's current driving situation. Music features include, for example, melody, rhythm, and chord. In this embodiment, features are extracted using Mel-frequency cepstral coefficients (MFCC). MFCC extraction has two main sub-steps, Mel-frequency analysis and cepstral analysis. The frequency, amplitude, and phase of the audio signal produced from a music clip determine the features of that signal. Based on the music features of the training tracks in the filtered music track data table, music tracks with matching features are found in the music file database (52), taken as recommended music tracks, and compiled into a playlist for a music player to play. The number of recommended tracks and the way they are selected can be set by the user, for example 30 matching tracks chosen at random each time.
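The description does not state how two tracks' features are judged to match. As one possible illustration, the sketch below assumes each track has already been reduced to a fixed-length feature vector (for example, averaged MFCCs) and ranks library tracks by cosine similarity to the training tracks. The cosine measure, the 0.9 cut-off, and all names are assumptions made for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_library(seed_features, library, min_similarity=0.9):
    """Return library track ids whose best similarity to any seed vector
    reaches min_similarity, most similar first.

    seed_features: feature vectors of the filtered training tracks.
    library: hypothetical dict mapping track_id -> feature vector.
    """
    if not seed_features:
        return []
    scored = []
    for track_id, vec in library.items():
        best = max(cosine_similarity(vec, seed) for seed in seed_features)
        if best >= min_similarity:
            scored.append((best, track_id))
    return [tid for _, tid in sorted(scored, key=lambda s: (-s[0], s[1]))]
```

A real system would first compute the feature vectors from audio with a signal-processing library. That step is omitted here.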

Please continue to refer to FIG. 2. Preferably, the music recommendation module (80) further has a favorite music storage unit (81). Because the music file database (52) holds a large number of tracks, when a recommended track turns out to be one the user likes very much, the favorite music storage unit (81) can record and store it. The stored track can then be placed directly in the recommended playlist whenever the same situational state is encountered again.
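One way to picture the favorite music storage unit is as a per-state list that is placed ahead of fresh recommendations. The sketch below is illustrative only. The class name, method names, and the choice to list favorites first are assumptions.

```python
from collections import defaultdict


class FavoriteStore:
    """Hypothetical in-memory stand-in for the favorite music storage unit."""

    def __init__(self):
        self._by_state = defaultdict(list)

    def add(self, state, track_id):
        """Remember a liked track for a situational state, without duplicates."""
        if track_id not in self._by_state[state]:
            self._by_state[state].append(track_id)

    def build_playlist(self, state, recommended):
        """Put stored favorites for this state first, then the other recommendations."""
        favorites = self._by_state.get(state, [])
        return favorites + [t for t in recommended if t not in favorites]


store = FavoriteStore()
store.add("day-rainy-smooth", "track-x")
playlist = store.build_playlist("day-rainy-smooth", ["track-a", "track-x", "track-b"])
```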

Preferably, the music recommendation module (80) further has a recommended track scoring unit (82). Tracks recommended by the music recommendation module (80) may not appeal to the driver. To avoid or reduce repeated occurrences of this in later recommendations, the recommended track scoring unit (82) lets the user re-score the recommended tracks they hear. For example, if the driver gives a recommended track 2 points and the recommendation threshold is set to 4 points, then although the track and its style met the original recommendation criteria, the driver does not actually like it, since musical preference is subjective. That track, and tracks with similar music features, will therefore no longer be included in the recommended playlist, which also makes the music recommendation module (80) more efficient when searching for recommended tracks.
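The re-scoring rule can be illustrated with a short sketch. It assumes a hypothetical `feedback` dictionary of the driver's re-scores and removes only the re-scored tracks themselves. Excluding tracks with similar music features, as the description also contemplates, would need a feature comparison that is not shown here.

```python
def apply_feedback(playlist, feedback, threshold=4):
    """Drop tracks the driver re-scored below the threshold; unrated tracks stay.

    feedback: hypothetical dict mapping track_id -> re-score (1 to 5).
    """
    return [t for t in playlist if feedback.get(t, threshold) >= threshold]


# A track re-scored 2 with a threshold of 4 is removed from later playlists.
remaining = apply_feedback(["song-a", "song-b", "song-c"], {"song-b": 2, "song-c": 5})
```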

FIG. 3 is a flowchart of one embodiment of the music recommendation method based on image recognition and situational state of the present invention. Please refer to FIG. 3. The method is applied to the music recommendation system based on image recognition and situational state, and comprises: an audio-visual data archiving step (S0), an audio-visual training step (S1), a preference scoring step (S2), an image recognition and classification step (S3), a music genre screening step (S4), and a feature value comparison and music recommendation step (S5). In the audio-visual data archiving step (S0), a plurality of training tracks and a plurality of situational images are stored in a training music database module and a situational image database module, respectively. The training tracks are classified by music style into a plurality of music genres, each genre holding a plurality of tracks, and the situational images include images of different types of situational states.

In this embodiment, the situational images are classified by weather and road conditions into eight types: day-sunny-smooth, day-sunny-congested, day-rainy-smooth, day-rainy-congested, night-clear-smooth, night-clear-congested, night-rainy-smooth, and night-rainy-congested.

The audio-visual training step (S1) provides an audio-visual training module for the user to operate. The audio-visual training module retrieves training tracks and situational images from the training music database module and the situational image database module, respectively, and lets the user select a situational image and a training track to be played at the same time, so that the user carries out the training by watching and listening. While each situational image is displayed, it is paired in turn with each of the training tracks. The audio-visual training step (S1) lets the user choose which training tracks to pair with each situational image. The user can also start by selecting preferred genres for training. For example, a user who likes country, classical, and jazz music can pick some training tracks from those three genres for audio-visual training and skip disliked genres altogether, which makes later music recommendation more efficient.
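The pairing of each situational image with each training track in turn, optionally restricted to preferred genres, can be sketched as below. The function name and the `(track_id, genre)` tuple layout are hypothetical.

```python
from itertools import product


def training_sessions(situation_images, training_tracks, preferred_genres=None):
    """List (image, track_id) pairs: every image with every eligible track, in order.

    training_tracks: hypothetical list of (track_id, genre) tuples.
    preferred_genres: optional set of genres; None means all genres are used.
    """
    eligible = [
        track_id
        for track_id, genre in training_tracks
        if preferred_genres is None or genre in preferred_genres
    ]
    return list(product(situation_images, eligible))


sessions = training_sessions(
    ["day-sunny-smooth", "day-rainy-smooth"],
    [("t1", "jazz"), ("t2", "rock"), ("t3", "country")],
    preferred_genres={"jazz", "country"},
)
```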

In the preference scoring step (S2), after the user completes each audio-visual training session in the audio-visual training step, a scoring record module is provided for the user to rate how much they like the situational image and corresponding training track of that session. The user information and its scoring records are stored in a user database for later use, and the user can set a scoring threshold. The user scores every piece of music paired with every situational image, for example from 1 to 5 points, with a higher score indicating a stronger liking, and these scores serve as the reference for later music recommendation. For example, if the user gives 2 points to a country-style training track paired with the day-rainy-congested situational image, the user does not much like that track under actual day-rainy-congested conditions, so music with features similar to that training track is unlikely to be recommended.
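A scoring record of this kind might be represented as follows. The sketch is illustrative: the class names, field names, and the in-memory dictionary standing in for the user database are assumptions, and the 1 to 5 range follows the example in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RatingRecord:
    """One score given by a user to a (situational image, training track) pairing."""
    user_id: str
    state: str
    track_id: str
    score: int

    def __post_init__(self):
        if not 1 <= self.score <= 5:
            raise ValueError("score must be between 1 and 5")


class UserDatabase:
    """Hypothetical in-memory stand-in for the user database."""

    def __init__(self):
        self._records = {}

    def save(self, record):
        # A later score for the same user, state, and track replaces the earlier one.
        self._records[(record.user_id, record.state, record.track_id)] = record

    def scores_for(self, user_id, state):
        return {
            r.track_id: r.score
            for r in self._records.values()
            if r.user_id == user_id and r.state == state
        }
```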

Preferably, the preference scoring step (S2) lets the user set the scoring threshold. For example, if the recommendation scoring threshold is set to 4 points, only training tracks scored 4 points or higher are used as references for recommendation. Tracks scored 1 to 3 points are kept in the scoring records but are not used as references for recommendation.

In the image recognition and classification step (S3), an image recognition module recognizes and classifies the situational state from the weather and road condition characteristics at the time the user is driving, assigning those characteristics to one of the plurality of situational images. While the user is driving, the image recognition and classification step (S3) captures the current image through the image recognition module, for example through the camera lens of the vehicle's dashboard camera, and judges the current situational state from the brightness of the light in the image, whether it is sunny or raining, the distance to the vehicle ahead, the number of vehicles ahead, and so on. The image recognition and classification step (S3) uses a neural network model for image recognition. To keep the error rate of recognition results from being too high in actual operation, the step uses the ResNet50 architecture for training to raise recognition accuracy.

ResNet50 is a deep learning architecture. To avoid overfitting and training degradation, its residual modules allow deeper neural networks to be trained as the number of training iterations increases, while raising recognition accuracy to above 85%. In this embodiment, 6,000 day and night situational images, 20,000 sunny and rainy situational images, and 9,000 congested and smooth situational images are collected as training data. Together these cover eight categories of weather and road situational states, namely day-sunny-smooth, day-sunny-congested, day-rainy-smooth, day-rainy-congested, night-clear-smooth, night-clear-congested, night-rainy-smooth, and night-rainy-congested, and serve as the data for deep learning.
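Because the training data is organized around three binary distinctions (day or night, sunny or rainy, smooth or congested), the eight situational states can be viewed as combinations of three decisions. The sketch below shows only that combination step and assumes the three decisions are already available as booleans. How the embodiment actually combines its classifiers is not detailed in the description, and the function name and label strings are illustrative.

```python
def compose_state(is_day, is_rainy, is_congested):
    """Combine three binary decisions into one of the eight situational state labels."""
    period = "day" if is_day else "night"
    if is_rainy:
        weather = "rainy"
    else:
        # Non-rainy weather is labelled "sunny" by day and "clear" by night.
        weather = "sunny" if is_day else "clear"
    traffic = "congested" if is_congested else "smooth"
    return f"{period}-{weather}-{traffic}"


ALL_STATES = sorted(
    {
        compose_state(d, r, c)
        for d in (True, False)
        for r in (True, False)
        for c in (True, False)
    }
)
```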

Please continue to refer to FIG. 3. Preferably, in some embodiments of the present invention, the image recognition and classification step (S3) further includes an image data augmentation step that enhances the image data to increase the accuracy of situational state recognition. The image data augmentation step uses two data augmentation techniques. The first is Contrast Limited Adaptive Histogram Equalization (CLAHE). When the foreground or background of an image is too bright or too dark, this method distributes brightness more evenly across the image, which helps prevent a partly cloudy sky on a sunny day from being misclassified as rainy. This method is suited to distinguishing sunny from rainy conditions. In this embodiment, the originally collected image data is augmented so that 6,000 original images plus 6,000 augmented images, 12,000 or more images in total, are used for deep learning training, which raises accuracy and keeps it above 85%.
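To give a feel for the contrast-limiting idea, the sketch below applies clipped histogram equalization to a single tile of grey levels. It is a simplified illustration and not a full CLAHE implementation: complete CLAHE also divides the image into tiles and interpolates between neighbouring tile mappings, and implementations differ in how clipped counts are redistributed. The function name and parameters are assumptions.

```python
def clipped_equalize(pixels, clip_limit, levels=256):
    """Equalize one tile of grey levels (ints in [0, levels)) with histogram clipping."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Clip every bin at clip_limit and collect the excess counts.
    excess = 0
    for i, count in enumerate(hist):
        if count > clip_limit:
            excess += count - clip_limit
            hist[i] = clip_limit

    # Spread the excess evenly over all bins; the remainder goes to the lowest bins.
    share, remainder = divmod(excess, levels)
    hist = [h + share + (1 if i < remainder else 0) for i, h in enumerate(hist)]

    # Build a lookup table from the normalized cumulative histogram.
    total = len(pixels)
    lut, running = [], 0
    for h in hist:
        running += h
        lut.append(round((levels - 1) * running / total))
    return [lut[p] for p in pixels]
```

With a low `clip_limit`, a dominant grey level is stretched less aggressively than under plain histogram equalization, which is what limits over-amplified contrast in nearly uniform regions such as sky.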

Continuing from the above, the second data augmentation technique is image enlargement. It is used to add training data where the deep learning model would otherwise learn poorly or lack data in certain situations, and so to raise recognition accuracy. This method is suited to distinguishing smooth from congested road conditions. In this embodiment, because this state sits at the lowest level of the deep learning model's tree diagram, more situational conditions must be satisfied and matching data is harder to collect. The originally collected image data is therefore augmented. It covers the sunny-smooth and sunny-congested situations, the rainy-smooth and rainy-congested situations, the clear-night-smooth and clear-night-congested situations, and the rainy-night-smooth and rainy-night-congested situations. After augmentation, each of these four situational states has 6,000 original images plus 6,000 augmented images, for 12,000 images per situational state, and accuracy stays above 85%.
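The description does not specify how the enlargement is performed. One plausible reading, shown here purely as an assumption, is to enlarge the central region of each image back to the original size and add the result to the training set, which doubles the image count in the same way that 6,000 originals become 12,000 images. The nearest-neighbour resampling, the integer zoom factor, and the function names are illustrative.

```python
def zoom_center(image, factor):
    """Enlarge the central region of a 2-D list image by an integer factor,
    keeping the original size (nearest-neighbour resampling)."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    h, w = len(image), len(image[0])
    crop_h, crop_w = h // factor, w // factor  # size of the central crop
    top, left = (h - crop_h) // 2, (w - crop_w) // 2
    return [
        [image[top + r // factor][left + c // factor] for c in range(w)]
        for r in range(h)
    ]


def augment(images, factor=2):
    """Return the originals followed by one enlarged copy of each image."""
    return list(images) + [zoom_center(img, factor) for img in images]
```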

In the music genre screening step (S4), according to the situational image state determined in the image recognition and classification step (S3), the user information and its scoring records are retrieved from the user database, and the set scoring threshold is used to pick the corresponding tracks from the plurality of training tracks, producing a filtered music track data table. For example, if the image recognition and classification step (S3) recognizes the current situation as day-rainy-smooth, the scores that the user originally gave to each training track paired with the day-rainy-smooth situational image are retrieved from the user database. The training tracks that meet the scoring threshold, for example those scored 4 or higher, are selected and placed in the filtered music track data table for use in the subsequent search and comparison for recommended music.

In the feature value comparison and music recommendation step (S5), the music features of the training tracks in the filtered music track data table are analyzed in turn and a music file database is searched. Music tracks whose features are similar to those of the training tracks in the filtered music track data table are taken from the music file database as recommended music tracks and compiled into a playlist for the user to play. In this embodiment, music features include, for example, melody, rhythm, and chord, and features are extracted using Mel-frequency cepstral coefficients (MFCC). MFCC extraction has two main sub-steps, Mel-frequency analysis and cepstral analysis. The frequency, amplitude, and phase of the audio signal produced from a music clip determine the features of that signal. Based on the music features of the training tracks in the filtered music track data table, music tracks with matching features are found in the music file database, taken as recommended music tracks, and compiled into a playlist for the user to play. The number of recommended tracks and the way they are selected can be set by the user, for example 30 matching tracks chosen at random each time.
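The random selection of a fixed number of matching tracks, such as the 30 mentioned above, can be sketched as follows. The function name and the optional `seed` parameter (useful for repeatable tests) are assumptions. If fewer tracks match than requested, this sketch simply returns all of them in shuffled order.

```python
import random


def pick_playlist(matching_tracks, size=30, seed=None):
    """Randomly choose up to `size` distinct tracks from the matching tracks."""
    rng = random.Random(seed)
    tracks = list(matching_tracks)
    if len(tracks) <= size:
        rng.shuffle(tracks)
        return tracks
    return rng.sample(tracks, size)
```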

Preferably, in some embodiments of the present invention, the image recognition and classification step further includes a vehicle speed detection step that detects the current vehicle speed while driving, to support the accuracy of situational state recognition and classification. To make the smooth-versus-congested classification of road conditions more accurate, the vehicle speed detection step can be used to detect and record the vehicle speed. This helps especially when the forward view is hard to classify as smooth or congested, for example when visibility is poor or when the congestion is caused by road construction rather than by ordinary heavy traffic. For example, if the vehicle speed stays below a given value for a short period, such as below 40 km/h for ten consecutive minutes, or if within ten minutes the speed switches from low to high and back from high to low five or more times with a speed difference of 10 to 30 km/h, the vehicle is in stop-and-go traffic. Both cases are treated as a traffic jam.
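The two example rules can be sketched as below, using the 40 km/h limit, ten-minute window, and five cycles given in the text. Everything else is an assumption made for illustration: the `(timestamp_s, speed_kmh)` sample layout, the function names, and the simplification of the 10 to 30 km/h speed difference to a minimum swing of 10 km/h. The window length is kept as a parameter.

```python
def sustained_low_speed(samples, window_s=600, limit_kmh=40):
    """True if samples span at least window_s and all speeds in the last window are below the limit."""
    if not samples or samples[-1][0] - samples[0][0] < window_s:
        return False
    end = samples[-1][0]
    return all(v < limit_kmh for t, v in samples if t >= end - window_s)


def count_speed_cycles(samples, window_s=600, min_delta=10):
    """Count low -> high -> low speed cycles in the last window with swings of at least min_delta."""
    if not samples:
        return 0
    end = samples[-1][0]
    speeds = [v for t, v in samples if t >= end - window_s]
    cycles, rising = 0, False
    low = high = speeds[0]
    for v in speeds[1:]:
        if not rising:
            low = min(low, v)
            if v - low >= min_delta:      # speed has climbed far enough from the trough
                rising, high = True, v
        else:
            high = max(high, v)
            if high - v >= min_delta:     # speed has dropped far enough from the peak
                cycles += 1
                rising, low = False, v
    return cycles


def is_congested(samples, min_cycles=5):
    """Apply both example rules: sustained low speed, or repeated stop-and-go cycles."""
    return sustained_low_speed(samples) or count_speed_cycles(samples) >= min_cycles
```

A result of `True` would be used to support, not replace, the image-based smooth or congested classification.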

FIG. 4 is a flowchart of another embodiment of the music recommendation method based on image recognition and situational state of the present invention. Please refer to FIG. 4. Preferably, in some embodiments of the present invention, the method further includes a favorite music recording step (S6), which lets the user record tracks they have heard and liked so that they can be listened to again in the same situational state. Because the music file database holds a large number of tracks, when a recommended track turns out to be one the user likes very much, the favorite music recording step (S6) can record and store it. The stored track can then be placed directly in the recommended playlist whenever the same situational state is encountered again, so that it is not cleared or recommended afresh, either of which would make the recommendation process less efficient.

FIG. 5 is a flowchart of another embodiment of the music recommendation method based on image recognition and situational state of the present invention. Please refer to FIG. 5. Preferably, in some embodiments of the present invention, the method further includes a recommended track scoring step (S7), which lets the user score the recommended tracks so that future recommendations become more accurate. The tracks recommended in the feature value comparison and music recommendation step (S5) are selected automatically by the software system, trained through a deep learning model, and may not suit the driver's subjective taste. To avoid or reduce repeated occurrences of this in later recommendations, the recommended track scoring step (S7) provides a procedure for the user to re-score the recommended tracks they hear.

For example, if the driver gives a recommended track 2 points and the recommendation threshold is set to 4 points, then although the track and its style met the original recommendation criteria, the driver does not actually like it, since musical preference is subjective. That track, and tracks with similar music features, will therefore no longer be included in the recommended playlist, which also makes the music recommendation module (80) more efficient when searching for recommended tracks.

In summary, with the music recommendation system and method based on image recognition and situational state of the present invention, the situational state can be judged by recognizing the weather conditions at the time of driving and how smoothly traffic is flowing on the road ahead of the vehicle. This judgment is combined with the results of the prior audio-visual training and, after music feature analysis, used to search automatically for music tracks suited to the driver's current situation, giving the driver a pleasant mood while driving or easing negative emotions at the wheel.

The embodiments described above are intended only to illustrate the technical ideas and features of the present invention, so that those skilled in the art can understand and implement it. They do not limit the patent scope of the present invention, and all equivalent changes or modifications made in accordance with the spirit of the present invention and the content of this specification shall fall within that scope.

10: training music database module
20: situational image database module
30: audio-visual training module
40: scoring record module
50: cloud database module
51: user database
52: music file database
60: image recognition module
61: image deep learning unit
62: image data augmentation unit
63: vehicle speed detection unit
70: integrated analysis module
80: music recommendation module
81: favorite music storage unit
82: recommended track scoring unit
S0: audio-visual data archiving step
S1: audio-visual training step
S2: preference scoring step
S3: image recognition and classification step
S4: music genre screening step
S5: feature value comparison and music recommendation step
S6: favorite music recording step
S7: recommended track scoring step

FIG. 1 is an architecture diagram of one embodiment of the music recommendation system based on image recognition and situational state of the present invention;
FIG. 2 is an architecture diagram of another embodiment of the music recommendation system based on image recognition and situational state of the present invention;
FIG. 3 is a flowchart of one embodiment of the music recommendation method based on image recognition and situational state of the present invention;
FIG. 4 is a flowchart of another embodiment of the music recommendation method based on image recognition and situational state of the present invention;
FIG. 5 is a flowchart of another embodiment of the music recommendation method based on image recognition and situational state of the present invention.

10: training music database module

20: situational image database module

30: audio-visual training module

40: scoring record module

50: cloud database module

60: image recognition module

70: integrated analysis module

80: music recommendation module

Claims

A music recommendation system based on image recognition and situational status, comprising:
a training music database module (10), storing a plurality of training tracks classified by music type;
a situational image database module (20), storing a plurality of situational images classified by situational type;
an audio-visual training module (30), connected to the training music database module (10) and the situational image database module (20), accessing for playback the training tracks in the training music database module (10) and the situational images in the situational image database module (20), wherein each situational image is played simultaneously with at least one corresponding training track;
a scoring record module (40), connected to the audio-visual training module (30), allowing the user to score and record preferences for the training tracks and situational images played by the audio-visual training module (30);
a cloud database module (50), connected to the scoring record module (40), the cloud database module (50) including a user database (51) and a music file database (52), wherein the user database (51) stores user data and the user's scoring data, and the music file database (52) stores music tracks of a plurality of music genres;
an image recognition module (60), which recognizes and classifies the situational state according to the weather and road condition characteristics while the user is driving, the image recognition module (60) including an image deep learning unit (61) to improve recognition accuracy;
an integrated analysis module (70), connected to the cloud database module (50) and the image recognition module (60), which compares the situational state classification result currently recognized by the image recognition module (60) against the user database (51) of the cloud database module (50) and generates a filtered music track data table; and
a music recommendation module (80), connected to the cloud database module (50) and the integrated analysis module (70), which searches and compares the music features of the training tracks in the filtered music track data table against the music file database (52) of the cloud database module (50), and generates from the music file database (52) a playlist of recommended music tracks suited to the situational state of the user's current driving, for playback.

The music recommendation system based on image recognition and situational status as described in claim 1, wherein the image recognition module (60) further includes an image data augmentation unit (62), which applies data augmentation techniques including contrast limited adaptive histogram equalization or image enlargement.

The music recommendation system based on image recognition and situational status as described in claim 1, wherein the image recognition module (60) further includes a vehicle speed detection unit (63) that detects and records the vehicle speed.

The music recommendation system based on image recognition and situational status as described in claim 1, wherein the music recommendation module (80) further has a favorite music storage unit (81).

The music recommendation system based on image recognition and situational status as described in claim 1, wherein the music recommendation module (80) further has a recommended track scoring unit (82).

A music recommendation method based on image recognition and situational state, comprising:
an audio-visual data archiving step (S0), storing a plurality of training tracks and a plurality of situational images in a training music database module and a situational image database module respectively, wherein the plurality of training tracks are classified into a plurality of music genres;
an audio-visual training step (S1), providing an audio-visual training module for the user to operate, wherein the audio-visual training module retrieves training tracks and situational images from the training music database module and the situational image database module respectively, and lets the user select a situational image and a training track to be played simultaneously so that the user can watch and listen, and wherein each situational image is played together with at least one of the plurality of training tracks;
a preference scoring step (S2), wherein each time the user completes an audio-visual training session in the audio-visual training step, a scoring record module is provided for the user to score the current situational image and the corresponding training track, and the user information and scoring records are stored in a user database for later use, wherein the user can set a scoring standard threshold;
an image recognition and classification step (S3), using an image recognition module to recognize the climate and road condition characteristics while the user is driving, and to classify them as one of the plurality of situational images;
a music genre screening step (S4), wherein, according to the situational image determined in the image recognition and classification step, the user information and scoring records are retrieved from the user database, and the tracks that meet the set scoring standard threshold are selected from the plurality of training tracks to generate a screened music track data table; and
a feature value comparison and music recommendation step (S5), sequentially analyzing the music features of the training tracks in the screened music track data table, searching a music file database for music tracks whose music features are similar to those of the training tracks in the screened music track data table, taking the tracks found as recommended music tracks, and providing the recommended music tracks as a playlist for the user to play.
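
The feature value comparison of step (S5) can be illustrated with a small similarity search. The claim does not name the music features or the similarity measure, so this sketch assumes each track is described by a short numeric feature vector and uses cosine similarity; the function names, the three example features, and all track identifiers are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(screened_features, library_features, top_k=2):
    """Rank library tracks by their best similarity to any screened training track
    and return the ids of the `top_k` most similar ones."""
    best = {}
    for seed in screened_features.values():
        for track_id, features in library_features.items():
            similarity = cosine_similarity(seed, features)
            if similarity > best.get(track_id, -1.0):
                best[track_id] = similarity
    # Highest similarity first; break ties by track id for a stable order.
    ranked = sorted(best, key=lambda t: (-best[t], t))
    return ranked[:top_k]


# Assumed features per track: [tempo, energy, acousticness], each scaled to 0..1.
screened_features = {"jazz_01": [0.3, 0.2, 0.9]}
library_features = {
    "song_a": [0.3, 0.25, 0.85],
    "song_b": [0.9, 0.9, 0.1],
    "song_c": [0.35, 0.2, 0.8],
}
playlist = recommend(screened_features, library_features, top_k=2)
print(playlist)
```

Any other distance measure or a learned embedding could be substituted for `cosine_similarity` without changing the overall flow from screened training tracks to a playlist.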

The music recommendation method based on image recognition and situational state as described in claim 6, wherein the image recognition and classification step (S3) further includes an image data enhancement step, which enhances the image data to increase the accuracy of situational state recognition.

The music recommendation method based on image recognition and situational state as described in claim 6, wherein the image recognition and classification step (S3) further includes a vehicle speed detection step, which detects the current speed of the vehicle while driving to improve the accuracy of situational state recognition and classification.

The music recommendation method based on image recognition and situational state as described in claim 6, further comprising a favorite music recording step (S6), which allows the user to record favorite music the user has previously listened to, so that it can be played again later in the same situational state.

The music recommendation method based on image recognition and situational state as described in claim 6, further comprising a recommended track scoring step (S7), which allows the user to rate the recommended tracks so as to improve the accuracy of future recommendations.