CN113076770A - Intelligent figure portrait terminal based on dialect recognition - Google Patents

Info

Publication number
CN113076770A
CN113076770A (Application CN201911300189.5A)
Authority
CN
China
Prior art keywords
dialect
prisoner
prisoners
intelligent terminal
prison
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911300189.5A
Other languages
Chinese (zh)
Inventor
李国栋 (Li Guodong)
邬玉香 (Wu Yuxiang)
李兴华 (Li Xinghua)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Jieshigao Information Technology Co ltd
Original Assignee
Guangzhou Jieshigao Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Jieshigao Information Technology Co ltd filed Critical Guangzhou Jieshigao Information Technology Co ltd
Priority to CN201911300189.5A priority Critical patent/CN113076770A/en
Publication of CN113076770A publication Critical patent/CN113076770A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/26Government or public services
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00Speech synthesis; Text to speech systems
    • G10L13/02Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033Voice editing, e.g. manipulating the voice of the synthesiser
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/005Language recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Tourism & Hospitality (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Data Mining & Analysis (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Animal Behavior & Ethology (AREA)
  • Economics (AREA)
  • Databases & Information Systems (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the field of artificial intelligence and provides an intelligent person-portrait terminal based on dialect recognition. The method comprises the following steps: (1) inputting keywords of different dialects; (2) preprocessing the dialect keywords; (3) performing feature extraction; (4) training dialect samples and testing the samples; (5) forming a dialect library; (6) inputting an audio file of a prisoner, matching it against the dialect library, and outputting the result; (7) inputting a video file of the prisoner; (8) performing facial emotion capture; (9) matching data in a tag library; (10) combining the prisoner's audio output file with the tag-library data; (11) outputting the prisoner's user portrait. The invention solves the problem that the many dialects spoken by prisoners cannot be understood, provides an intelligent correction scheme for prisoners based on person portraits, realizes "one person, one policy", and alleviates the shortage of police manpower.

Description

Intelligent figure portrait terminal based on dialect recognition
Technical Field
The invention belongs to the field of artificial intelligence and provides an intelligent person-portrait terminal based on dialect recognition.
Background
Artificial-intelligence enterprises mainly apply their technology in fields such as finance, e-commerce, security, and education. The invention applies it to the prison field for the first time, combining computer vision, speech recognition, deep-learning search and recommendation algorithms, character image recognition, fingerprint recognition, and the like, to realize basic information management of prisoners, monitoring and analysis of verbal communication (including dialects) between prisoners, intelligent management of prisoners' daily life, and a "one person, one policy" correction and rehabilitation plan for each prisoner. The system saves manpower and material resources in prison management, fills a gap in prison informatization work, and is of great significance for promoting prison supervision and prisoner rehabilitation in China and for advancing prison informatization construction.
Disclosure of Invention
The purpose of the invention is as follows: the invention provides an intelligent person-portrait terminal based on dialect recognition, which solves the problem that the many dialects spoken by prisoners cannot be understood, provides an intelligent correction scheme for prisoners based on person portraits, realizes "one person, one policy", and alleviates the shortage of police manpower.
In order to achieve the above technical purpose, the invention adopts the following technical scheme: (1) inputting keywords of different dialects; (2) preprocessing the dialect keywords; (3) performing feature extraction; (4) training dialect samples and testing the samples; (5) forming a dialect library; (6) inputting an audio file of a prisoner, matching it against the dialect library, and outputting the result; (7) inputting a video file of the prisoner; (8) performing facial emotion capture; (9) matching data in a tag library; (10) combining the prisoner's audio output file with the tag-library data; (11) outputting the prisoner's user portrait.
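Steps (5)-(6) above can be sketched as a toy dialect-library match. This is a minimal illustration only: the dialect names and keywords below are invented placeholders, and a real system would match acoustic features rather than plain strings.

```python
# Toy sketch of steps (5)-(6): build a dialect library, then score a
# prisoner's transcribed keywords against it. All names are placeholders.

def build_dialect_library(samples):
    """samples: dict mapping dialect name -> list of trained keywords."""
    return {dialect: set(keywords) for dialect, keywords in samples.items()}

def match_audio_keywords(transcribed, library):
    """Score each dialect by how many transcribed keywords it contains."""
    scores = {d: len(kws & set(transcribed)) for d, kws in library.items()}
    best = max(scores, key=scores.get)
    return best, scores

library = build_dialect_library({
    "dialect_A": ["ngo", "sik"],   # hypothetical keywords
    "dialect_B": ["an", "zan"],
})
best, scores = match_audio_keywords(["ngo", "sik", "zan"], library)
```

The result (here `dialect_A`, with a score breakdown per dialect) stands in for the "output result" of step (6).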
The scheme further comprises a panel terminal fixed on a wall or desktop that displays real-time speech-to-text output; like subtitles, it assists prison officers in their monitoring work and relieves their mental pressure during monitoring. A given segment of speech can also be quickly flagged with special markers such as "dangerous", "needs attention", or "inaudible", which facilitates later review and verification in the back end.
The scheme further comprises speech recognition technology based on MFCC feature analysis, an HMM-GMM model, and a deep neural network, used for recognizing the multiple dialects spoken by prisoners. Sensitive keywords are screened out by edge computing and stored on a cloud server, solving the problem that prison managers cannot understand dialects; at the same time, psychological emotion analysis can be performed on the sensitive keywords in prisoners' speech, person portraits can be analyzed, and a knowledge graph of each prisoner can be built.
The scheme further comprises identifying, in real time during conversation monitoring, all the local dialects used by prisoners, raising early warnings and labels by automatically recognizing sensitive keywords, generating emotion tags, and building a user-portrait knowledge graph.
The scheme further comprises generating text by performing speech recognition on audio or video of prisoner visits, and realizing prison map visualization and a prisoner list.
The scheme further comprises separating each speaker's content through role analysis and generating a dialogue transcript and a waveform diagram, realizing sound-text synchronization, free selection of roles, playback of speech segments, and the like.
The scheme further comprises, for video files stored from remote visits, first extracting the audio and then performing speech recognition to store a text file.
The scheme further comprises video education in video conferences, mainly concerning prison correction and prison management, through which advanced correction concepts can be conveyed to prison officers and cadres. Some video content is released to prisoners so that they understand the correction concept and cooperate with prison management work. For a conference video, a meeting summary in a given format can be generated through keyword recognition.
The scheme further comprises realizing keyword recognition, short-string semantic association, semantic indexing, and contextual semantic understanding through text-processing technology and domain-specific vocabulary weighting.
The scheme further comprises constructing a knowledge graph and analyzing person relations: drawing prisoners' portraits and person-relation networks, as well as a correction-policy knowledge graph for each prisoner, on the basis of historical visit data and prisoners' basic information.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the principle of the present invention.
Detailed Description
Speech recognition technology: based on MFCC feature analysis, an HMM-GMM model, and a deep neural network, it is used for recognizing the multiple dialects spoken by prisoners. Sensitive keywords are screened out by edge computing and stored on a cloud server, solving the problem that prison managers cannot understand dialects; at the same time, psychological emotion analysis can be performed on the sensitive keywords in prisoners' speech, person portraits can be analyzed, and a knowledge graph of each prisoner can be built.
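The front end of MFCC feature analysis mentioned above can be sketched in a few lines: pre-emphasis, fixed-length framing, and a Hamming window. The later mel filterbank and DCT stages are omitted, and the frame length, hop size, and pre-emphasis coefficient are common textbook defaults, not values specified by the patent.

```python
import math

# Sketch of an MFCC front end: pre-emphasis, framing, Hamming window.
# Parameters (400-sample frames, 160-sample hop, alpha=0.97) are typical
# defaults for 16 kHz audio, assumed here for illustration.

def pre_emphasis(signal, alpha=0.97):
    return [signal[0]] + [signal[i] - alpha * signal[i - 1]
                          for i in range(1, len(signal))]

def frame_signal(signal, frame_len=400, hop=160):
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]

def hamming(frame):
    n = len(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]

signal = [math.sin(0.1 * i) for i in range(1600)]  # 0.1 s at 16 kHz
frames = [hamming(f) for f in frame_signal(pre_emphasis(signal))]
```

Each windowed frame would then pass through an FFT, mel filterbank, log, and DCT to yield the MFCC vectors consumed by the HMM-GMM or neural acoustic model.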
Deep-learning search and recommendation algorithms: when a prisoner interacts with the intelligent correction robot in question-and-answer form, the robot automatically extracts and matches the prisoner's user portrait (including speech records, psychological-emotion analysis records, question-and-answer records, and the like), gives feedback specific to that prisoner, and recommends the optimal answer.
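The "optimal answer" selection described above can be illustrated with a minimal overlap-scoring sketch: each candidate answer is scored against the question keywords and the tags in the user portrait. The candidate answers, tags, and the 0.5 portrait weight are illustrative assumptions, not details from the patent.

```python
# Toy recommendation sketch: score candidates by keyword overlap with the
# question, plus a smaller bonus for overlap with the prisoner's portrait
# tags. All keywords, tags, and weights are hypothetical.

def recommend(question_kws, portrait_tags, candidates):
    """candidates: dict answer_id -> set of keywords. Returns best id."""
    def score(kws):
        return len(kws & question_kws) + 0.5 * len(kws & portrait_tags)
    return max(candidates, key=lambda a: score(candidates[a]))

best = recommend(
    question_kws={"family", "visit"},
    portrait_tags={"anxious"},
    candidates={
        "a1": {"family", "visit", "anxious"},
        "a2": {"work"},
    },
)
```

A production system would replace the set overlap with learned embeddings, but the ranking structure is the same.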
Deep-learning face detection: the algorithm performs optimized analysis of the complex, unstructured features of faces and emotions in this project. Automatic deep-learning analysis is combined with visual analysis methods to realize semi-automatic emotion analysis of persons and achieve a more refined result. Specifically, a video of a prisoner is first split into frames, and each frame is analyzed with face detection and person identification to analyze the person's facial expression. For describing ordinary emotions, seven basic classes are used: anger, surprise, happiness, neutrality, sadness, disgust, and fear. A dedicated optimization method refines this basic classification, improving the accuracy of emotion analysis for prisoners. The project collects video data shot in the prison, evaluates the face-detection technology on it, and optimizes it accordingly. To better understand prisoners, the following features are added during person identification: facial landmarks, frontal versus side faces, degree of occlusion, and so on. To further improve accuracy, the project also uses published TED-talk data and other public data sets applicable to emotion analysis.
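The frame-by-frame analysis above ends with one label per frame; aggregating them into a clip-level emotion can be sketched as a majority vote over the seven classes the text lists. The per-frame labels below are placeholders standing in for the output of a face-detection model.

```python
from collections import Counter

# Sketch of clip-level emotion aggregation over the seven basic classes.
# In practice each label would come from a per-frame classifier; here the
# labels are illustrative.

EMOTIONS = {"anger", "surprise", "happiness", "neutral",
            "sadness", "disgust", "fear"}

def aggregate_emotions(frame_labels):
    """Majority vote over per-frame predictions for one video clip."""
    labels = [l for l in frame_labels if l in EMOTIONS]
    if not labels:
        return "neutral", {}
    counts = Counter(labels)
    return counts.most_common(1)[0][0], dict(counts)

label, counts = aggregate_emotions(["neutral", "sadness", "sadness", "fear"])
```

The count breakdown can also feed the emotion tags attached to the user portrait.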
Speech synthesis technology: text content is synthesized into speech, playing a role in human-machine interaction.
Signal processing technology: silence is trimmed from the head and tail of an audio file to reduce interference. Fuzzy segments (audio that is too quiet to hear clearly) can be marked so that specific audio can be listened to intensively and repeatedly, preventing important conversation content from being missed. An audio waveform diagram is also generated from the audio to realize sound-text synchronization and improve the user experience and data presentation.
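The head/tail silence trimming described above can be sketched with a simple amplitude threshold over raw samples. The threshold value and the sample data are illustrative assumptions; a real system would use windowed energy rather than single samples.

```python
# Sketch of head/tail silence trimming: drop leading and trailing samples
# whose absolute amplitude is below a threshold. Threshold is illustrative.

def trim_silence(samples, threshold=0.01):
    def loud(i):
        return abs(samples[i]) >= threshold
    start = next((i for i in range(len(samples)) if loud(i)), len(samples))
    end = next((i for i in range(len(samples) - 1, -1, -1) if loud(i)), -1)
    return samples[start:end + 1]

audio = [0.0, 0.002, 0.5, -0.3, 0.2, 0.001, 0.0]
trimmed = trim_silence(audio)
```

The same threshold test, applied per segment instead of per file, would flag the low-energy "fuzzy" segments the text mentions.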
Semantic analysis technology: short-string semantic association, semantic indexing, and contextual semantic understanding are realized through text-processing technology and domain-specific vocabulary weighting. On this basis, the knowledge graph and person relations can be constructed.
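The "domain-specific vocabulary weighting" above can be sketched as term frequency boosted by a hand-assigned weight for sensitive domain vocabulary. The lexicon and its weights below are invented placeholders, not terms from the patent.

```python
from collections import Counter

# Sketch of domain vocabulary weighting: term frequency multiplied by a
# hand-assigned domain weight. The lexicon is hypothetical.

DOMAIN_WEIGHTS = {"escape": 5.0, "tool": 3.0}

def keyword_weights(tokens):
    tf = Counter(tokens)
    total = sum(tf.values())
    return {t: (c / total) * DOMAIN_WEIGHTS.get(t, 1.0)
            for t, c in tf.items()}

w = keyword_weights(["escape", "plan", "escape", "tool"])
```

Terms outside the domain lexicon keep plain term frequency, so sensitive vocabulary naturally ranks higher in keyword recognition.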
Real-time dialect identification: all the local dialects used by prisoners are identified in real time during conversation monitoring; early warnings and labels are raised by automatically recognizing sensitive keywords, emotion tags are generated, and a user-portrait knowledge graph is built.
Text is generated by performing speech recognition on audio or video of prisoner visits; prison map visualization and a prisoner list are realized.
Each speaker's content is separated through role analysis, and a dialogue transcript and waveform diagram are generated, realizing sound-text synchronization, free selection of roles, playback of speech segments, and the like. For video files stored from remote visits, the audio can first be extracted and then speech recognition performed to store a text file.
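The role analysis described above can be sketched as follows: given diarized segments (speaker, start, end, text), produce a readable dialogue transcript plus a per-role index of time spans, which is what free role selection and segment playback would key on. The segment data is illustrative.

```python
# Sketch of role analysis output: a timestamped dialogue transcript and a
# per-speaker index of (start, end) spans for segment playback.
# Speaker names and texts are placeholders.

def to_dialogue(segments):
    lines = [f"[{s['start']:.1f}s] {s['speaker']}: {s['text']}" for s in segments]
    by_role = {}
    for s in segments:
        by_role.setdefault(s["speaker"], []).append((s["start"], s["end"]))
    return "\n".join(lines), by_role

segments = [
    {"speaker": "prisoner", "start": 0.0, "end": 2.5, "text": "hello"},
    {"speaker": "visitor",  "start": 2.5, "end": 4.0, "text": "hi"},
]
text, by_role = to_dialogue(segments)
```

The timestamps in the transcript double as the anchor points for sound-text synchronization against the waveform diagram.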
For video conferences, the video education mainly concerns prison correction and prison management, and advanced correction concepts can be conveyed to prison officers and cadres. Some video content is released to prisoners so that they understand the correction concept and cooperate with prison management work. For a conference video, a meeting summary in a given format can be generated through keyword recognition.
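The keyword-driven meeting summary above can be sketched as extractive selection: keep the transcript sentences that contain a recognized keyword and fill a fixed template. The keywords and sentences are illustrative placeholders.

```python
# Sketch of fixed-format summary generation by keyword recognition:
# select transcript sentences containing any recognized keyword.
# Keywords and sentences are hypothetical.

def summarize(sentences, keywords, title="Meeting summary"):
    hits = [s for s in sentences if any(k in s for k in keywords)]
    return "\n".join([title] + [f"- {s}" for s in hits])

summary = summarize(
    ["education plan approved", "lunch was served", "correction policy updated"],
    keywords=["education", "correction"],
)
```

Sentences without recognized keywords are dropped, yielding a short bulleted summary in the given format.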
Voice question answering: intelligent voice question answering is conducted with prisoners during education and correction. Using deep learning, user portraits, and an optimal recommendation algorithm, "one person, one policy" rehabilitation content is pushed in real time, and engaging scenario-based education and rehabilitation are carried out, improving the rehabilitation effect and significantly reducing management cost. Semantic analysis: keyword recognition, short-string semantic association, semantic indexing, and contextual semantic understanding are realized through text-processing technology and domain-specific vocabulary weighting.
Constructing a knowledge graph and analyzing person relations: prisoners' portraits and person-relation networks, as well as a correction-policy knowledge graph for each prisoner, are drawn on the basis of historical visit data and prisoners' basic information.
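The person-relation network above can be sketched as an undirected weighted graph built from historical meeting records: an edge's weight is the number of meetings two people shared. The participant identifiers are placeholders.

```python
from collections import defaultdict
from itertools import combinations

# Sketch of a person-relation network from meeting co-occurrence.
# meetings: list of participant lists; edge weight = co-occurrence count.
# Participant ids are hypothetical.

def relation_network(meetings):
    graph = defaultdict(int)
    for participants in meetings:
        for a, b in combinations(sorted(set(participants)), 2):
            graph[(a, b)] += 1
    return dict(graph)

net = relation_network([["p1", "p2"], ["p1", "p2", "p3"]])
```

The resulting edge weights give a simple strength measure for the relation network; node attributes from prisoners' basic information would extend this into the knowledge graph.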
Emotion analysis based on face detection: emotion analysis mainly studies people's opinions and emotions about something. A common form of emotional expression is posting positive or negative opinions on the internet, so text-based emotion analysis has been widely studied and applied; such work, based on text and semantic analysis, is subjective emotion analysis. This project focuses on objective emotion analysis, a useful complement to the subjective kind. Here the emotional expression is a person's facial expression and limb movement in video; when a person shows involuntary emotional expression, automatic analysis can more accurately determine their joy, anger, or sadness. This automatic analysis does not require subjective description and can be applied in specific scenarios: for example, in daily supervision, a prisoner rarely gives much subjective description of his own feelings. The method combines deep learning with video analysis to improve the accuracy of video emotion analysis, so that emotion is better analyzed, a person's emotion and behavior are analyzed jointly, and emotion and behavior are predicted.

Claims (10)

1. An intelligent person-portrait terminal based on dialect recognition, characterized in that: (1) inputting keywords of different dialects; (2) preprocessing the dialect keywords; (3) performing feature extraction; (4) training dialect samples and testing the samples; (5) forming a dialect library; (6) inputting an audio file of a prisoner, matching it against the dialect library, and outputting the result; (7) inputting a video file of the prisoner; (8) performing facial emotion capture; (9) matching data in a tag library; (10) combining the prisoner's audio output file with the tag-library data; (11) outputting the prisoner's user portrait; the invention solves the problem that the many dialects spoken by prisoners cannot be understood, provides an intelligent correction scheme for prisoners based on person portraits, realizes "one person, one policy", and alleviates the shortage of police manpower.
2. The intelligent person-portrait terminal based on dialect recognition of claim 1, wherein: a panel terminal is fixed on a wall or desktop for real-time speech-to-text display, assisting prison officers' monitoring work like subtitles and relieving their mental pressure during monitoring; a given segment of speech can also be quickly flagged with special markers such as "dangerous", "needs attention", or "inaudible", facilitating back-end review and verification.
3. The intelligent person-portrait terminal based on dialect recognition of claim 2, wherein: the speech recognition technology is based on MFCC feature analysis, an HMM-GMM model, and a deep neural network and is used for recognizing the multiple dialects spoken by prisoners; sensitive keywords are screened out by edge computing and stored on a cloud server, solving the problem that prison managers cannot understand dialects; psychological emotion analysis can also be performed on the sensitive keywords in prisoners' speech, person portraits analyzed, and a knowledge graph of each prisoner built.
4. The intelligent person-portrait terminal based on dialect recognition of claim 3, wherein: all the local dialects used by prisoners are identified in real time during conversation monitoring; early warnings and labels are raised by automatically recognizing sensitive keywords, emotion tags are generated, and a user-portrait knowledge graph is built.
5. The intelligent person-portrait terminal based on dialect recognition of claim 4, wherein: text is generated by performing speech recognition on audio or video of prisoner visits; prison map visualization and a prisoner list are realized.
6. The intelligent person-portrait terminal based on dialect recognition of claim 5, wherein: each speaker's content is separated through role analysis and a dialogue transcript and waveform diagram are generated, realizing sound-text synchronization, free selection of roles, playback of speech segments, and the like.
7. The intelligent person-portrait terminal based on dialect recognition of claim 6, wherein: for video files stored from remote visits, the audio can first be extracted and then speech recognition performed to store a text file.
8. The intelligent person-portrait terminal based on dialect recognition of claim 7, wherein: for video conferences, the video education mainly concerns prison correction and prison management, and advanced correction concepts can be conveyed to prison officers and cadres; some video content is released to prisoners so that they understand the correction concept and cooperate with prison management work; for a conference video, a meeting summary in a given format can be generated through keyword recognition.
9. The intelligent person-portrait terminal based on dialect recognition of claim 8, wherein: semantic analysis: keyword recognition, short-string semantic association, semantic indexing, and contextual semantic understanding are realized through text-processing technology and domain-specific vocabulary weighting.
10. The intelligent person-portrait terminal based on dialect recognition of claim 9, wherein: a knowledge graph is constructed and person relations are analyzed: prisoners' portraits and person-relation networks, as well as a correction-policy knowledge graph for each prisoner, are drawn on the basis of historical visit data and prisoners' basic information.
CN201911300189.5A 2019-12-18 2019-12-18 Intelligent figure portrait terminal based on dialect recognition Pending CN113076770A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911300189.5A CN113076770A (en) 2019-12-18 2019-12-18 Intelligent figure portrait terminal based on dialect recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911300189.5A CN113076770A (en) 2019-12-18 2019-12-18 Intelligent figure portrait terminal based on dialect recognition

Publications (1)

Publication Number Publication Date
CN113076770A true CN113076770A (en) 2021-07-06

Family

ID=76608259

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911300189.5A Pending CN113076770A (en) 2019-12-18 2019-12-18 Intelligent figure portrait terminal based on dialect recognition

Country Status (1)

Country Link
CN (1) CN113076770A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210272225A1 (en) * 2017-04-19 2021-09-02 Global Tel*Link Corporation Mobile correctional facility robots
US11959733B2 2017-04-19 2024-04-16 Global Tel*Link Corporation Mobile correctional facility robots
CN115658933A (en) * 2022-12-28 2023-01-31 West China Hospital of Sichuan University Psychological state knowledge base construction method and device, computer equipment and storage medium
CN116884392A (en) * 2023-09-04 2023-10-13 Zhejiang Xinmiao Communication Co Ltd Voice emotion recognition method based on data analysis
CN116884392B (en) * 2023-09-04 2023-11-21 Zhejiang Xinmiao Communication Co Ltd Voice emotion recognition method based on data analysis

Similar Documents

Publication Publication Date Title
Wu et al. Audio caption: Listen and tell
CN113076770A (en) Intelligent figure portrait terminal based on dialect recognition
CN107993665B (en) Method for determining role of speaker in multi-person conversation scene, intelligent conference method and system
McKeown et al. The semaine database: Annotated multimodal records of emotionally colored conversations between a person and a limited agent
CN110751208A (en) Criminal emotion recognition method for multi-mode feature fusion based on self-weight differential encoder
CN112101045B (en) Multi-mode semantic integrity recognition method and device and electronic equipment
CN111833861A (en) Artificial intelligence based event evaluation report generation
CN113592251B (en) Multi-mode integrated teaching state analysis system
Chakraborty et al. Literature Survey
CN109714608A (en) Video data handling procedure, device, computer equipment and storage medium
CN114495217A (en) Scene analysis method, device and system based on natural language and expression analysis
CN117198338B (en) Interphone voiceprint recognition method and system based on artificial intelligence
Jain et al. Student’s Feedback by emotion and speech recognition through Deep Learning
Mircoli et al. Automatic Emotional Text Annotation Using Facial Expression Analysis.
Jia et al. A deep learning system for sentiment analysis of service calls
CN116883888A (en) Bank counter service problem tracing system and method based on multi-mode feature fusion
US20230095952A1 (en) Automated interview apparatus and method using telecommunication networks
Hussien et al. Multimodal sentiment analysis: a comparison study
CN114492579A (en) Emotion recognition method, camera device, emotion recognition device and storage device
Sun et al. Automatic understanding of affective and social signals by multimodal mimicry recognition
Sánchez-Ancajima et al. Gesture Phase Segmentation Dataset: An Extension for Development of Gesture Analysis Models.
Ramanarayanan et al. An analysis of time-aggregated and time-series features for scoring different aspects of multimodal presentation data
Böck Multimodal automatic user disposition recognition in human-machine interaction
Taralrud et al. Multimodal Sentiment Analysis for Personality Prediction
Wang et al. EmoAsst: emotion recognition assistant via text-guided transfer learning on pre-trained visual and acoustic models

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20210706)