CN113076770A - Intelligent figure portrait terminal based on dialect recognition - Google Patents

Info

Publication number
CN113076770A
CN113076770A (Application CN201911300189.5A)
Authority
CN
China
Prior art keywords
dialect
prisoner
prisoners
intelligent terminal
prison
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911300189.5A
Other languages
Chinese (zh)
Inventor
李国栋 (Li Guodong)
邬玉香 (Wu Yuxiang)
李兴华 (Li Xinghua)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Jieshigao Information Technology Co ltd
Original Assignee
Guangzhou Jieshigao Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Jieshigao Information Technology Co ltd filed Critical Guangzhou Jieshigao Information Technology Co ltd
Priority to CN201911300189.5A priority Critical patent/CN113076770A/en
Publication of CN113076770A publication Critical patent/CN113076770A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/26Government or public services
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00Speech synthesis; Text to speech systems
    • G10L13/02Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033Voice editing, e.g. manipulating the voice of the synthesiser
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/005Language recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Tourism & Hospitality (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Data Mining & Analysis (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Animal Behavior & Ethology (AREA)
  • Economics (AREA)
  • Databases & Information Systems (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the field of artificial intelligence and provides an intelligent person-portrait terminal based on dialect recognition. The method comprises the following steps: (1) inputting keywords of different dialects; (2) preprocessing the dialect keywords; (3) performing feature extraction; (4) training dialect samples and testing the samples; (5) forming a dialect library; (6) inputting an audio file of a prisoner, matching it against the dialect library, and outputting the result; (7) inputting a video file of the prisoner; (8) performing facial emotion capture; (9) matching data in a tag library; (10) combining the prisoner's audio output file with the tag-library data; (11) outputting the prisoner's user portrait. The invention solves the problem that the many dialects spoken by prisoners cannot be understood, provides an intelligent correction scheme for prisoners based on person portraits, realizes "one person, one policy", and alleviates the shortage of police manpower.

Description

Intelligent figure portrait terminal based on dialect recognition
Technical Field
The invention belongs to the field of artificial intelligence and provides an intelligent person-portrait terminal based on dialect recognition.
Background
Artificial-intelligence enterprises mainly apply their technology in fields such as finance, e-commerce, security, and education. The invention applies it to the prison field for the first time, combining computer vision, speech recognition, deep-learning search and recommendation algorithms, character image recognition, fingerprint recognition, and the like, to realize basic information management of prisoners, monitoring and analysis of verbal communication (including dialects) between prisoners, intelligent management of prisoners' daily life, and a "one person, one policy" correction and rehabilitation plan for each prisoner. The system saves manpower and material resources in prison management, fills a gap in prison informatization work, and is of great significance for promoting prison supervision and prisoner rehabilitation in China and for advancing prison informatization construction.
Disclosure of Invention
The purpose of the invention is as follows: the invention provides an intelligent person-portrait terminal based on dialect recognition, which solves the problem that the many dialects spoken by prisoners cannot be understood, provides an intelligent correction scheme for prisoners based on person portraits, realizes "one person, one policy", and alleviates the shortage of police manpower.
In order to achieve the above technical purpose, the invention adopts the following technical scheme: (1) inputting keywords of different dialects; (2) preprocessing the dialect keywords; (3) performing feature extraction; (4) training dialect samples and testing the samples; (5) forming a dialect library; (6) inputting an audio file of a prisoner, matching it against the dialect library, and outputting the result; (7) inputting a video file of the prisoner; (8) performing facial emotion capture; (9) matching data in a tag library; (10) combining the prisoner's audio output file with the tag-library data; (11) outputting the prisoner's user portrait.
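Steps (5)-(6) above can be sketched as a toy dialect-library match. This is a minimal illustration only: the dialect names and keywords below are invented placeholders, and a real system would match acoustic features rather than plain strings.

```python
# Toy sketch of steps (5)-(6): build a dialect library, then score a
# prisoner's transcribed keywords against it. All names are placeholders.

def build_dialect_library(samples):
    """samples: dict mapping dialect name -> list of trained keywords."""
    return {dialect: set(keywords) for dialect, keywords in samples.items()}

def match_audio_keywords(transcribed, library):
    """Score each dialect by how many transcribed keywords it contains."""
    scores = {d: len(kws & set(transcribed)) for d, kws in library.items()}
    best = max(scores, key=scores.get)
    return best, scores

library = build_dialect_library({
    "dialect_A": ["ngo", "sik"],   # hypothetical keywords
    "dialect_B": ["an", "zan"],
})
best, scores = match_audio_keywords(["ngo", "sik", "zan"], library)
```

The result (here `dialect_A`, with a score breakdown per dialect) stands in for the "output result" of step (6).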
The scheme further comprises a panel terminal fixed on a wall or desktop that displays real-time speech-to-text output; like subtitles, it assists prison officers in their monitoring work and relieves their mental pressure during monitoring. A given segment of speech can also be quickly flagged with special markers such as "dangerous", "needs attention", or "inaudible", which facilitates later review and verification in the back end.
The scheme further comprises speech recognition technology based on MFCC feature analysis, an HMM-GMM model, and a deep neural network, used for recognizing the multiple dialects spoken by prisoners. Sensitive keywords are screened out by edge computing and stored on a cloud server, solving the problem that prison managers cannot understand dialects; at the same time, psychological emotion analysis can be performed on the sensitive keywords in prisoners' speech, person portraits can be analyzed, and a knowledge graph of each prisoner can be built.
The scheme further comprises identifying, in real time during conversation monitoring, all the local dialects used by prisoners, raising early warnings and labels by automatically recognizing sensitive keywords, generating emotion tags, and building a user-portrait knowledge graph.
The scheme further comprises generating text by performing speech recognition on audio or video of prisoner visits, and realizing prison map visualization and a prisoner list.
The scheme further comprises separating each speaker's content through role analysis and generating a dialogue transcript and a waveform diagram, realizing sound-text synchronization, free selection of roles, playback of speech segments, and the like.
The scheme further comprises, for video files stored from remote visits, first extracting the audio and then performing speech recognition to store a text file.
The scheme further comprises video education in video conferences, mainly concerning prison correction and prison management, through which advanced correction concepts can be conveyed to prison officers and cadres. Some video content is released to prisoners so that they understand the correction concept and cooperate with prison management work. For a conference video, a meeting summary in a given format can be generated through keyword recognition.
The scheme further comprises realizing keyword recognition, short-string semantic association, semantic indexing, and contextual semantic understanding through text-processing technology and domain-specific vocabulary weighting.
The scheme further comprises constructing a knowledge graph and analyzing person relations: drawing prisoners' portraits and person-relation networks, as well as a correction-policy knowledge graph for each prisoner, on the basis of historical visit data and prisoners' basic information.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the principle of the present invention.
Detailed Description
Speech recognition technology: based on MFCC feature analysis, an HMM-GMM model, and a deep neural network, it is used for recognizing the multiple dialects spoken by prisoners. Sensitive keywords are screened out by edge computing and stored on a cloud server, solving the problem that prison managers cannot understand dialects; at the same time, psychological emotion analysis can be performed on the sensitive keywords in prisoners' speech, person portraits can be analyzed, and a knowledge graph of each prisoner can be built.
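The front end of MFCC feature analysis mentioned above can be sketched in a few lines: pre-emphasis, fixed-length framing, and a Hamming window. The later mel filterbank and DCT stages are omitted, and the frame length, hop size, and pre-emphasis coefficient are common textbook defaults, not values specified by the patent.

```python
import math

# Sketch of an MFCC front end: pre-emphasis, framing, Hamming window.
# Parameters (400-sample frames, 160-sample hop, alpha=0.97) are typical
# defaults for 16 kHz audio, assumed here for illustration.

def pre_emphasis(signal, alpha=0.97):
    return [signal[0]] + [signal[i] - alpha * signal[i - 1]
                          for i in range(1, len(signal))]

def frame_signal(signal, frame_len=400, hop=160):
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]

def hamming(frame):
    n = len(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]

signal = [math.sin(0.1 * i) for i in range(1600)]  # 0.1 s at 16 kHz
frames = [hamming(f) for f in frame_signal(pre_emphasis(signal))]
```

Each windowed frame would then pass through an FFT, mel filterbank, log, and DCT to yield the MFCC vectors consumed by the HMM-GMM or neural acoustic model.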
Deep-learning search and recommendation algorithms: when a prisoner interacts with the intelligent correction robot in question-and-answer form, the robot automatically extracts and matches the prisoner's user portrait (including speech records, psychological-emotion analysis records, question-and-answer records, and the like), gives feedback specific to that prisoner, and recommends the optimal answer.
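The "optimal answer" selection described above can be illustrated with a minimal overlap-scoring sketch: each candidate answer is scored against the question keywords and the tags in the user portrait. The candidate answers, tags, and the 0.5 portrait weight are illustrative assumptions, not details from the patent.

```python
# Toy recommendation sketch: score candidates by keyword overlap with the
# question, plus a smaller bonus for overlap with the prisoner's portrait
# tags. All keywords, tags, and weights are hypothetical.

def recommend(question_kws, portrait_tags, candidates):
    """candidates: dict answer_id -> set of keywords. Returns best id."""
    def score(kws):
        return len(kws & question_kws) + 0.5 * len(kws & portrait_tags)
    return max(candidates, key=lambda a: score(candidates[a]))

best = recommend(
    question_kws={"family", "visit"},
    portrait_tags={"anxious"},
    candidates={
        "a1": {"family", "visit", "anxious"},
        "a2": {"work"},
    },
)
```

A production system would replace the set overlap with learned embeddings, but the ranking structure is the same.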
Deep-learning face detection: the algorithm performs optimized analysis of the complex, unstructured features of faces and emotions in this project. Automatic deep-learning analysis is combined with visual analysis methods to realize semi-automatic emotion analysis of persons and achieve a more refined result. Specifically, a video of a prisoner is first split into frames, and each frame is analyzed with face detection and person identification to analyze the person's facial expression. For describing ordinary emotions, seven basic classes are used: anger, surprise, happiness, neutrality, sadness, disgust, and fear. A dedicated optimization method refines this basic classification, improving the accuracy of emotion analysis for prisoners. The project collects video data shot in the prison, evaluates the face-detection technology on it, and optimizes it accordingly. To better understand prisoners, the following features are added during person identification: facial landmarks, frontal versus side faces, degree of occlusion, and so on. To further improve accuracy, the project also uses published TED-talk data and other public data sets applicable to emotion analysis.
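The frame-by-frame analysis above ends with one label per frame; aggregating them into a clip-level emotion can be sketched as a majority vote over the seven classes the text lists. The per-frame labels below are placeholders standing in for the output of a face-detection model.

```python
from collections import Counter

# Sketch of clip-level emotion aggregation over the seven basic classes.
# In practice each label would come from a per-frame classifier; here the
# labels are illustrative.

EMOTIONS = {"anger", "surprise", "happiness", "neutral",
            "sadness", "disgust", "fear"}

def aggregate_emotions(frame_labels):
    """Majority vote over per-frame predictions for one video clip."""
    labels = [l for l in frame_labels if l in EMOTIONS]
    if not labels:
        return "neutral", {}
    counts = Counter(labels)
    return counts.most_common(1)[0][0], dict(counts)

label, counts = aggregate_emotions(["neutral", "sadness", "sadness", "fear"])
```

The count breakdown can also feed the emotion tags attached to the user portrait.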
Speech synthesis technology: text content is synthesized into speech, playing a role in human-machine interaction.
Signal processing technology: silence is trimmed from the head and tail of an audio file to reduce interference. Fuzzy segments (audio that is too quiet to hear clearly) can be marked so that specific audio can be listened to intensively and repeatedly, preventing important conversation content from being missed. An audio waveform diagram is also generated from the audio to realize sound-text synchronization and improve the user experience and data presentation.
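The head/tail silence trimming described above can be sketched with a simple amplitude threshold over raw samples. The threshold value and the sample data are illustrative assumptions; a real system would use windowed energy rather than single samples.

```python
# Sketch of head/tail silence trimming: drop leading and trailing samples
# whose absolute amplitude is below a threshold. Threshold is illustrative.

def trim_silence(samples, threshold=0.01):
    def loud(i):
        return abs(samples[i]) >= threshold
    start = next((i for i in range(len(samples)) if loud(i)), len(samples))
    end = next((i for i in range(len(samples) - 1, -1, -1) if loud(i)), -1)
    return samples[start:end + 1]

audio = [0.0, 0.002, 0.5, -0.3, 0.2, 0.001, 0.0]
trimmed = trim_silence(audio)
```

The same threshold test, applied per segment instead of per file, would flag the low-energy "fuzzy" segments the text mentions.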
Semantic analysis technology: short-string semantic association, semantic indexing, and contextual semantic understanding are realized through text-processing technology and domain-specific vocabulary weighting. On this basis, the knowledge graph and person relations can be constructed.
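The "domain-specific vocabulary weighting" above can be sketched as term frequency boosted by a hand-assigned weight for sensitive domain vocabulary. The lexicon and its weights below are invented placeholders, not terms from the patent.

```python
from collections import Counter

# Sketch of domain vocabulary weighting: term frequency multiplied by a
# hand-assigned domain weight. The lexicon is hypothetical.

DOMAIN_WEIGHTS = {"escape": 5.0, "tool": 3.0}

def keyword_weights(tokens):
    tf = Counter(tokens)
    total = sum(tf.values())
    return {t: (c / total) * DOMAIN_WEIGHTS.get(t, 1.0)
            for t, c in tf.items()}

w = keyword_weights(["escape", "plan", "escape", "tool"])
```

Terms outside the domain lexicon keep plain term frequency, so sensitive vocabulary naturally ranks higher in keyword recognition.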
Real-time dialect identification: all the local dialects used by prisoners are identified in real time during conversation monitoring; early warnings and labels are raised by automatically recognizing sensitive keywords, emotion tags are generated, and a user-portrait knowledge graph is built.
Text is generated by performing speech recognition on audio or video of prisoner visits; prison map visualization and a prisoner list are realized.
Each speaker's content is separated through role analysis, and a dialogue transcript and waveform diagram are generated, realizing sound-text synchronization, free selection of roles, playback of speech segments, and the like. For video files stored from remote visits, the audio can first be extracted and then speech recognition performed to store a text file.
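The role analysis described above can be sketched as follows: given diarized segments (speaker, start, end, text), produce a readable dialogue transcript plus a per-role index of time spans, which is what free role selection and segment playback would key on. The segment data is illustrative.

```python
# Sketch of role analysis output: a timestamped dialogue transcript and a
# per-speaker index of (start, end) spans for segment playback.
# Speaker names and texts are placeholders.

def to_dialogue(segments):
    lines = [f"[{s['start']:.1f}s] {s['speaker']}: {s['text']}" for s in segments]
    by_role = {}
    for s in segments:
        by_role.setdefault(s["speaker"], []).append((s["start"], s["end"]))
    return "\n".join(lines), by_role

segments = [
    {"speaker": "prisoner", "start": 0.0, "end": 2.5, "text": "hello"},
    {"speaker": "visitor",  "start": 2.5, "end": 4.0, "text": "hi"},
]
text, by_role = to_dialogue(segments)
```

The timestamps in the transcript double as the anchor points for sound-text synchronization against the waveform diagram.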
For video conferences, the video education mainly concerns prison correction and prison management, and advanced correction concepts can be conveyed to prison officers and cadres. Some video content is released to prisoners so that they understand the correction concept and cooperate with prison management work. For a conference video, a meeting summary in a given format can be generated through keyword recognition.
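The keyword-driven meeting summary above can be sketched as extractive selection: keep the transcript sentences that contain a recognized keyword and fill a fixed template. The keywords and sentences are illustrative placeholders.

```python
# Sketch of fixed-format summary generation by keyword recognition:
# select transcript sentences containing any recognized keyword.
# Keywords and sentences are hypothetical.

def summarize(sentences, keywords, title="Meeting summary"):
    hits = [s for s in sentences if any(k in s for k in keywords)]
    return "\n".join([title] + [f"- {s}" for s in hits])

summary = summarize(
    ["education plan approved", "lunch was served", "correction policy updated"],
    keywords=["education", "correction"],
)
```

Sentences without recognized keywords are dropped, yielding a short bulleted summary in the given format.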
Voice question answering: intelligent voice question answering is conducted with prisoners during education and correction. Using deep learning, user portraits, and an optimal recommendation algorithm, "one person, one policy" rehabilitation content is pushed in real time, and engaging scenario-based education and rehabilitation are carried out, improving the rehabilitation effect and significantly reducing management cost. Semantic analysis: keyword recognition, short-string semantic association, semantic indexing, and contextual semantic understanding are realized through text-processing technology and domain-specific vocabulary weighting.
Constructing a knowledge graph and analyzing person relations: prisoners' portraits and person-relation networks, as well as a correction-policy knowledge graph for each prisoner, are drawn on the basis of historical visit data and prisoners' basic information.
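The person-relation network above can be sketched as an undirected weighted graph built from historical meeting records: an edge's weight is the number of meetings two people shared. The participant identifiers are placeholders.

```python
from collections import defaultdict
from itertools import combinations

# Sketch of a person-relation network from meeting co-occurrence.
# meetings: list of participant lists; edge weight = co-occurrence count.
# Participant ids are hypothetical.

def relation_network(meetings):
    graph = defaultdict(int)
    for participants in meetings:
        for a, b in combinations(sorted(set(participants)), 2):
            graph[(a, b)] += 1
    return dict(graph)

net = relation_network([["p1", "p2"], ["p1", "p2", "p3"]])
```

The resulting edge weights give a simple strength measure for the relation network; node attributes from prisoners' basic information would extend this into the knowledge graph.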
Emotion analysis based on face detection: emotion analysis mainly studies people's opinions and emotions about something. A common form of emotional expression is posting positive or negative opinions on the internet, so text-based emotion analysis has been widely studied and applied; such work, based on text and semantic analysis, is subjective emotion analysis. This project focuses on objective emotion analysis, a useful complement to the subjective kind. Here the emotional expression is a person's facial expression and limb movement in video; when a person shows involuntary emotional expression, automatic analysis can more accurately determine their joy, anger, or sadness. This automatic analysis does not require subjective description and can be applied in specific scenarios: for example, in daily supervision, a prisoner rarely gives much subjective description of his own feelings. The method combines deep learning with video analysis to improve the accuracy of video emotion analysis, so that emotion is better analyzed, a person's emotion and behavior are analyzed jointly, and emotion and behavior are predicted.

Claims (10)

1. An intelligent person-portrait terminal based on dialect recognition, characterized in that: (1) inputting keywords of different dialects; (2) preprocessing the dialect keywords; (3) performing feature extraction; (4) training dialect samples and testing the samples; (5) forming a dialect library; (6) inputting an audio file of a prisoner, matching it against the dialect library, and outputting the result; (7) inputting a video file of the prisoner; (8) performing facial emotion capture; (9) matching data in a tag library; (10) combining the prisoner's audio output file with the tag-library data; (11) outputting the prisoner's user portrait; the invention solves the problem that the many dialects spoken by prisoners cannot be understood, provides an intelligent correction scheme for prisoners based on person portraits, realizes "one person, one policy", and alleviates the shortage of police manpower.
2. The intelligent person-portrait terminal based on dialect recognition of claim 1, wherein: a panel terminal is fixed on a wall or desktop for real-time speech-to-text display, assisting prison officers' monitoring work like subtitles and relieving their mental pressure during monitoring; a given segment of speech can also be quickly flagged with special markers such as "dangerous", "needs attention", or "inaudible", facilitating back-end review and verification.
3. The intelligent person-portrait terminal based on dialect recognition of claim 2, wherein: the speech recognition technology is based on MFCC feature analysis, an HMM-GMM model, and a deep neural network and is used for recognizing the multiple dialects spoken by prisoners; sensitive keywords are screened out by edge computing and stored on a cloud server, solving the problem that prison managers cannot understand dialects; psychological emotion analysis can also be performed on the sensitive keywords in prisoners' speech, person portraits analyzed, and a knowledge graph of each prisoner built.
4. The intelligent person-portrait terminal based on dialect recognition of claim 3, wherein: all the local dialects used by prisoners are identified in real time during conversation monitoring; early warnings and labels are raised by automatically recognizing sensitive keywords, emotion tags are generated, and a user-portrait knowledge graph is built.
5. The intelligent person-portrait terminal based on dialect recognition of claim 4, wherein: text is generated by performing speech recognition on audio or video of prisoner visits; prison map visualization and a prisoner list are realized.
6. The intelligent person-portrait terminal based on dialect recognition of claim 5, wherein: each speaker's content is separated through role analysis and a dialogue transcript and waveform diagram are generated, realizing sound-text synchronization, free selection of roles, playback of speech segments, and the like.
7. The intelligent person-portrait terminal based on dialect recognition of claim 6, wherein: for video files stored from remote visits, the audio can first be extracted and then speech recognition performed to store a text file.
8. The intelligent person-portrait terminal based on dialect recognition of claim 7, wherein: for video conferences, the video education mainly concerns prison correction and prison management, and advanced correction concepts can be conveyed to prison officers and cadres; some video content is released to prisoners so that they understand the correction concept and cooperate with prison management work; for a conference video, a meeting summary in a given format can be generated through keyword recognition.
9. The intelligent person-portrait terminal based on dialect recognition of claim 8, wherein: semantic analysis: keyword recognition, short-string semantic association, semantic indexing, and contextual semantic understanding are realized through text-processing technology and domain-specific vocabulary weighting.
10. The intelligent person-portrait terminal based on dialect recognition of claim 9, wherein: a knowledge graph is constructed and person relations are analyzed: prisoners' portraits and person-relation networks, as well as a correction-policy knowledge graph for each prisoner, are drawn on the basis of historical visit data and prisoners' basic information.
CN201911300189.5A 2019-12-18 2019-12-18 Intelligent figure portrait terminal based on dialect recognition Pending CN113076770A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911300189.5A CN113076770A (en) 2019-12-18 2019-12-18 Intelligent figure portrait terminal based on dialect recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911300189.5A CN113076770A (en) 2019-12-18 2019-12-18 Intelligent figure portrait terminal based on dialect recognition

Publications (1)

Publication Number Publication Date
CN113076770A true CN113076770A (en) 2021-07-06

Family

ID=76608259

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911300189.5A Pending CN113076770A (en) 2019-12-18 2019-12-18 Intelligent figure portrait terminal based on dialect recognition

Country Status (1)

Country Link
CN (1) CN113076770A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210272225A1 (en) * 2017-04-19 2021-09-02 Global Tel*Link Corporation Mobile correctional facility robots
US11959733B2 2017-04-19 2024-04-16 Global Tel*Link Corporation Mobile correctional facility robots
CN115658933A (en) * 2022-12-28 2023-01-31 West China Hospital of Sichuan University Psychological state knowledge base construction method and device, computer equipment and storage medium
CN116884392A (en) * 2023-09-04 2023-10-13 Zhejiang Xinmiao Communication Co Ltd Voice emotion recognition method based on data analysis
CN116884392B (en) * 2023-09-04 2023-11-21 Zhejiang Xinmiao Communication Co Ltd Voice emotion recognition method based on data analysis

Similar Documents

Publication Publication Date Title
Wu et al. Audio caption: Listen and tell
CN113076770A (en) Intelligent figure portrait terminal based on dialect recognition
CN107993665B (en) Method for determining role of speaker in multi-person conversation scene, intelligent conference method and system
McKeown et al. The semaine database: Annotated multimodal records of emotionally colored conversations between a person and a limited agent
CN110751208A (en) Criminal emotion recognition method for multi-mode feature fusion based on self-weight differential encoder
CN112101045B (en) Multi-mode semantic integrity recognition method and device and electronic equipment
CN111833861A (en) Artificial intelligence based event evaluation report generation
CN113592251B (en) Multi-mode integrated teaching state analysis system
Chakraborty et al. Literature Survey
CN109714608A (en) Video data handling procedure, device, computer equipment and storage medium
CN114495217A (en) Scene analysis method, device and system based on natural language and expression analysis
CN117198338B (en) Interphone voiceprint recognition method and system based on artificial intelligence
Jain et al. Student’s Feedback by emotion and speech recognition through Deep Learning
Mircoli et al. Automatic Emotional Text Annotation Using Facial Expression Analysis.
Jia et al. A deep learning system for sentiment analysis of service calls
CN116883888A (en) Bank counter service problem tracing system and method based on multi-mode feature fusion
US20230095952A1 (en) Automated interview apparatus and method using telecommunication networks
Hussien et al. Multimodal sentiment analysis: a comparison study
CN114492579A (en) Emotion recognition method, camera device, emotion recognition device and storage device
Sun et al. Automatic understanding of affective and social signals by multimodal mimicry recognition
Sánchez-Ancajima et al. Gesture Phase Segmentation Dataset: An Extension for Development of Gesture Analysis Models.
Ramanarayanan et al. An analysis of time-aggregated and time-series features for scoring different aspects of multimodal presentation data
Böck Multimodal automatic user disposition recognition in human-machine interaction
Taralrud et al. Multimodal Sentiment Analysis for Personality Prediction
Wang et al. EmoAsst: emotion recognition assistant via text-guided transfer learning on pre-trained visual and acoustic models

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20210706)