CN113076747A - Voice recognition recording method based on role recognition - Google Patents

Voice recognition recording method based on role recognition

Info

Publication number
CN113076747A
CN113076747A
Authority
CN
China
Prior art keywords
text
voice
talker
talked
speaking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110346865.3A
Other languages
Chinese (zh)
Inventor
黄星耀
熊倩
王宇骁
王枫
王学春
张志亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Fengyun Jihui Intelligent Technology Co ltd
Original Assignee
Chongqing Fengyun Jihui Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Fengyun Jihui Intelligent Technology Co ltd filed Critical Chongqing Fengyun Jihui Intelligent Technology Co ltd
Priority to CN202110346865.3A priority Critical patent/CN113076747A/en
Publication of CN113076747A publication Critical patent/CN113076747A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention relates to the technical field of voice recognition, and in particular to a voice recognition recording method based on role recognition, comprising the following steps: S1, collecting the speech of the interviewer and the interviewee in real time during the conversation; S2, converting the interviewer's speech into a first text and the interviewee's speech into a second text; S3, identifying erroneous entries in the first text and the second text, and replacing them according to preset keyword entries; S4, detecting the voice frequency of the interviewee's speech, and marking the positions in the second text where that frequency falls outside a preset range; and S5, playing back the speech of the interviewer and the interviewee, and proofreading the first text and the second text. The invention solves the technical problem that the prior art cannot recognize whether the interviewee exhibits psychological resistance.

Description

Voice recognition recording method based on role recognition
Technical Field
The invention relates to the technical field of voice recognition, and in particular to a voice recognition recording method based on role recognition.
Background
At present, in grassroots courts, conducting an inquiry while simultaneously taking notes is a heavy workload, and the recording, checking, and rechecking procedures are complex. With the development of speech recognition technology, during a court trial or meeting, speech can be converted into text and inserted into the written record in real time, separated by role. This reduces the workload of the recording personnel and avoids omissions and errors in the record.
For example, Chinese patent CN110751950A discloses a big-data-based police interview speech recognition method and system, in which the method comprises the steps of: configuring the identities of the interviewer and the interviewee and the parameters of the audio acquisition equipment, controlling the audio acquisition equipment, and automatically generating an interview file directory; displaying the recognized text of the interview speech in real time in an interview interface display window, and saving the interview audio file and the text file; analyzing the interview speech through big data, designing a psychological state rating table, analyzing the psychological state index of the interviewee, and generating a big data analysis report; and linking the interview audio file and the text file into the judicial system so that both can be shared.
That technical scheme reduces the workload of the officers involved and improves their efficiency. However, during an interview, the interviewee often exhibits psychological resistance. Psychological resistance means that, during the conversation, the interviewee overtly or covertly rejects the interviewer's analysis, stalls, and resists the interviewer's requests, which hinders the normal progress of the conversation and can even bring it to a standstill. When the interviewee exhibits psychological resistance, the credibility of the interview content drops sharply, and the clerk's attention should be drawn to it. With the prior art, it is impossible to recognize whether the interviewee exhibits psychological resistance.
Disclosure of Invention
The invention provides a voice recognition recording method based on role recognition, which solves the technical problem that the prior art cannot recognize whether the interviewee exhibits psychological resistance.
The basic scheme provided by the invention is as follows: a voice recognition recording method based on role recognition, comprising the following steps:
S1, collecting the speech of the interviewer and the interviewee in real time during the conversation;
S2, converting the interviewer's speech into a first text, and converting the interviewee's speech into a second text;
S3, identifying erroneous entries in the first text and the second text, and replacing them according to preset keyword entries;
S4, detecting the voice frequency of the interviewee's speech, and marking the positions in the second text where that frequency falls outside a preset range;
and S5, playing back the speech of the interviewer and the interviewee, and proofreading the first text and the second text.
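As a rough illustration (not part of the patent), steps S1-S5 above can be sketched as a small processing pipeline. Every name below is hypothetical: `transcribe` stands in for whatever existing speech recognition engine performs S2, and each toy segment carries a precomputed voice frequency rather than raw audio.

```python
# Minimal sketch of the S1-S5 pipeline; all helper names are hypothetical.

def transcribe(segments):
    """Placeholder for an existing speech recognition engine (S2)."""
    return [seg["text"] for seg in segments]

def replace_entries(text, keyword_entries):
    """S3: replace erroneous entries with preset keyword entries."""
    for wrong, right in keyword_entries.items():
        text = text.replace(wrong, right)
    return text

def mark_out_of_range(segments, low=50.0, high=500.0):
    """S4: mark segments whose voice frequency leaves [low, high] Hz."""
    return [dict(seg, marked=not (low <= seg["freq"] <= high)) for seg in segments]

def record_conversation(interviewer_segs, interviewee_segs, keyword_entries):
    first_text = " ".join(transcribe(interviewer_segs))        # S2
    second_text = " ".join(transcribe(interviewee_segs))       # S2
    first_text = replace_entries(first_text, keyword_entries)  # S3
    second_text = replace_entries(second_text, keyword_entries)
    marked = mark_out_of_range(interviewee_segs)               # S4
    return first_text, second_text, marked                     # handed to S5

# Toy data: the interviewee answers at an abnormally low 30 Hz
interviewer = [{"text": "please state your name", "freq": 120.0}]
interviewee = [{"text": "no comment", "freq": 30.0}]
first, second, marked = record_conversation(
    interviewer, interviewee, {"no coment": "no comment"})
```

Running the toy data marks the interviewee's 30 Hz segment, which is how S4 would flag a possible sign of psychological resistance for the clerk's playback in S5.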
The working principle and the advantages of the invention are as follows:
(1) During a conversation, if the interviewee exhibits psychological resistance, it typically manifests as silence (e.g., refusing to answer questions, or pausing for a long time), murmuring (e.g., answering only with short phrases, simple sentences, and stock phrases), or verbosity (e.g., rambling to minimize substantive answers, avoiding certain core questions, or diverting attention). Compared with normal speech, these correspond to a voice frequency that is too low or too high. In this way, the interviewee's psychological resistance can be identified in real time and the corresponding conversation content labeled, prompting the clerk that the credibility of that content may be questionable.
(2) During the proceedings, the speech of the interviewer and the interviewee can be converted in real time, by role, into the first text and the second text, and erroneous entries in the second text can be replaced according to the keyword entries, making it convenient for the clerk to play back the speech and proofread. In this way, the quality of the case record is improved and the workload of the case handlers is reduced.
In summary, the invention can identify the interviewee's psychological resistance in real time and label the corresponding conversation content, thereby alerting case handlers that the credibility of that content may be questionable.
Further, in S1, an array microphone is used to collect the speech of the interviewer and the interviewee during the conversation.
The beneficial effects are: because the microphone adopts an array design, a single microphone can distinguish the two roles, which is safe and stable; at the same time, the array design can separate indistinct speech from noise and extend the voice recognition distance.
Further, in S1, synchronized video of the interviewer and the interviewee during the conversation is also collected; in S5, the first text and the second text are proofread against this synchronized recording.
The beneficial effects are: recording the conversation with full, synchronized video improves judicial credibility; at the same time, proofreading the first text and the second text against the synchronized recording improves the accuracy of the case record.
Further, in S2, the interviewer's speech is converted into the first text and the first text is displayed synchronously; the interviewee's speech is converted into the second text and the second text is displayed synchronously.
The beneficial effects are: displaying the texts while they are being converted facilitates on-site verification and real-time supervision of the conversation.
Further, in S2, the interviewer's speech is converted into the first text and the first text is broadcast synchronously by voice; the interviewee's speech is converted into the second text and the second text is broadcast synchronously by voice.
The beneficial effects are: broadcasting the first text and the second text synchronously by voice makes it convenient to prompt the clerk on site to check them.
Further, in S3, dialect entries in the first text and the second text are identified, and the dialect entries are replaced according to preset Mandarin entries.
The beneficial effects are: replacing dialect with the corresponding Mandarin makes the case record easier for case handlers to read and understand.
Drawings
Fig. 1 is a flowchart of an embodiment of the voice recognition recording method based on role recognition according to the present invention.
Detailed Description
The following is described in further detail through specific embodiments:
Embodiment 1
This embodiment is substantially as shown in Fig. 1 and comprises:
S1, collecting the speech of the interviewer and the interviewee in real time during the conversation;
S2, converting the interviewer's speech into a first text, and converting the interviewee's speech into a second text;
S3, identifying erroneous entries in the first text and the second text, and replacing them according to preset keyword entries;
S4, detecting the voice frequency of the interviewee's speech, and marking the positions in the second text where that frequency falls outside a preset range;
and S5, playing back the speech of the interviewer and the interviewee, and proofreading the first text and the second text.
In this embodiment, a voice recognition server is used, with the following product parameters: the operating system is CentOS 6.7; the CPU is an Intel(R) Xeon(R) D-1521 running at 2.40 GHz, with 4 cores and 8 threads; the memory is 64 GB of DDR4; the hard disk is a 250 GB SSD; the network interface is one Gigabit Ethernet port; there is 1 HDMI output interface; and there is 1 power supply rated at 80 W.
The specific implementation process is as follows:
First, the speech of the interviewer and the interviewee during the conversation is collected in real time. In this embodiment, an array microphone is used to collect the speech of both parties. Specifically, a 4-MEMS array microphone may be used, with the following parameters: a frequency response range of 20 Hz-20 kHz, a signal-to-noise ratio greater than 70 dB, a highest pointing resolution angle of approximately 15 degrees, a USB or 3.5 mm headphone output interface, and a voice recognition range limited to 5 meters. The array microphone serves as both a pickup and a loudspeaker: it can collect audio and play back what it has collected. The array microphone is wireless and can be connected to the workstation, transmitting the collected speech and the text produced by speech recognition to the workstation. It also supports intelligent voice control; for example, a user may wake it by saying "hello, XX", start a recording job by saying "start recording", and end the job by saying "end recording". In addition, the main function of the array microphone is role recognition: during a recording job it can automatically separate the two roles, the interviewer and the interviewee. Through its array design, the microphone can separate indistinct speech from noise; its pickup distance is 5 meters and its effective voice recognition distance is 2 meters.
Then, the interviewer's speech is converted into the first text, and the interviewee's speech is converted into the second text. In this embodiment, existing speech recognition technology can be used for both conversions. In addition, while the conversion is performed, the first text and the second text are displayed synchronously on a display screen, which facilitates on-site verification and real-time supervision of the conversation; they are also broadcast synchronously by voice through a loudspeaker, which makes it convenient to prompt the case handlers on site.
Next, erroneous entries in the first text and the second text are identified and replaced according to preset keyword entries. In this embodiment, the keyword entries may be predefined by the clerk and include place names and person names; when an erroneous entry is identified in the first text or the second text, it is replaced with the corresponding keyword entry. At the same time, dialect entries in the first text and the second text are identified and replaced, in a similar manner, according to preset Mandarin entries, so that dialect is replaced with the corresponding Mandarin and the case record is easier for case handlers to read and understand.
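As an illustration only, the replacement of erroneous and dialect entries can be sketched as a longest-match-first dictionary substitution. The sample entries below are invented for the example; they are not taken from the patent.

```python
# Hypothetical replacement tables; a clerk would predefine these (S3).
keyword_entries = {            # erroneous entry -> preset keyword entry
    "Chong Qing": "Chongqing",     # place name
    "Wang Fen": "Wang Feng",       # person name
}
mandarin_entries = {           # dialect entry -> Mandarin entry
    "gonna": "going to",
}

def replace_entries(text, *tables):
    """Apply each table, longest entry first, so that a short entry
    cannot clobber part of a longer one."""
    for table in tables:
        for wrong in sorted(table, key=len, reverse=True):
            text = text.replace(wrong, table[wrong])
    return text

fixed = replace_entries("Wang Fen is gonna Chong Qing",
                        keyword_entries, mandarin_entries)
```

Here `fixed` becomes "Wang Feng is going to Chongqing"; a real system would match entries against recognized tokens rather than raw substrings.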
Then, the voice frequency of the interviewee's speech is detected, and the positions in the second text where that frequency falls outside the preset range are marked. For example, suppose the voice frequency of normal adult speech ranges from 50 to 500 Hz. If, during the conversation, the interviewee falls silent (refusing to answer questions, or pausing for a long time) or murmurs (answering only with short phrases, simple sentences, and stock phrases), the voice frequency falls below 50 Hz, i.e., too low compared with normal speech; if the interviewee becomes verbose (rambling to minimize substantive answers, avoiding certain core questions, or diverting attention), the voice frequency may exceed 500 Hz, i.e., too high compared with normal speech. In this way, the interviewee's psychological resistance can be identified in real time, and the words spoken during the period of resistance can be labeled, for example shown in bold or in red, to prompt the clerk that the credibility of that part of the conversation may be questionable.
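As a hedged sketch, not the patent's implementation, the frequency check of S4 can be illustrated with a naive autocorrelation pitch estimate plus the 50-500 Hz rule described above; a production system would use a more robust pitch tracker.

```python
import math

def estimate_pitch(samples, sample_rate, fmin=50.0, fmax=500.0):
    """Pick the autocorrelation peak between the lags that correspond
    to fmax (shortest period) and fmin (longest period)."""
    lo = int(sample_rate / fmax)
    hi = int(sample_rate / fmin)
    best_lag, best_score = lo, float("-inf")
    n = len(samples)
    for lag in range(lo, min(hi, n - 1) + 1):
        score = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def is_abnormal(freq, low=50.0, high=500.0):
    """S4 marking rule: flag a segment whose frequency leaves the range."""
    return not (low <= freq <= high)

# Synthetic check: a pure 200 Hz tone sampled at 8 kHz
sr = 8000
tone = [math.sin(2 * math.pi * 200 * t / sr) for t in range(800)]
f0 = estimate_pitch(tone, sr)   # close to 200.0
```

Segments for which `is_abnormal(f0)` is true would be the ones marked in the second text, e.g., rendered in bold or red.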
Finally, the speech of the interviewer and the interviewee is played back, and the first text and the second text are proofread. For example, the clerk reads the first text and the second text while the recorded speech is played, thereby completing the proofreading. In this way, during the proceedings, the speech of the interviewer and the interviewee can be converted in real time, by role, into the first text and the second text, and the case handlers can proofread them, improving the quality of the case record and reducing the workload of the case handlers.
Embodiment 2
The only difference from Embodiment 1 is that, while the speech of the interviewer and the interviewee is collected, synchronized video of both parties during the conversation is also collected; and when the first text and the second text are proofread, both the speech and the synchronized recording are used. For example, the clerk reads the first text and the second text while the recorded speech is played, completing the first round of proofreading; the clerk then reads the two texts again while the synchronized video is played, completing the second round. The two rounds of proofreading improve the accuracy of the case record.
Embodiment 3
The only difference from Embodiment 2 is that, in this embodiment, before the voice frequency of the interviewee's speech is detected, the speech is divided at cut points into a plurality of speech segments. First, it is determined whether a cut point lies in a blank region of the speech, that is, whether there is sound at the position of the cut point: if there is sound, the cut point does not lie in a blank region; if there is no sound, it does. If the cut point lies in a blank region, cutting there cannot lose any of the speaker's voice characteristics, so the speech is cut directly; otherwise, it is not cut directly. Then, if the cut point does not lie in a blank region, it is determined whether the number of speakers (the interviewer and the interviewee) has changed, that is, whether the number of voiceprint features in the speech has changed: if the number of voiceprint features has increased, the number of speakers has increased; if it has decreased, the number of speakers has decreased; in either case the cut point is moved to the position where the number of speakers changes. Conversely, if the number of voiceprint features has not changed, the number of speakers has not changed, and the cut point is not moved. In this way, the segmentation process can be suitably simplified without losing the speaker's voice characteristics.
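The cut-point rules of this embodiment can be sketched on toy data as follows. The frame representation and the silence threshold are assumptions made for illustration: each frame carries an energy value and an active-speaker count that a real system would derive from the signal and from voiceprint features.

```python
# Toy sketch of Embodiment 3's cut-point adjustment; thresholds assumed.
SILENCE = 0.01  # energy below this counts as a blank region

def adjust_cut_point(frames, cut):
    """frames: list of (energy, speaker_count); returns the index to cut at."""
    energy, base_count = frames[cut]
    if energy < SILENCE:
        return cut                    # blank region: cut directly
    # Not blank: move the cut to the nearest change in speaker count.
    for offset in range(1, len(frames)):
        for idx in (cut - offset, cut + offset):
            if 0 <= idx < len(frames) and frames[idx][1] != base_count:
                return idx
    return cut                        # speaker count never changes: keep cut

# One speaker for 10 frames, two speakers for 10, then silence
frames = [(0.5, 1)] * 10 + [(0.6, 2)] * 10 + [(0.0, 0)] * 5
```

On this toy data, a cut requested at frame 22 lands in silence and stays put, while a cut requested at frame 12 is moved next to frame 10, where the speaker count changes, so no speaker's voice characteristics are split mid-utterance.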
The foregoing is merely an embodiment of the present invention. Common general knowledge, such as well-known specific structures and characteristics, is not described here in detail; a person skilled in the art, knowing the state of the art as of the filing date or the priority date, can combine that knowledge with the teachings above, using routine experimentation, to implement the invention, and the known structures and methods pose no impediment to doing so. It should be noted that a person skilled in the art may make several changes and modifications without departing from the structure of the present invention; these shall also fall within the protection scope of the invention and do not affect the effect of its implementation or the practicability of the patent. The scope of protection of this application shall be determined by the contents of the claims, and the detailed description in the specification may be used to interpret the claims.

Claims (6)

1. A voice recognition recording method based on role recognition, characterized by comprising the following steps:
S1, collecting the speech of the interviewer and the interviewee in real time during the conversation;
S2, converting the interviewer's speech into a first text, and converting the interviewee's speech into a second text;
S3, identifying erroneous entries in the first text and the second text, and replacing them according to preset keyword entries;
S4, detecting the voice frequency of the interviewee's speech, and marking the positions in the second text where that frequency falls outside a preset range;
and S5, playing back the speech of the interviewer and the interviewee, and proofreading the first text and the second text.
2. The role recognition-based voice recognition recording method of claim 1, wherein in S1, an array microphone is used to collect the speech of the interviewer and the interviewee during the conversation.
3. The role recognition-based voice recognition recording method of claim 2, wherein in S1, synchronized video of the interviewer and the interviewee during the conversation is also collected; and in S5, the first text and the second text are proofread against the synchronized recording.
4. The role recognition-based voice recognition recording method of claim 3, wherein in S2, the interviewer's speech is converted into the first text and the first text is displayed synchronously; and the interviewee's speech is converted into the second text and the second text is displayed synchronously.
5. The role recognition-based voice recognition recording method of claim 4, wherein in S2, the interviewer's speech is converted into the first text and the first text is broadcast synchronously by voice; and the interviewee's speech is converted into the second text and the second text is broadcast synchronously by voice.
6. The role recognition-based voice recognition recording method of claim 5, wherein in S3, dialect entries in the first text and the second text are further identified and replaced according to preset Mandarin entries.
CN202110346865.3A 2021-03-31 2021-03-31 Voice recognition recording method based on role recognition Withdrawn CN113076747A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110346865.3A CN113076747A (en) 2021-03-31 2021-03-31 Voice recognition recording method based on role recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110346865.3A CN113076747A (en) 2021-03-31 2021-03-31 Voice recognition recording method based on role recognition

Publications (1)

Publication Number Publication Date
CN113076747A true CN113076747A (en) 2021-07-06

Family

ID=76614132

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110346865.3A Withdrawn CN113076747A (en) 2021-03-31 2021-03-31 Voice recognition recording method based on role recognition

Country Status (1)

Country Link
CN (1) CN113076747A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113096669A (en) * 2021-03-31 2021-07-09 重庆风云际会智慧科技有限公司 Voice recognition system based on role recognition
CN113096669B (en) * 2021-03-31 2022-05-27 重庆风云际会智慧科技有限公司 Speech recognition system based on role recognition
CN113542810A (en) * 2021-07-14 2021-10-22 上海眼控科技股份有限公司 Video processing method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110298252A (en) Meeting summary generation method, device, computer equipment and storage medium
CN207149252U (en) Speech processing system
Morgan et al. The meeting project at ICSI
US20100179811A1 (en) Identifying keyword occurrences in audio data
CN102903361A (en) Instant call translation system and instant call translation method
CN111128223A (en) Text information-based auxiliary speaker separation method and related device
US20210232776A1 (en) Method for recording and outputting conversion between multiple parties using speech recognition technology, and device therefor
CN113076747A (en) Voice recognition recording method based on role recognition
JP2006301223A (en) System and program for speech recognition
JP2020071675A (en) Speech summary generation apparatus, speech summary generation method, and program
CN114449105A (en) Voice-based electric power customer service telephone traffic quality inspection system
CN111402892A (en) Conference recording template generation method based on voice recognition
JP5099211B2 (en) Voice data question utterance extraction program, method and apparatus, and customer inquiry tendency estimation processing program, method and apparatus using voice data question utterance
EP2763136B1 (en) Method and system for obtaining relevant information from a voice communication
US20190121860A1 (en) Conference And Call Center Speech To Text Machine Translation Engine
JP3859612B2 (en) Conference recording and transcription system
CN110460798B (en) Video interview service processing method, device, terminal and storage medium
JP2020071676A (en) Speech summary generation apparatus, speech summary generation method, and program
CN111739536A (en) Audio processing method and device
CN110767233A (en) Voice conversion system and method
Stupakov et al. COSINE-a corpus of multi-party conversational speech in noisy environments
KR20190143116A (en) Talk auto-recording apparatus method
CN101419796A (en) Device and method for automatically splitting speech signal of single character
KR102407055B1 (en) Apparatus and method for measuring dialogue quality index through natural language processing after speech recognition
Wang et al. Fusion of MFCC and IMFCC for Whispered Speech Recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210706
