CN112820293A - Voice recognition method and related device
- Publication number: CN112820293A
- Application number: CN202011622349.0A
- Authority: CN (China)
- Prior art keywords: audio, early warning, processed, physiological parameters, transcription
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications
- G10L 15/26: Speech recognition; speech-to-text systems
- G10L 25/51: Speech or voice analysis techniques specially adapted for a particular use, for comparison or discrimination
- G10L 25/63: Speech or voice analysis techniques specially adapted for a particular use, for comparison or discrimination, for estimating an emotional state
Abstract
The present application discloses a speech recognition method and a related device. The method includes: acquiring to-be-processed audio generated by at least one target person, together with the target person's physiological parameters; performing speech transcription on the to-be-processed audio to obtain a speech transcription result; and displaying the transcription result while prompting the target person's physiological parameters whenever those parameters meet an early-warning condition. With the technical solution provided by the present application, the user can directly and intuitively locate the position in the transcript where the target person's physiological parameters triggered a warning, and the physiological parameters that meet the warning condition are displayed intuitively.
Description
Technical Field
The present application relates to the technical field of speech recognition, and in particular to a speech recognition method and a related apparatus.
Background
When trying to achieve a breakthrough with an interview subject, extremely high demands are placed on the interviewer's experience: the interviewer must be able to accurately read the subject's words and expressions, adjust the line of questioning in time according to the subject's statements, go straight to the sensitive issues, and separate truth from falsehood, so as to quickly break through the subject's psychological defenses. For staff with limited experience, an auxiliary interview-analysis system that can judge the truthfulness and sensitivity of the subject's statements would undoubtedly be a powerful tool for their work.

Existing solutions require fixed questions to be prepared for the interview process and cannot give early warnings for open-ended questions. Although they can issue warning prompts for abnormal emotional states, they cannot intuitively locate the specific content of the conversation: the interviewer must interrupt the interview to replay and analyze it, or rely on experience and memory to judge the key points. This easily interrupts the interview, and it is impossible to review, without interruption, the answers that have already triggered warnings during the interview, which affects the continuity of the interview and the interviewer's grasp of the key moment for a breakthrough. A technical solution that can solve the above problems is therefore needed.
Summary of the Invention
The main technical problem addressed by the present application is to provide a speech recognition method and a related device that allow the user to directly and intuitively locate, according to a prompt, the position where the target person's physiological parameters triggered a warning, and that display the physiological parameters meeting the warning condition in an intuitive way.
To solve the above technical problem, one technical solution adopted by the present application is to provide a speech recognition method, the method comprising:

acquiring to-be-processed audio generated by at least one target person and the physiological parameters of the target person;

performing speech transcription on the to-be-processed audio to obtain a speech transcription result; and

displaying the speech transcription result, and prompting the physiological parameters of the target person when the physiological parameters of the target person meet an early-warning condition.
To solve the above technical problem, another technical solution adopted by the present application is to provide a speech recognition device, the speech recognition device comprising a processor, and a memory and a communication interface coupled to the processor, wherein:

the communication interface is configured to communicate with other electronic devices;

the memory is configured to store a computer program; and

the processor is configured to run the computer program to perform the speech recognition method described above.
To solve the above technical problem, a further technical solution adopted by the present application is to provide a computer-readable storage medium storing a computer program executable by a processor, the computer program being used to implement the speech recognition method described above.
The beneficial effects of the present application are as follows. In contrast to the prior art, the technical solution provided by the present application acquires the to-be-processed audio generated by at least one target person and the target person's physiological parameters in that audio, transcribes the to-be-processed audio according to preset transcription rules to obtain a speech transcription result, and displays the transcription result. When the transcription result is displayed, the target person's physiological parameters are prompted whenever they meet the early-warning condition. In other words, by analyzing the target person's physiological parameters in the to-be-processed audio, the technical solution provided by the present application can issue early warnings on the physiological parameters of a target person answering open-ended questions; at the same time, by prompting the target person's physiological parameters when they meet the warning condition, the parameters meeting the condition are displayed intuitively, so that the user can directly and intuitively locate the position where the warning was triggered. Speech recognition is thereby improved and better adapted to the needs of speech recognition in open question-and-answer scenarios.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of an embodiment of a speech recognition method of the present application;

FIG. 2 is a schematic flowchart of another embodiment of a speech recognition method of the present application;

FIG. 3 is a schematic flowchart of a further embodiment of a speech recognition method of the present application;

FIG. 4 is a schematic flowchart of still another embodiment of a speech recognition method of the present application;

FIG. 5 is a schematic flowchart of yet another embodiment of a speech recognition method of the present application;

FIG. 6 is a schematic flowchart of yet another embodiment of a speech recognition method of the present application;

FIG. 7 is a schematic structural diagram of an embodiment of a speech recognition device of the present application;

FIG. 8 is a schematic structural diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. It should be understood that the specific embodiments described here are only used to explain the present application, not to limit it. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present application.

In the description of the present application, "a plurality of" means at least two, for example two or three, unless otherwise expressly and specifically defined. Furthermore, the terms "comprising" and "having", and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device comprising a series of steps or units is not limited to the listed steps or units, but may optionally also include steps or units that are not listed, or other steps or units inherent to such processes, methods, products, or devices.

Reference to an "embodiment" herein means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to a separate or alternative embodiment that is mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
Please refer to FIG. 1, which is a schematic flowchart of an embodiment of a speech recognition method of the present application. In the current embodiment, the method provided by the present application includes:
S110: Acquire the to-be-processed audio generated by at least one target person and the physiological parameters of the target person.
When the speech recognition device performs speech recognition, it first acquires the to-be-processed audio generated by at least one target person. The to-be-processed audio includes audio data that is generated by at least one target person and on which speech recognition needs to be performed. The audio data includes voice data and video data, that is, the to-be-processed audio data includes the speech produced by the target person and the video recorded of the target person while that speech was being produced; the target person is the person on whom speech recognition needs to be performed.

It should be noted that the to-be-processed audio may include audio data of two or more persons. The target person is a person included in the to-be-processed audio for whom speech recognition and physiological parameter analysis are required, and the number of persons included in the to-be-processed audio is greater than or equal to the number of target persons it includes. In one embodiment, the to-be-processed audio includes one target person. In another embodiment, speech recognition and physiological parameter analysis may be performed on two persons in the to-be-processed audio at the same time, that is, the to-be-processed audio may include two or more target persons; this is not specifically limited here and the actual setting prevails.

In the current embodiment, the to-be-processed audio may be collected from the target person in real time by an external module connected to the speech recognition device and input to the speech recognition device for recognition. The external module used to acquire the to-be-processed audio includes at least a microphone and a camera. Further, the camera may be a USB camera. It can be understood that, in other embodiments, the external module used to acquire the to-be-processed audio may also include other types of modules, which are not listed here one by one.

In another embodiment, the to-be-processed audio may also be pre-acquired audio data that was not collected in real time. Specifically, the to-be-processed audio may be existing audio data input directly by another device, or audio data acquired directly by the speech recognition device over a network.

The to-be-processed audio is collected using a microphone array. Further, when the to-be-processed audio includes voice data and video data, the voice data is collected by the microphone array and the video data is collected synchronously by the camera.
After the to-be-processed audio generated by at least one target person has been acquired, the physiological parameters of the target person are further obtained from the to-be-processed audio. The physiological parameters of the target person include at least one of the target person's stress value, heart rate, blood oxygen, respiratory rate, blood pressure, and heart rate variability. It can be understood that, in other embodiments, the physiological parameters of the target person may also include other types of parameters, which are not listed here one by one.

Further, in one embodiment, the to-be-processed audio is analyzed by a physiological parameter acquisition module to obtain the physiological parameters of the target person.

In another embodiment, other collection devices, such as a wristband, may also be used to assist in acquiring the physiological parameters of the target person.

Further, in one embodiment, when the to-be-processed audio is collected in real time, it is analyzed in real time to obtain the physiological parameters of the target person. More specifically, the to-be-processed audio is analyzed to obtain the physiological parameters of the target person at each collection moment.

Further, in another embodiment, when the to-be-processed audio is collected in real time, the audio may be analyzed according to a preset time period to obtain the physiological parameters of the target person within that period. For example, when the to-be-processed audio is collected in real time and the preset period is 10 s, the most recently collected 10 s of audio are analyzed each period to obtain the physiological parameters of the target person for that latest 10 s. For instance, from the start of real-time collection, once the first 10 s of to-be-processed audio have been collected, that first 10 s segment is analyzed to obtain the target person's physiological parameters for it, and so on. In the current embodiment, analyzing the to-be-processed audio according to a preset time period makes it possible to obtain the target person's physiological parameters more accurately.

Further, when the to-be-processed audio is not acquired in real time, it may instead be analyzed according to a preset analysis rule to obtain the physiological parameters of the target person at each collection moment. For example, it may be preset that every two adjacent collection moments are 1 s apart; correspondingly, 1 s, 2 s, 3 s, 4 s, 5 s, ..., n s are set as the collection moments, and the physiological parameters of the target person are sampled at each of these moments.
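As an illustration of the periodic analysis described above, the sketch below divides the to-be-processed audio into fixed-length windows and collects one set of physiological parameter values per window (one collection moment per window). The `estimate_physio_params` callback is a hypothetical stand-in for the physiological parameter acquisition module; it and the field names are illustrative assumptions, not interfaces defined by this application.

```python
from typing import Callable, Dict, List

def analyze_by_period(samples: List[float], sample_rate: int, period_s: float,
                      estimate_physio_params: Callable[[List[float]], Dict[str, float]]
                      ) -> List[Dict[str, float]]:
    """Analyze the to-be-processed audio window by window (e.g. every 10 s) and
    return one set of physiological parameters per window (collection moment)."""
    window = int(period_s * sample_rate)
    results: List[Dict[str, float]] = []
    for start in range(0, len(samples) - window + 1, window):
        segment = samples[start:start + window]
        params = dict(estimate_physio_params(segment))     # e.g. {"stress": 42.0, "heart_rate": 78.0}
        params["time_s"] = (start + window) / sample_rate  # timestamp of this collection moment
        results.append(params)
    return results
```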
S120: Perform speech transcription on the to-be-processed audio to obtain a speech transcription result.

After the to-be-processed audio and the physiological parameters of the target person have been acquired, the to-be-processed audio is further transcribed to obtain a corresponding speech transcription result. Speech transcription here means recognizing the text contained in the to-be-processed audio and outputting it; the speech transcription result is text data.

Further, in the current embodiment, step S120 recognizes all of the text contained in the to-be-processed audio and outputs it as the speech transcription result.

Further, if the to-be-processed audio is collected in real time, the collected audio may correspondingly also be transcribed in real time.

In another embodiment, a preset buffering period may also be reserved to collect a certain duration of to-be-processed audio, and speech transcription is then performed on the audio collected within that buffering period to obtain the speech transcription result.

Further, step S120 of transcribing the to-be-processed audio to obtain a speech transcription result further includes: segmenting the to-be-processed audio, and performing speech transcription on the resulting segments of to-be-processed audio to obtain a number of transcribed text segments as the speech transcription result.
Further, segmenting the to-be-processed audio in the above step includes at least one of the following items ① to ③:

①. Segment the audio belonging to different persons in the to-be-processed audio. Specifically, the audio belonging to different persons may be segmented according to timbre. Alternatively, when the to-be-processed audio includes only two persons, located in different directions relative to the microphone array, the audio belonging to the different persons may be segmented according to the direction from which each part of the audio arrives.

②. If the first audio, belonging to a single person in the to-be-processed audio, contains a preset pause, segment the first audio at the preset pause. The preset pause is a pause in speech that exceeds a preset time; the preset time is set in advance and is not particularly limited here. The first audio is segmented audio belonging to one and the same person.

③. If the word count of the second audio, belonging to a single person in the to-be-processed audio, is greater than a preset word count, segment the second audio. The second audio is segmented audio that belongs to one and the same person and whose word count exceeds the set word count; the preset word count may be set according to actual needs and is not limited here.

Further, when an embodiment includes all of ① to ③ above, they may be executed in sequence: the audio belonging to different persons in the to-be-processed audio is first segmented; it is then determined whether the first audio belonging to the same person contains a preset pause, and if so, the first audio is segmented again at the preset pause; finally, it is determined whether the word count of the second audio belonging to the same person is greater than the preset word count, and if so, the second audio is segmented further. A sketch of this combined procedure is given below.
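The following minimal sketch shows one way the three segmentation rules above could be applied in sequence. It assumes each small audio piece has already been annotated with a speaker label, the length of the pause preceding it, and its (rough) text; the field names, default thresholds, and whitespace-based word counting are illustrative assumptions rather than definitions from this application.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Piece:
    speaker: str            # person the piece belongs to (input to rule 1)
    pause_before_s: float   # silence preceding this piece, in seconds (input to rule 2)
    text: str               # rough text of the piece (input to rule 3)

def segment(pieces: List[Piece], preset_pause_s: float = 1.5,
            preset_words: int = 50) -> List[List[Piece]]:
    """Apply rules 1-3 in order: split on speaker change, then on long pauses,
    then cap the word count of any single segment."""
    segments: List[List[Piece]] = []
    words_in_segment = 0
    for piece in pieces:
        new_segment = (
            not segments                                    # first piece overall
            or piece.speaker != segments[-1][-1].speaker    # rule 1: different person
            or piece.pause_before_s >= preset_pause_s       # rule 2: preset pause exceeded
            or words_in_segment + len(piece.text.split()) > preset_words  # rule 3: too long
        )
        if new_segment:
            segments.append([])
            words_in_segment = 0
        segments[-1].append(piece)
        words_in_segment += len(piece.text.split())
    return segments
```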
Further, it should be noted that the order in which the physiological parameters of the target person are obtained from the to-be-processed audio and the to-be-processed audio is transcribed is not limited here. In other embodiments, the two steps may be executed at the same time or one after the other; the actual setting prevails.
S130: Display the speech transcription result, and prompt the physiological parameters of the target person when the physiological parameters of the target person meet the early-warning condition.

After the to-be-processed audio has been transcribed and the speech transcription result obtained, the transcription result is displayed. If it is determined that the physiological parameters of the target person meet the early-warning condition, the parameters that meet the condition are superimposed on the corresponding transcription result when it is displayed, thereby prompting the physiological parameters of the target person.

The speech transcription result includes a number of transcribed text segments, which are obtained by transcribing the corresponding parts of the to-be-processed audio. Correspondingly, displaying the speech transcription result in step S130 above further includes: displaying the transcribed text segments.

Further, when the to-be-processed audio is transcribed to obtain the speech transcription result, the text segments corresponding to different persons are output and displayed in dialogue form, using shading of different colors. For example, when the to-be-processed audio includes audio data of two persons A and B, the text segments corresponding to A and B are output with different colored shading during transcription, e.g., the text segments corresponding to A with blue shading and those corresponding to B with yellow shading.
In the current embodiment, the early-warning condition is that the values of the physiological parameter collected at a first number of consecutive collection moments are all greater than or equal to a first threshold. The first number is a preset number of collection moments used to judge whether the physiological parameter meets the warning condition, and the first threshold is a preset comparison value used for that judgment; its specific value depends on the specific type of physiological parameter.

Specifically, because the physiological parameters may include parameters of different types, the corresponding first threshold need not be a single value. When the physiological parameters include several different types of parameters, the first threshold may include comparison values corresponding to each of those types. For example, when the physiological parameters include a stress value and a heart rate, the first threshold includes a first comparison value set for the stress value and a second comparison value set for the heart rate, where the first comparison value is used to judge whether the target person's stress value is too high, i.e., whether it meets the warning condition, and the second comparison value is used to judge whether the target person's heart rate is too fast, i.e., whether it meets the warning condition.

In another embodiment, the early-warning condition may also be: among the values of the physiological parameter collected at a first number of consecutive collection moments, there are a second number of values greater than or equal to a second threshold. The second number is a preset comparison value for the number of collection moments that meet the warning condition; the second number is less than or equal to the first number, and the second threshold is a comparison value used to judge whether the physiological parameter meets the warning condition.

Further, in one embodiment, the second threshold may be equal to the first threshold above; this is not specifically limited here. A sketch of both variants is given below.
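The sketch below shows one possible way to evaluate both variants of the early-warning condition over the most recent collection moments; the function name and parameters are illustrative, and the thresholds would in practice be set per parameter type as described above.

```python
from typing import List, Optional

def meets_warning_condition(values: List[float], first_number: int, first_threshold: float,
                            second_number: Optional[int] = None,
                            second_threshold: Optional[float] = None) -> bool:
    """Variant 1: all of the last `first_number` values >= first_threshold.
    Variant 2 (when second_number is given): at least `second_number` of the last
    `first_number` values >= second_threshold."""
    if len(values) < first_number:
        return False
    recent = values[-first_number:]
    if second_number is None:
        return all(v >= first_threshold for v in recent)                 # variant 1
    return sum(v >= second_threshold for v in recent) >= second_number   # variant 2
```

For example, `meets_warning_condition(stress_values, first_number=5, first_threshold=40.0)` would implement the first variant for a stress value with an assumed threshold of 40.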
In the embodiment corresponding to FIG. 1 of the present application, the to-be-processed audio generated by at least one target person and the target person's physiological parameters in that audio are acquired; the to-be-processed audio is then transcribed according to the preset transcription rules to obtain a speech transcription result, which is displayed. When the transcription result is displayed, the physiological parameters of the target person that meet the warning condition are superimposed on the corresponding transcription result to prompt the target person's physiological parameters. In other words, by analyzing the target person's physiological parameters in the to-be-processed audio, the technical solution provided by the present application can issue early warnings on the physiological parameters of a target person answering open-ended questions; and by prompting those parameters when they meet the warning condition, the user can directly and intuitively locate the position where the warning was triggered, which improves speech recognition and better meets the needs of speech recognition in open question-and-answer scenarios.
Please refer to FIG. 2, which is a schematic flowchart of another embodiment of a speech recognition method of the present application. In the current embodiment, the method provided by the present application includes:

S201: Acquire the to-be-processed audio generated by at least one target person and the physiological parameters of the target person.

In the current embodiment, the to-be-processed audio is collected using a microphone array. The microphone array includes a certain number of acoustic sensors, which are respectively used to collect the to-be-processed audio produced by persons in different directions.

Specifically, when the to-be-processed audio includes voice data and video data, the microphone array is used to collect the voice data, and while the voice data produced by the target person is being acquired, the target person's position relative to the microphone array remains fixed. Therefore, in the current embodiment, the direction of each part of the to-be-processed audio can be determined directly on the basis of the microphone array.

Accordingly, in the current embodiment, before the to-be-processed audio is segmented, the method provided by the present application further includes steps S202 to S203.
S202: Determine, based on the microphone array, the direction from which each part of the to-be-processed audio arrives.

After the to-be-processed audio generated by at least one target person and the target person's physiological parameters have been acquired, and given that the voice data in the to-be-processed audio is collected by the microphone array, the direction of each part of the to-be-processed audio is further determined on the basis of the microphone array. The direction of each part of the audio is its direction relative to the microphone array.

In the current embodiment, each audio part is at least one piece of audio corresponding to a different person in the to-be-processed audio. Furthermore, since the speech of several different persons takes the form of a dialogue, the audio corresponding to each person may include at least one such part. It should be noted that when a given to-be-processed audio is produced by a conversation between two persons A and B, the to-be-processed audio correspondingly includes several audio parts produced by A and several audio parts produced by B.

Each audio part further includes a corresponding part of the voice data, and step S202 above further includes: determining, based on the microphone array, the direction of each part of the voice data in the to-be-processed audio.

In another embodiment, the to-be-processed audio may also be divided in a fixed manner to obtain the audio parts. For example, the audio of a set number of consecutive frames may be regarded as one part, or the parts may be obtained by dividing according to timbre or pause duration; this depends on the actual setting and is not limited here.

Further, when the to-be-processed audio includes voice data, each audio part includes a corresponding voice part, obtained by dividing the voice data in the to-be-processed audio at a set time interval. For example, it may be preset that each 1 s of speech is treated as one voice part. Alternatively, speech with different timbres may be divided into separate voice parts according to the setting. Or the voice data may be divided into several voice parts according to the pauses it contains, as configured.
S203: Determine, based on the direction of each audio part, the person to whom each audio part belongs.

After the direction of each part of the to-be-processed audio has been determined on the basis of the microphone array, the person to whom each audio part belongs is further determined on the basis of that direction, and once this determination has been made, the following step S204 is executed.

In another embodiment, when the to-be-processed audio includes voice data, each audio part includes a corresponding voice part; that is, when what is determined in step S202 is the direction of each voice part, step S203 determines, based on the direction of each voice part, the person to whom that voice part belongs. A sketch of this direction-based speaker attribution is given below.
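A rough sketch of this direction-based attribution follows. It assumes the direction of arrival of each audio part has already been estimated (in degrees relative to the microphone array) and that each person has been assigned an angular sector in advance; both assumptions, and the sector map, are illustrative rather than prescribed by this application.

```python
from typing import Dict, List, Tuple

def attribute_speakers(part_directions_deg: List[float],
                       sectors: Dict[str, Tuple[float, float]]) -> List[str]:
    """Assign each audio part to a person based on its direction of arrival.
    `sectors` maps a person to the angular range that person occupies,
    e.g. {"interviewer": (0.0, 180.0), "subject": (180.0, 360.0)}."""
    owners = []
    for azimuth in part_directions_deg:
        owner = "unknown"
        for person, (lo, hi) in sectors.items():
            if lo <= azimuth < hi:
                owner = person
                break
        owners.append(owner)
    return owners
```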
S204: Segment the to-be-processed audio, and perform speech transcription on the resulting segments of to-be-processed audio to obtain a number of transcribed text segments as the speech transcription result. In the current embodiment, the audio parts whose owners have been determined are segmented, and the segmented to-be-processed audio is then transcribed to obtain the transcribed text segments.

Further, in some embodiments, when the to-be-processed audio includes voice data and video data, step S204 may divide only the voice data into a number of voice segments and perform speech transcription only on those segments, thereby obtaining the transcribed text segments. It should be noted that, in these embodiments, the video data is not divided; the video data is used to obtain the physiological parameters of the target person.

S205: Display the speech transcription result, and prompt the physiological parameters of the target person when the physiological parameters of the target person meet the early-warning condition.

Step S204 is the same as step S120 in the embodiment above, and step S205 is the same as step S130 above; for details, refer to the corresponding description of FIG. 1 above, which is not repeated here.

In the embodiment illustrated in FIG. 2, collecting the to-be-processed audio with a microphone array makes it possible to better distinguish different persons, so that early warnings on the physiological parameters of different persons are more targeted. For example, in a question-and-answer system used in the interrogation field, collecting the to-be-processed audio with a microphone array makes it easier to ensure that physiological parameter warnings are given only for the segments in which the respondent is answering.

At the same time, using the microphone array allows the to-be-processed audio to be segmented better and more accurately, so that the physiological parameters that meet the warning condition can be prompted on a per-sentence basis.
Please refer to FIG. 3, which is a schematic flowchart of a further embodiment of speech recognition according to the present application.

In the current embodiment, the speech transcription result includes a number of transcribed text segments. Correspondingly, after the to-be-processed audio is transcribed in step S120 to obtain the speech transcription result, the method provided by the present application further includes: displaying the transcribed text segments.

Correspondingly, in the current embodiment, prompting the physiological parameters of the target person when they meet the early-warning condition in step S130 above further includes:

S301: When the physiological parameters meet the early-warning condition, determine the warning time based on the collection time of the physiological parameters, and determine the warning parameter value based on the values of the physiological parameters.
Further, the warning time may be a single point in time. The warning time may be determined based on the collection moments corresponding to several physiological parameter values that meet the warning condition.

In one embodiment, the midpoint of the collection moments corresponding to the physiological parameter values that meet the warning condition may be determined as the warning time. For example, if the collection moments of the physiological parameter values that meet the warning condition are 6 s, 7 s, 8 s, 9 s, and 10 s, then the midpoint of those moments, 8 s, is determined as the warning time.

In another embodiment, this can also be understood as outputting, as the warning time, the collection moment of the value that occupies the central position among the consecutive physiological parameter values meeting the warning condition. For example, if the values that meet the warning condition are, in order, 32, 33, 49, 31, and 46, the collection moment corresponding to 49 is output as the warning time.

Further, the warning time may also be a time period. In one embodiment, the time range covered by the collection moments of the physiological parameter values that meet the warning condition may be determined as the warning time. For example, if the collection moments of the values that meet the warning condition are 11 s, 13 s, 15 s, 17 s, 19 s, and 21 s, the time range from 11 s to 21 s may, according to the setting, be determined as the warning time.

In another embodiment, part of the time range covered by the collection moments of the values that meet the warning condition may be determined as the warning time. Continuing the example above, the time range from 15 s to 17 s may also be determined as the warning time according to a preset.
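Both the point-in-time variant and the time-period variant of the warning time can be sketched as follows; the helper name and the choice to return the full covered range for the period variant are illustrative assumptions.

```python
from typing import List, Tuple, Union

def warning_time(moments_s: List[float], as_period: bool = False) -> Union[float, Tuple[float, float]]:
    """Derive the warning time from the collection moments (seconds) whose values met
    the warning condition: either the midpoint of those moments or the covered range."""
    moments = sorted(moments_s)
    if as_period:
        return (moments[0], moments[-1])        # e.g. 11 s .. 21 s
    return (moments[0] + moments[-1]) / 2.0     # e.g. 6, 7, 8, 9, 10 s -> 8.0 s
```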
The warning parameter value is determined based on the physiological parameter values at the collection moments that meet the warning condition.

In one embodiment, the warning parameter value is the mean of the physiological parameter values at the collection moments that meet the warning condition. When the warning condition is that the values of a physiological parameter collected at a first number of consecutive collection moments are all greater than or equal to the first threshold, and the target person's physiological parameter is judged to meet the condition, the mean of the values collected at those consecutive collection moments is computed and output as the warning parameter value. Correspondingly, if the physiological parameters include several types of values, the mean of each type over the first number of consecutive collection moments is computed separately, and each mean is output as the warning parameter value for the corresponding type of physiological parameter.

When the warning condition is that, among the values of the physiological parameter collected at a first number of consecutive collection moments, a second number of values are greater than or equal to the second threshold, and the target person's physiological parameter is judged to meet the condition, the mean of the physiological parameter values collected at the first number of collection moments is computed and output as the warning parameter value.

In another embodiment, the warning parameter value may also be the median of the physiological parameter values at the collection moments that meet the warning condition. Correspondingly, in the current embodiment, the physiological parameter values at the collection moments meeting the warning condition are sorted, and the middle value of the sorted sequence is output as the warning parameter value.

Further, when the physiological parameters include several different types of parameters, different calculation methods may be set for the warning parameter values of the different types. For example, when the physiological parameters include a stress value and a heart rate, the median of the stress values at the collection moments meeting the warning condition may be taken as the stress warning parameter value, while the mean of the heart rates meeting the warning condition is taken as the heart rate warning parameter value.
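A sketch of computing one warning parameter value per parameter type is shown below, using the median for the stress value and the mean for the heart rate as in the example above; the aggregation table, key names, and data layout are illustrative assumptions and would be configured per deployment.

```python
from statistics import mean, median
from typing import Dict, List

# Aggregation rule per parameter type (illustrative, configurable).
AGGREGATORS = {"stress": median, "heart_rate": mean}

def warning_parameter_values(samples: List[Dict[str, float]]) -> Dict[str, float]:
    """Compute one warning parameter value per type from the physiological parameter
    samples collected at the moments that met the warning condition."""
    values: Dict[str, float] = {}
    for name, aggregate in AGGREGATORS.items():
        series = [sample[name] for sample in samples if name in sample]
        if series:
            values[name] = aggregate(series)
    return values
```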
S302: Determine the transcribed text segment that matches the warning time as the target transcribed text segment.

After the warning time has been determined from the collection time of the physiological parameters and the warning parameter value from their values, the transcribed text segment that matches the warning time is further determined as the target transcribed text segment. The target transcribed text segment is the finally determined segment that is output and displayed with the warning parameter value superimposed on it.

In one embodiment, it is first determined whether there is a transcribed text segment whose time range covers the warning time. If such a segment exists, it is taken as the target transcribed text segment, i.e., the transcribed text segment whose time range covers the warning time is output as the target transcribed text segment.

In another embodiment, if it is determined that no transcribed text segment has a time range covering the warning time, the transcribed text segment that follows the warning time and is closest to it is taken as the target transcribed text segment, i.e., the segment located immediately after the warning time is output as the target transcribed text segment.
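The matching logic of step S302 (and of step S408 below) can be sketched as follows, assuming each transcribed text segment carries its start and end times; this data structure is an illustrative assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class TextSegment:
    start_s: float   # start time of the segment in the audio
    end_s: float     # end time of the segment in the audio
    text: str        # transcribed text of the segment

def find_target_segment(segments: List[TextSegment],
                        warning_time_s: float) -> Optional[TextSegment]:
    """Pick the segment whose time range covers the warning time; if none does,
    fall back to the first segment that starts after the warning time."""
    for seg in segments:
        if seg.start_s <= warning_time_s <= seg.end_s:
            return seg
    later = [seg for seg in segments if seg.start_s > warning_time_s]
    return min(later, key=lambda seg: seg.start_s) if later else None
```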
S303: Display the warning parameter value within a preset region of the target transcribed text segment.

After the target transcribed text segment has been determined, the warning parameter value is displayed within a preset region of that segment. The preset region of the target transcribed text segment is a pre-defined area close to the segment. For example, depending on the setting, the preset region may be above or below the target transcribed text segment, or to its right or left; the actual setting prevails and this is not limited here.
Please refer to FIG. 4, which is a schematic flowchart of still another embodiment of a speech recognition method of the present application. In the current embodiment, the method provided by the present application includes:

S401: Acquire the to-be-processed audio generated by at least one target person and the physiological parameters of the target person. Step S401 is the same as step S110 above; refer to the description of the corresponding part of S110.

S402: Determine, based on the microphone array, the direction of each part of the to-be-processed audio.

S403: Determine, based on the direction of each audio part, the person to whom each audio part belongs. Steps S402 and S403 are the same as steps S202 and S203 above, respectively, and are not repeated here.

S404: Segment the to-be-processed audio, and perform speech transcription on the resulting segments to obtain a number of transcribed text segments as the speech transcription result. For the segmentation rules in step S404, refer to the description in step S120 above, which is not repeated here.

S405: Display the transcribed text segments.

After the to-be-processed audio has been segmented and the resulting segments transcribed into a number of transcribed text segments, those transcribed text segments are displayed. Further, in another embodiment, step S405 is executed in parallel with the following steps S406 to S409; the specific execution order is not limited here.
In the current embodiment, the early-warning condition is that the values of the physiological parameter collected at multiple collection moments meet a preset requirement. In this case, determining the warning time based on the collection time of the physiological parameters in step S301 above further includes step S406 of taking the central time of the multiple collection moments as the warning time.

S406: When the physiological parameters meet the early-warning condition, take the central time of the multiple collection moments as the warning time. Specifically, the earliest and the latest of the collection moments that meet the warning condition are added together and divided by 2 to obtain the central time.

In the current embodiment, determining the warning parameter value based on the values of the physiological parameters in step S301 above further includes step S407.

S407: Take the mean of the values collected for the physiological parameter at the multiple collection moments as the warning parameter value. The warning parameter value is the parameter that is finally output and displayed superimposed on the target transcribed text segment to show the user the physiological state of the target person. The physiological parameter values collected at the first number of consecutive collection moments that meet the warning condition are added up and averaged to obtain the warning parameter value.
S408: Determine the transcribed text segment that matches the warning time as the target transcribed text segment.
S409: Display the warning parameter value within the preset area of the target transcribed text segment. Steps S408 and S409 are the same as steps S302 and S303 above; for details, refer to the descriptions of the corresponding parts above.
In one embodiment, if the target person has multiple kinds of physiological parameters, and some kinds satisfy the warning condition while others do not, displaying the warning parameter value within the preset area of the target transcribed text segment further includes:
displaying, within the preset area of the target transcribed text segment, the warning parameter values corresponding to the physiological parameters that satisfy the warning condition. For example, if the physiological parameters include the target person's stress value, heart rate, blood oxygen, respiratory rate, blood pressure and heart rate variability, but only the stress value and the heart rate are judged to satisfy the warning condition, then only the warning parameter value corresponding to the stress value and the warning parameter value corresponding to the heart rate are displayed within the preset area of the target transcribed text segment.
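A minimal Python sketch of this selection (the dictionary names and example values are illustrative assumptions): only the parameters judged to satisfy the warning condition keep their warning parameter values for display in the preset area.

```python
warning_values = {"stress": 49.0, "heart_rate": 128.0, "blood_oxygen": 98.0}
meets_condition = {"stress": True, "heart_rate": True, "blood_oxygen": False}

# Keep only the parameters whose values satisfied the warning condition.
to_display = {name: value for name, value in warning_values.items()
              if meets_condition[name]}
print(to_display)  # {'stress': 49.0, 'heart_rate': 128.0}
```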
Further, the warning parameter value is superimposed on the target transcribed text segment on a per-sentence basis. In the current embodiment, superimposing the warning parameter value on the target transcribed text segment sentence by sentence better solves the problem of precise presentation during real-time warning, and makes it easier to locate sensitive questions and their warning levels in real time.
In another embodiment, referring to FIG. 5, FIG. 5 is a schematic flowchart of a further embodiment of the speech recognition method of the present application. If the target person has multiple kinds of physiological parameters, step S409 above further includes steps S501 to S502.
S501: Determine that the physiological parameters satisfying the warning condition have a higher priority than the physiological parameters not satisfying the warning condition.
That is, when it is judged that some kinds of physiological parameters satisfy the warning condition while other kinds do not, the priority of the physiological parameters satisfying the warning condition is set higher than that of the physiological parameters not satisfying it, and the physiological parameters satisfying the warning condition are accordingly displayed first or with priority.
S502: Sort the warning parameter values corresponding to each kind of physiological parameter according to the priority to form a parameter set to be displayed, and display the parameter set to be displayed within the preset area of the target transcribed text segment.
After the priority of the physiological parameters satisfying the warning condition has been set higher than that of the physiological parameters not satisfying it in step S501, the warning parameter values corresponding to each kind of physiological parameter are further sorted according to this priority to form the parameter set to be displayed, which is then displayed within the preset area of the target transcribed text segment. The parameter set to be displayed is the set of parameters displayed superimposed on the target transcribed text segment.
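A minimal Python sketch of steps S501 and S502 (the names and example values are assumptions): parameters that satisfy the warning condition are ranked ahead of those that do not, and the ordered values form the parameter set to be displayed.

```python
params = [
    {"name": "stress",       "value": 49.0,  "alerting": True},
    {"name": "blood_oxygen", "value": 98.0,  "alerting": False},
    {"name": "heart_rate",   "value": 128.0, "alerting": True},
]

# Alerting parameters first; the order within each group is left unchanged here.
display_set = sorted(params, key=lambda p: 0 if p["alerting"] else 1)
print([p["name"] for p in display_set])  # ['stress', 'heart_rate', 'blood_oxygen']
```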
In other embodiments, processing priorities may also be set in advance for the different types of physiological parameters. For example, when the physiological parameters include the target person's stress value and heart rate, the processing priority of the stress value can be set higher than that of the heart rate. Correspondingly, whether the target person's physiological parameters satisfy the warning condition is first judged based on the stress value; similarly, if the stress value and the heart rate satisfy the warning condition at the same time, the stress value is displayed with priority.
Further, in some embodiments, the display order of the physiological parameters is also determined by the proportion by which each parameter's warning parameter value exceeds its comparison value, i.e. the physiological parameter with the largest exceedance proportion is displayed first. The exceedance proportion is the ratio between the difference obtained by subtracting the comparison value of the physiological parameter from its warning parameter value and the comparison value itself. Furthermore, the method provided by the present application may also enlarge the display of the physiological parameter with the largest exceedance proportion, so as to better prompt the user as to which physiological parameter needs attention first.
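A minimal Python sketch of the exceedance-based ordering (the function name and example values are assumptions): each parameter's exceedance proportion is (warning value - comparison value) / comparison value, and parameters are displayed in descending order of this proportion.

```python
def exceedance_ratio(warning_value, comparison_value):
    return (warning_value - comparison_value) / comparison_value

# (warning parameter value, comparison value) per parameter.
alerts = {"stress": (49.0, 45.0), "heart_rate": (128.0, 120.0)}
ranked = sorted(alerts.items(),
                key=lambda kv: exceedance_ratio(*kv[1]), reverse=True)
for name, (value, ref) in ranked:
    print(name, round(exceedance_ratio(value, ref), 3))
# stress 0.089, heart_rate 0.067 -> the stress value is shown (and enlarged) first
```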
Further, in another embodiment, warning levels (warning types) may also be preset, the warning levels may be continuously compared against the warning parameter values, and only the physiological parameter that satisfies the warning condition and has the highest warning level is displayed.
Referring to FIG. 6, FIG. 6 is a schematic flowchart of a further embodiment of the speech recognition method of the present application. The method provided by the present application includes:
S601: Acquire the audio to be processed produced by at least one target person and the physiological parameters of the target person.
S602: Perform speech transcription on the audio to be processed to obtain a speech transcription result.
S603: Display the speech transcription result, and, when the physiological parameters of the target person satisfy the warning condition, prompt the physiological parameters of the target person.
Steps S601 to S603 in the current embodiment are the same as steps S110 to S130 above; for details, refer to the descriptions of the corresponding parts above.
In the current embodiment, the physiological parameters include a stress value, the warning condition includes the stress value meeting a preset requirement, and the speech transcription result includes several transcribed text segments, each obtained by transcribing a corresponding part of the audio to be processed.
In the current embodiment, after the step of displaying the speech transcription result and prompting the physiological parameters of the target person when they satisfy the warning condition, that is, after step S603, the method provided by the present application further includes steps S604 and S605.
S604: Take, as one piece of warning information, the first transcribed text segment corresponding to the moment when the stress value meets the preset requirement, the second transcribed text segment associated with the first transcribed text segment, and the warning level corresponding to the stress value.
The first transcribed text segment is the speech transcription result obtained by transcribing the target person's speech during which the target person's physiological parameters satisfy the warning condition. The second transcribed text segment is a speech transcription result associated with the first transcribed text segment; for example, the second transcribed text segment is the transcription of a question asked by the questioner, and the corresponding first transcribed text segment is the transcription of the speech produced by the respondent when answering that question.
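A minimal Python sketch of one piece of warning information as described in step S604 (the dataclass, its field names and the example values are assumptions): the associated question segment, the answer segment during which the stress value met the preset requirement, and the warning level are grouped together.

```python
from dataclasses import dataclass

@dataclass
class WarningInfo:
    question_text: str    # second transcribed text segment (the question)
    answer_text: str      # first transcribed text segment (the answer)
    warning_level: str    # e.g. "mild", "moderate" or "high"
    warning_value: float
    warning_time: str

info = WarningInfo(question_text="Example question text",
                   answer_text="Example answer text",
                   warning_level="mild",
                   warning_value=38.2,
                   warning_time="16:19:20")
print(info)
```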
S605: Generate a warning report based on the warning information.
After the warning information is obtained, a warning report is further generated and output. The warning report may at least take the form of a table. For example, in one embodiment, if the audio to be processed is produced by the two parties of a question-and-answer session, a corresponding tabular warning report can be generated.
Further, when generating the warning report, the pieces of warning information may be arranged in chronological order of their warning times.
In another embodiment, the warning report may also be generated according to the severity of the warning type, i.e. the warning information of the more severe warning types is placed first. For example, the high-stress warning information may be displayed first, then the moderate-stress warning information, then the mild-stress warning information, and finally the abnormal-heart-rate warning information.
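A minimal Python sketch of the two ordering options for the warning report (the function name, the severity ranking and the example data are assumptions): the pieces of warning information are listed either chronologically or with the more severe warning types first.

```python
SEVERITY = {"high": 0, "moderate": 1, "mild": 2, "abnormal_heart_rate": 3}

def build_report(warnings, by="time"):
    """warnings: list of dicts with 'warning_time' and 'warning_level' keys."""
    if by == "time":
        return sorted(warnings, key=lambda w: w["warning_time"])
    return sorted(warnings, key=lambda w: SEVERITY[w["warning_level"]])

warnings = [
    {"warning_time": "16:25:03", "warning_level": "mild"},
    {"warning_time": "16:19:20", "warning_level": "high"},
    {"warning_time": "16:21:40", "warning_level": "abnormal_heart_rate"},
]
print([w["warning_time"] for w in build_report(warnings, by="severity")])
# ['16:19:20', '16:25:03', '16:21:40']
```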
In the current embodiment, by generating a warning report based on the warning information, presenting the warned question-answer pairs in the report, and further supporting sorting by warning level (warning type) and threshold, the content of the audio to be processed can be analyzed and summarized more effectively, pointing directly at the key problems. This makes it convenient to categorize and trace back the problems after the conversation ends or is paused, assists efficient analysis in finding suspicious issues, and provides a report for reference in detailed analysis.
The technical solution provided by the present application is applicable to the conversation (interview) field. A high-performance all-in-one workstation serves as the running platform and locally runs a speech recognition engine for speech transcription and a video analysis engine for emotion analysis. A USB camera is connected externally to collect facial video data of the person being questioned, speech data is collected through a microphone array, and the question and answer roles are distinguished based on the speech data collected by the microphone array. The specific speech recognition process may include the following. First, the emotion recognition engine analyzes the video data collected in real time to obtain the physiological parameters of the person being questioned, where the physiological parameters include the stress value, the heart rate and the other physiological parameters mentioned above, and may be collected in real time, for example at a rate of one data point per second.
Then, the speech transcription engine transcribes speech in real time through the microphone array, distinguishes the speech of the questioning and answering parties by direction, and automatically segments the audio to be processed. The segmentation logic includes at least one of the following three rules: ① segment audio belonging to different persons in the audio to be processed; ② if a first audio segment belonging to the same person in the audio to be processed contains a preset pause, segment the first audio segment at the preset pause; ③ if the word count of a second audio segment belonging to the same person in the audio to be processed exceeds a preset word count, segment the second audio segment.
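A minimal Python sketch of the three segmentation rules (the function, parameter names and thresholds are assumptions): a new segment is started when the speaker changes, when a pause of at least the preset length occurs, or when the running word count would exceed the preset word count.

```python
def segment_audio(parts, pause_threshold=2.0, max_words=80):
    """parts: chronologically ordered dicts with 'speaker', 'start', 'end', 'words'."""
    segments, current = [], []
    for part in parts:
        if current:
            same_speaker = part["speaker"] == current[-1]["speaker"]            # rule 1
            short_pause = part["start"] - current[-1]["end"] < pause_threshold  # rule 2
            under_limit = sum(p["words"] for p in current) + part["words"] <= max_words  # rule 3
            if not (same_speaker and short_pause and under_limit):
                segments.append(current)
                current = []
        current.append(part)
    if current:
        segments.append(current)
    return segments

parts = [{"speaker": "A", "start": 0.0, "end": 2.5, "words": 12},
         {"speaker": "B", "start": 2.6, "end": 6.0, "words": 30},
         {"speaker": "B", "start": 9.0, "end": 12.0, "words": 25}]
print(len(segment_audio(parts)))  # 3: a speaker change, then a pause longer than 2 s
```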
Emotion recognition warnings may include two types: heart-rate warnings and stress warnings. For example, in one embodiment, the threshold of the heart-rate warning is 120, and the stress warning has three levels, low, medium and high, with thresholds of 30, 45 and 70 respectively. When both a heart-rate warning and a stress warning are raised, the stress warning has priority over the heart-rate warning for superposition onto the target transcribed text segment; when both are raised, it is also possible, depending on the settings, to superimpose only the stress warning.
It should be noted that one warning is generated each time the warning condition is satisfied. The warning condition for the heart-rate warning is that, with one heart-rate data point sent per second, a set number of consecutive collection moments all exceed the threshold; this counts as satisfying the warning condition once, and otherwise the warning condition is not satisfied. The center time of the warning is the warning time. For the stress warning, the condition is that the stress values at a set number of consecutive collection moments exceed the threshold of the corresponding level. In other embodiments, the condition may also be that, among a first set number of consecutive collection moments, at least a second set number of values exceed the threshold of the corresponding level, in which case the warning condition of that level is deemed satisfied. For example, with stress warning levels of low, medium and high and thresholds of 30/45/70, a first set number of 5 and a second set number of 3, if the stress values at 5 consecutive points are 32/33/49/31/46, a mild warning is determined; the warning value is the mean of these values, the level is mild, and the warning time is the moment at which the value 49 was produced.
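A minimal Python sketch of the relaxed stress-warning condition (the thresholds and counts follow the example above; the function name is an assumption): the warning of a given level is raised when at least the second set number out of the first set number of consecutive samples exceed that level's threshold.

```python
STRESS_THRESHOLDS = {"mild": 30, "moderate": 45, "high": 70}

def stress_warning(values, level, first_count=5, second_count=3):
    """values: the most recent once-per-second stress samples."""
    window = values[-first_count:]
    if len(window) < first_count:
        return False
    return sum(v > STRESS_THRESHOLDS[level] for v in window) >= second_count

samples = [32, 33, 49, 31, 46]
print(stress_warning(samples, "mild"))      # True: all 5 samples exceed 30
print(stress_warning(samples, "moderate"))  # False: only 2 of 5 samples exceed 45
```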
During warning, it is further judged whether the warning time falls within the time interval of the transcribed text corresponding to the respondent's answering speech; if it does, that transcribed text segment constitutes one warning event. For example, if the warning time is 16:19:20 and the time interval of the transcribed text corresponding to the respondent's answer is 16:19:13 to 16:19:25, the system judges that this utterance generates one warning event, and the warning type and the corresponding warning parameter value can both be superimposed onto that transcribed text.
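A minimal Python sketch of this interval check (the function name is an assumption): a warning is attached to a transcribed text segment only if the warning time falls within the time interval covered by that segment.

```python
from datetime import time

def warning_in_segment(warning_time, segment_start, segment_end):
    return segment_start <= warning_time <= segment_end

# The warning at 16:19:20 falls inside the answer segment 16:19:13 to 16:19:25.
print(warning_in_segment(time(16, 19, 20), time(16, 19, 13), time(16, 19, 25)))  # True
```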
It should be noted that, in addition to referring to the time interval of the transcribed text corresponding to the respondent's answering speech, the target transcribed text may also be determined according to the three segmentation rules above. When only the speaker switching described in ① applies, matching is judged simply according to the time interval of the transcribed text. When a preset pause as described in ② exists and a warning is generated during the pause, the warning is still attributed to the next transcribed text, with the warning time being the actual warning time point. When the transcribed text is longer than the preset word count as described in ③, and the actual range of the warning falls exactly at the boundary where the text is segmented by the preset word count, the warning is attributed to the transcribed text of the next paragraph.
As described above, when multiple warning events are generated within the time range corresponding to one transcribed text, they are ultimately superimposed and displayed according to the set priority. For example, depending on the settings, the stress value may be superimposed onto the target transcribed text segment first, with the stress warning levels superimposed in the priority order of high, medium and low; for stress values of the same level, the data with the larger warning value is superimposed.
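A minimal Python sketch of this superposition priority (the level ranking and example values are assumptions consistent with the description above): among the warning events falling into the same transcribed text segment, a stress warning beats the heart-rate warning, a higher stress level beats a lower one, and for equal levels the larger warning value wins.

```python
LEVEL_RANK = {"high": 0, "moderate": 1, "mild": 2, "heart_rate": 3}

def pick_warning(events):
    """events: list of dicts with 'level' and 'value' keys."""
    return min(events, key=lambda e: (LEVEL_RANK[e["level"]], -e["value"]))

events = [{"level": "mild", "value": 38.2},
          {"level": "mild", "value": 41.0},
          {"level": "heart_rate", "value": 128.0}]
print(pick_warning(events))  # {'level': 'mild', 'value': 41.0}
```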
Referring to FIG. 7, FIG. 7 is a schematic structural diagram of an embodiment of a speech recognition device of the present application.
In the current embodiment, the speech recognition device 700 provided by the present application includes a processor 701, and a memory 702 and a communication interface 703 coupled to the processor 701, that is, the memory 702 and the communication interface 703 are each connected to the processor 701. The speech recognition device 700 can execute the method of any one of the embodiments of FIG. 1 to FIG. 6.
The processor 701 is connected to the memory 702 and the communication interface 703 respectively.
The communication interface 703 is used to communicate with other external electronic devices under the control of the processor 701 in order to transmit physiological parameters and instructions, where the other electronic devices include at least an external module for collecting the audio to be processed.
The memory 702 includes local storage and stores a computer program; when the computer program is executed by the processor 701, the method of any one of the above embodiments can be implemented.
The processor 701 is used to run the computer program stored in the memory 702 so as to execute the method of any one of the embodiments of FIG. 1 to FIG. 6.
Referring to FIG. 8, FIG. 8 is a schematic structural diagram of an embodiment of a computer-readable storage medium of the present application. The computer-readable storage medium 800 stores a computer program 801 that can be run by a processor, and the computer program 801 is used to implement the speech recognition method described in any one of the embodiments of FIG. 1 to FIG. 6 above. Specifically, the computer-readable storage medium 800 may be a memory, a personal computer, a server, a network device or a USB flash drive, among others, which is not specifically limited here.
The above descriptions are only embodiments of the present application and do not thereby limit the patent scope of the present application. Any equivalent structural or process transformation made using the contents of the specification and drawings of the present application, applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present application.