CN112435512A - Voice behavior assessment and evaluation method for rail transit simulation training - Google Patents
- Publication number
- CN112435512A (application CN202011261988.9A)
- Authority
- CN
- China
- Prior art keywords
- voice
- standard
- evaluation
- communication information
- keywords
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B5/00—Electrically-operated educational appliances
- G09B5/04—Electrically-operated educational appliances with audible presentation of the material to be studied
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06398—Performance of employee with respect to a job function
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/20—Education
- G06Q50/205—Education administration or guidance
- G06Q50/2057—Career enhancement or continuing education service
Abstract
The invention discloses a voice behavior assessment and evaluation method for rail transit simulation training, relating to the technical field of rail transit simulation training. The technical scheme comprises: converting voice communication information into text information; extracting all keywords in each sentence and classifying them; automatically recognizing the voice communication information through a deep neural network to obtain standard voice data, and matching the standard voice data against a voice evaluation database to obtain uniquely matched standard evaluation data; comparing and analyzing the standard voice data against the standard evaluation data to judge whether the semantics and trigger timing of the voice communication information are correct; and performing a comprehensive calculation with a comprehensive membership function to obtain an autonomous evaluation score for the voice communication information. The invention can reliably, reasonably, accurately and quickly perform autonomous, objective voice assessment of trainee voice communication during training, and the trainee can know directly from the evaluation score whether a voice instruction is correct and standard.
Description
Technical Field
The invention relates to the technical field of rail transit simulation training, and in particular to a voice behavior assessment and evaluation method for rail transit simulation training.
Background
Rail transit has become one of the principal modes of transportation in China and an artery of the country's economic development, and it is widely favored by the public for its unique advantages of safety, large capacity, speed, punctuality and comfort.
With the development of rail transit, the demand for trained personnel in a growing number of posts keeps increasing, and current training modes have improved on the traditional ones. In the traditional training process, whether a trainee's spoken instructions are correct and standard is mainly evaluated manually; the result is overly subjective and arbitrary, is easily influenced by the evaluator's professional knowledge, and the trainee cannot learn in time whether a voice instruction was correct and standard. At present, some simulation systems perform speech recognition on the required voice instruction and extract the main keywords of the voice information as the principal factors for recognition and evaluation; however, this approach is easily affected by environmental noise and relies on direct keyword comparison, so the precision of the evaluation results varies widely, the error is large, and high-precision evaluation cannot be achieved. In addition, pushing real-time voice to an instructor for evaluation occupies a large amount of bandwidth, increases the instructor's workload, and offers poor flexibility.
Therefore, how to research and design a voice behavior assessment and evaluation method for rail transit simulation training is a problem that urgently needs to be solved.
Disclosure of Invention
The invention aims to solve the problems that existing full-profession rail transit training simulation systems cannot objectively and accurately evaluate trainee voice during training, and that trainees cannot learn whether their voice instructions are correct and standard. To this end it provides a voice behavior assessment and evaluation method for rail transit simulation training.
The technical purpose of the invention is realized by the following technical scheme: a voice behavior assessment and evaluation method for rail transit simulation training comprises the following steps:
S101: recognizing the trainee's voice communication information during training through an intelligent voice recognition technology and converting it into text information;
S102: judging and extracting all keywords in the whole sentence through a keyword recognition technology, and classifying the extracted keywords as semantics-related, device-related, or professional-term-related;
S103: automatically recognizing the voice communication information through a deep neural network to obtain standard voice data, and matching the standard voice data against a voice evaluation database to obtain uniquely matched standard evaluation data;
S104: comparing and analyzing the standard voice data against the standard evaluation data to judge whether the semantics and trigger timing of the voice communication information are correct;
S105: if the semantics and trigger timing are judged correct, comparing the keywords extracted from the voice communication information against the standard keywords in the standard evaluation data, and obtaining the semantics-, device- and professional-term-related keyword membership functions through a fuzzy control function;
S106: establishing a comprehensive membership function from the weighted semantics-related, device-related and professional-term-related keyword membership functions, and performing a comprehensive calculation with it to obtain the autonomous evaluation score of the voice communication information.
Further, the recognition and conversion of voice communication information is specifically as follows: the TTS engine, SAPI interface and Win32 API interface in the Windows Speech SDK development kit are used to build, under the MFC framework, an application unit that converts speech into text; voice communication information fed into this application unit is automatically converted into text information.
Further, the keyword recognition and extraction specifically comprises:
comparing the voice communication information with the keyword database one by one to respectively obtain semantic related keywords, equipment related keywords and professional term related keywords contained in the voice communication information;
the keyword database is constructed by reading the semantics-, device- and professional-term-related keywords in the voice evaluation database; its structure is specifically as follows:
the device-related keywords are used to assess the accuracy with which trainees describe device information, the semantics-related keywords are used to judge the semantics of the trainees' descriptions, and the professional-term keywords are used to assess the professionalism with which trainees describe professional information.
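As an illustrative sketch of this keyword classification step (the patent does not reproduce the actual database contents, so the keyword lists below are invented placeholders):

```python
# Sketch of keyword extraction and classification (step S102). The three
# keyword lists are illustrative assumptions, not the patent's database.
SEMANTIC_KEYWORDS = {"allowed speed", "move", "arrive", "stop"}
DEVICE_KEYWORDS = {"locomotive", "signal machine"}
TERM_KEYWORDS = {"outbound", "shunting"}

def classify_keywords(text):
    """Return the semantics-, device- and term-related keywords found in text."""
    found = lambda vocab: sorted(k for k in vocab if k in text)
    return {
        "semantic": found(SEMANTIC_KEYWORDS),
        "device": found(DEVICE_KEYWORDS),
        "term": found(TERM_KEYWORDS),
    }

hits = classify_keywords(
    "locomotive allowed speed 5 km/h, move to the outbound signal machine")
print(hits["device"])  # which device-related keywords matched
```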
Further, the standard voice data recognition specifically includes:
the voice communication information is converted into numbers by GBK coding, and each number becomes one element x_i of the input matrix x_t, thereby obtaining the input matrix x_t;
the current hidden-layer value is calculated from the voice communication information input and the hidden value at the previous moment, specifically:
s_t = f(U·x_t + W·s_{t−1})
where the matrix U is the weight coefficient matrix of the input matrix x_t, with dimension n × m, whose values are switched according to the type of work; s is the hidden-layer value vector, with dimension n; W is the weight coefficient matrix of the hidden-layer values, with dimension n × n;
calculating through an output equation to obtain an output function matrix, specifically:
O_t = g(V·s_t)·ξ
where O_t is the output function matrix; g is the algorithm; V is the hidden-layer weight coefficient matrix; s_t is the current hidden-layer value; ξ is the trigger-timing judgment coefficient: when the trigger timing is correct, ξ is assigned 1; when the trigger timing is wrong, ξ is assigned 0;
the matrix O_t represents the GBK codes of the standard voice signal; converting these GBK codes into Chinese characters yields the standard voice data of the voice communication information.
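A minimal sketch of the recurrent update described above, under the assumption of a simple-RNN cell with tanh activations and small illustrative dimensions (the patent does not fix n, m, or the functions f and g):

```python
import numpy as np

# Illustrative simple-RNN step: s_t = f(U x_t + W s_{t-1}); O_t = g(V s_t) * xi.
# Dimensions and random weights are placeholders, not the patent's trained values.
n, m = 4, 3                      # hidden size n, input size m
rng = np.random.default_rng(0)
U = rng.standard_normal((n, m))  # input weight matrix (n x m)
W = rng.standard_normal((n, n))  # hidden-to-hidden weight matrix (n x n)
V = rng.standard_normal((m, n))  # output weight matrix

def step(x_t, s_prev, xi=1.0):
    """One recurrent step; xi = 0 zeroes the output on a wrong trigger timing."""
    s_t = np.tanh(U @ x_t + W @ s_prev)
    o_t = np.tanh(V @ s_t) * xi
    return s_t, o_t

s = np.zeros(n)
s, o = step(np.array([1.0, 0.0, 0.0]), s, xi=0.0)  # wrong trigger -> zero output
```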
Further, the voice evaluation database specifically includes:
the voice evaluation database contains all standard expressions used by all professions, i pieces of voice data in total, and the corresponding voice evaluation database has the following structure:
matching the standard voice data against the voice evaluation database yields the uniquely matched standard semantics and trigger timing i_x, where i_x denotes a standard voice data signal; autonomous, objective evaluation is carried out with this signal as the input signal, with the specific structure:
[number i_x, semantic relevance i_x, device relevance i_x, professional-term relevance i_x, statement i_x]
the number is used to manage the serial number of each piece of data in the voice evaluation database; the semantic relevance, device relevance and professional-term relevance serve as the key factors for autonomous, objective evaluation; the statement gives the standard phrasing of the sentence and serves as the key factor for matching the standard voice data input.
Further, the trigger timing determination specifically includes:
the voice signal triggered by the trainee is judged according to the system state feedback signal and the current environment, and the sentence membership function is obtained from the trigger-timing judgment, specifically:
Y_SEN(x, a) = 1 when x > a, and Y_SEN(x, a) = 0 when x ≤ a
where a is the critical value of trigger-timing membership; when the trigger timing of the voice signal is correct, x > a; when the trigger timing is wrong, x ≤ a.
Further, the semantic related keyword membership function specifically includes:
Y_DC(y, b, c) = c / (|y − b| + c)
where b is the optimal semantic membership value, representing semantics judged completely consistent with the voice evaluation database; its value is the number of semantics-related keywords in the voice evaluation database entry; c = 0.4b is calculated from the system characteristics. As the semantic expression becomes oversimplified or overcomplicated, the semantic membership value decreases; when y reaches b − 2c or b + 2c, the membership drops to 1/3 of its peak value.
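A direct transcription of the semantic membership function above, using the stated rule c = 0.4b:

```python
# Semantic-keyword membership Y_DC(y, b, c) = c / (|y - b| + c), with c = 0.4*b.
# y: semantic keywords in the input; b: semantic keywords in the standard entry.
def y_dc(y, b):
    c = 0.4 * b
    return c / (abs(y - b) + c)

print(y_dc(3, 3))                          # exact match -> peak membership 1.0
print(round(y_dc(3 + 2 * 0.4 * 3, 3), 4))  # at y = b + 2c the value falls to 1/3
```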
Further, the device-related keyword membership function specifically includes:
where d is the saturation point of device-related keyword membership, at which the device membership function takes the value 1; d − e is the device relevance onset point, with e = 0.6d calculated from the system characteristics; when z ∈ (0, d − e), the input voice signal uses no device-related keywords; when z ∈ (d − e, d), the input voice signal uses a gradually increasing number of device-related keywords; when z ∈ (d, +∞), the input voice signal uses other related device keywords in addition to all the device-related keywords of the standard sentence.
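Since the original curve is given only as a figure, the sketch below is one piecewise-linear reading of the description (0 below d − e, rising to saturation at d), with e = 0.6d as stated; the exact shape between d − e and d is an assumption:

```python
# Assumed piecewise-linear device-keyword membership; the patent's figure is
# not reproduced, so the ramp between d-e and d is a guess from the prose.
def y_sc(z, d):
    e = 0.6 * d
    if z <= d - e:
        return 0.0                  # no device-related keywords used
    if z < d:
        return (z - (d - e)) / e    # membership grows toward saturation
    return 1.0                      # all standard device keywords (or more) used

print(y_sc(2, 2))  # at the saturation point z = d the membership is 1.0
```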
Further, the term-of-expertise related keyword membership function is specifically:
where f indicates a completely accurate professional-term expression, at which the membership degree is 1; the term membership decreases continuously as the professional-term expression degrades; g = 0.5f is calculated from the system characteristics.
Further, the comprehensive membership function is specifically:
U = (ε·Y_DC + λ·Y_SC + τ·Y_RT) × 100
where U is the autonomous, objective evaluation score of the trainee's voice recognition and represents the trainee's level of voice standardization during training; ε, λ and τ are the weight coefficients of the semantics-related, device-related and professional-term-related keywords respectively; ε ∈ [0, 1], λ ∈ [0, 1], τ ∈ [0, 1], and ε + λ + τ = 1.
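The comprehensive score can be transcribed directly, with a guard for the stated constraint ε + λ + τ = 1:

```python
# Comprehensive membership score U = (eps*Y_DC + lam*Y_SC + tau*Y_RT) * 100.
def evaluate(y_dc, y_sc, y_rt, eps, lam, tau):
    # The weights must sum to 1, per the constraint stated in the text.
    assert abs(eps + lam + tau - 1.0) < 1e-9, "weights must sum to 1"
    return (eps * y_dc + lam * y_sc + tau * y_rt) * 100

score = evaluate(0.55, 1.0, 0.0, 0.5, 0.3, 0.2)  # the weights used in Example 2
```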
Compared with the prior art, the invention has the following beneficial effects. By matching a deep neural network against the voice evaluation database, completely standard voice data can be obtained. Keyword recognition is then applied to the input voice and compared with the standard voice database, so that the trainee receives a standard, objective voice evaluation: manual evaluation is avoided, the objectivity of the system is increased, the instructor's workload is reduced, and because the comparison is made directly against the voice evaluation database no bandwidth is occupied, improving system performance. The comparison result is evaluated, calculated and converted into a percentile score, which accurately and intuitively shows how standard and accurate the trainee's voice instruction is. The invention can thus reliably, reasonably, accurately and quickly perform autonomous, objective voice assessment of voice communication information during each professional trainee's training in a full-profession rail transit training simulation system, and the trainee can know directly from the autonomous objective evaluation score whether a voice instruction is correct and standard.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is an overall flow chart in an embodiment of the present invention;
FIG. 2 is a diagram illustrating a semantic and trigger timing determination function according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating semantic dependent membership functions in an embodiment of the present invention;
FIG. 4 is a schematic diagram of device dependent membership functions in an embodiment of the present invention;
FIG. 5 is a diagram illustrating term dependent membership functions in an embodiment of the present invention;
FIG. 6 is a schematic diagram of a deep neural network in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the following examples and accompanying fig. 1-6, wherein the exemplary embodiments and descriptions of the present invention are only used for explaining the present invention and are not to be construed as limiting the present invention.
Example 1
A voice behavior assessment and evaluation method for rail transit simulation training is shown in figure 1 and comprises the following steps:
S101: recognizing the trainee's voice communication information during training through an intelligent voice recognition technology and converting it into text information;
S102: judging and extracting all keywords in the whole sentence through a keyword recognition technology, and classifying the extracted keywords as semantics-related, device-related, or professional-term-related;
S103: automatically recognizing the voice communication information through a deep neural network to obtain standard voice data, and matching the standard voice data against a voice evaluation database to obtain uniquely matched standard evaluation data;
S104: comparing and analyzing the standard voice data against the standard evaluation data to judge whether the semantics and trigger timing of the voice communication information are correct;
S105: if the semantics and trigger timing are judged correct, comparing the keywords extracted from the voice communication information against the standard keywords in the standard evaluation data, and obtaining the semantics-, device- and professional-term-related keyword membership functions through a fuzzy control function;
S106: establishing a comprehensive membership function from the weighted semantics-related, device-related and professional-term-related keyword membership functions, and performing a comprehensive calculation with it to obtain the autonomous evaluation score of the voice communication information.
The recognition and conversion of voice communication information is specifically as follows: the TTS engine, SAPI interface and Win32 API interface in the Windows Speech SDK development kit are used to build, under the MFC framework, an application unit that converts speech into text; voice communication information fed into this application unit is automatically converted into text information.
The keyword recognition and extraction is specifically as follows: the voice communication information is compared with the keyword database item by item to obtain the semantics-related, device-related and professional-term-related keywords contained in the voice communication information. The keyword database is constructed by reading the semantics-, device- and professional-term-related keywords in the voice evaluation database; its structure is specifically as follows:
the device-related keywords are used to assess the accuracy with which trainees describe device information, the semantics-related keywords are used to judge the semantics of the trainees' descriptions, and the professional-term keywords are used to assess the professionalism with which trainees describe professional information.
As shown in FIG. 6, the standard voice data recognition is specifically as follows: the voice communication information is converted into numbers by GBK coding, and each number becomes one element x_i of the input matrix x_t, thereby obtaining the input matrix x_t; the current hidden-layer value is calculated from the voice communication information input and the hidden value at the previous moment, specifically:
s_t = f(U·x_t + W·s_{t−1})
where the matrix U is the weight coefficient matrix of the input matrix x_t, with dimension n × m, whose values are switched according to the type of work; s is the hidden-layer value vector, with dimension n; W is the weight coefficient matrix of the hidden-layer values, with dimension n × n.
Calculating through an output equation to obtain an output function matrix, specifically:
O_t = g(V·s_t)·ξ
where O_t is the output function matrix; g is the algorithm; V is the hidden-layer weight coefficient matrix; s_t is the current hidden-layer value; ξ is the trigger-timing judgment coefficient: when the trigger timing is correct, ξ is assigned 1; when the trigger timing is wrong, ξ is assigned 0. The matrix O_t represents the GBK codes of the standard voice signal; converting these GBK codes into Chinese characters yields the standard voice data of the voice communication information.
The voice evaluation database is specifically as follows: it contains all standard expressions used by all professions, i pieces of voice data in total, and the corresponding voice evaluation database has the following structure:
matching the standard voice data against the voice evaluation database yields the uniquely matched standard semantics and trigger timing i_x, where i_x denotes a standard voice data signal; autonomous, objective evaluation is carried out with this signal as the input signal, with the specific structure:
[number i_x, semantic relevance i_x, device relevance i_x, professional-term relevance i_x, statement i_x]
the number is used to manage the serial number of each piece of data in the voice evaluation database; the semantic relevance, device relevance and professional-term relevance serve as the key factors for autonomous, objective evaluation; the statement gives the standard phrasing of the sentence and serves as the key factor for matching the standard voice data input.
As shown in fig. 2, the trigger-timing judgment is specifically as follows: the voice signal triggered by the trainee is judged according to the system state feedback signal and the current environment, and the sentence membership function is obtained from the trigger-timing judgment, specifically:
Y_SEN(x, a) = 1 when x > a, and Y_SEN(x, a) = 0 when x ≤ a
where a is the critical value of trigger-timing membership; when the trigger timing of the voice signal is correct, x > a; when the trigger timing is wrong, x ≤ a.
As shown in fig. 3, the semantic related keyword membership function specifically includes:
Y_DC(y, b, c) = c / (|y − b| + c)
where b is the optimal semantic membership value, representing semantics judged completely consistent with the voice evaluation database; its value is the number of semantics-related keywords in the voice evaluation database entry; c = 0.4b is calculated from the system characteristics. As the semantic expression becomes oversimplified or overcomplicated, the semantic membership value decreases; when y reaches b − 2c or b + 2c, the membership drops to 1/3 of its peak value.
As shown in fig. 4, the device-related keyword membership function specifically includes:
where d is the saturation point of device-related keyword membership, at which the device membership function takes the value 1; d − e is the device relevance onset point, with e = 0.6d calculated from the system characteristics; when z ∈ (0, d − e), the input voice signal uses no device-related keywords; when z ∈ (d − e, d), the input voice signal uses a gradually increasing number of device-related keywords; when z ∈ (d, +∞), the input voice signal uses other related device keywords in addition to all the device-related keywords of the standard sentence.
As shown in fig. 5, the term-of-expertise related keyword membership functions are specifically:
where f indicates a completely accurate professional-term expression, at which the membership degree is 1; the term membership decreases continuously as the professional-term expression degrades; g = 0.5f is calculated from the system characteristics.
The comprehensive membership function is specifically:
U = (ε·Y_DC + λ·Y_SC + τ·Y_RT) × 100
where U is the autonomous, objective evaluation score of the trainee's voice recognition and represents the trainee's level of voice standardization during training; ε, λ and τ are the weight coefficients of the semantics-related, device-related and professional-term-related keywords respectively; ε ∈ [0, 1], λ ∈ [0, 1], τ ∈ [0, 1], and ε + λ + τ = 1.
Example 2
The method is explained with the voice communication information "the locomotive is to move slowly, the allowed speed is 5 km/h, and the locomotive moves to the front of the outbound signal" as the input signal. Suppose the corresponding voice evaluation database entry is "locomotive allowed speed 5 km/h, proceed to the front of the outbound signal". After the system captures the voice signal, the voice information is first converted into text information by the intelligent voice recognition program developed with the TTS engine in the Windows Speech SDK development kit.
The standard keyword database recognition system, with its large vocabulary, then detects whether the input signal contains the keywords; this method has a fast response speed.
The standard keyword database is in fact data read from the standard voice database: the semantics-, device- and professional-term-related keywords in the standard voice database are read to form a new keyword database, whose structure is specifically as follows:
comparison with the keyword database shows that the input signal contains 4 semantics-related keywords, 2 device-related keywords and 2 professional-term-related keywords.
The trigger-timing judgment is made mainly by analyzing the system state and the current environment to judge the voice signal triggered by the trainee. Since the trigger timing is correct in this example, Y_SEN(x, a) = 1.
The output signal matrix is obtained through the deep neural network operation, and the standard output voice signal is obtained through GBK-to-Chinese-character conversion: "locomotive allowed speed 5 km/h, proceed to the front of the outbound signal".
By comparison with the voice evaluation database Q, the entry matching both the standard semantics and the trigger timing can be found; suppose its number is 11253, with the structure:
[11253, semantic relevance, device relevance, professional-term relevance, "locomotive allowed speed 5 km/h, proceed to the front of the outbound signal"]
Comparing the input signal with the voice evaluation database shows that this entry contains 3 semantic-related keywords, 2 device-related keywords and 1 professional-term-related keyword.
For the semantic-related keyword membership function Y_DC: since the evaluation entry contains 3 semantic-related keywords, b = 3, and c = 1.2 is calculated from the system characteristics. Specifically: Y_DC(y, b, c) = 1.2/(|y − 3| + 1.2); with y = 4 keywords in the input signal, this gives Y_DC ≈ 0.55.
For the device-related keyword membership function Y_SC: since there are 2 device-related keywords, d = 2, and e = 1.2 is calculated from the system characteristics; further calculation gives Y_SC = 1.
For the professional-term-related keyword membership function Y_RT: since the evaluation entry contains 1 professional-term-related keyword, f = 1 and g = 1, from which Y_RT = 0.
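The semantic membership value above can be checked numerically with the claim-7 formula; taking y = 4 (the keyword count found in the input signal) is an assumption consistent with the counts stated in this example.

```python
def y_dc(y, b, c):
    # Claim 7: Y_DC(y, b, c) = c / (|y - b| + c), peaking at 1 when y == b.
    return c / (abs(y - b) + c)

# b = 3 semantic keywords in the evaluation entry, c = 1.2, y = 4 in the input.
value = y_dc(4, 3, 1.2)  # 1.2 / 2.2, i.e. roughly 0.55 as in the example
```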
The three membership functions are then integrated: a new comprehensive membership function is established according to the weight ratio of each factor, and the trainee's autonomous voice evaluation score is computed from it. The three factors carry different weight coefficients in different scenes; the weight coefficients of the factor membership functions are denoted ε, λ and τ. Analysis of this scene gives ε = 0.5, λ = 0.3 and τ = 0.2, and the comprehensive membership function is defined as follows:
U = (εY_DC + λY_SC + τY_RT) × 100 = (0.5 × 0.55 + 0.3 × 1 + 0.2 × 0) × 100 = 57.5
The trainee's speech is therefore autonomously and objectively assessed with an evaluation score of 57.5.
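The composite score can be reproduced directly from the values stated in the worked example; the weights and membership values below are exactly those of the embodiment.

```python
# Composite membership score U = (eps*Y_DC + lam*Y_SC + tau*Y_RT) * 100,
# with the weights and membership values stated in the worked example.
eps, lam, tau = 0.5, 0.3, 0.2
y_dc, y_sc, y_rt = 0.55, 1.0, 0.0
U = (eps * y_dc + lam * y_sc + tau * y_rt) * 100  # evaluates to 57.5
```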
The above embodiments further explain the objects, technical solutions and advantages of the present invention in detail. It should be understood that they are merely exemplary embodiments of the present invention and are not intended to limit its scope; any modifications, equivalent substitutions and improvements made within the spirit and principle of the present invention shall fall within its protection scope.
Claims (10)
1. A voice behavior assessment and evaluation method for rail transit simulation training is characterized by comprising the following steps:
S101: recognizing the voice communication information of trainees during the training process through an intelligent speech recognition technology and converting it into text information;
S102: judging and extracting all keywords in the whole sentence through a keyword recognition technology, and classifying the extracted keywords in turn as semantic-related, device-related and professional-term-related;
S103: automatically recognizing the voice communication information through a deep neural network to obtain standard voice data, and matching the standard voice data with a voice evaluation database to obtain uniquely matched standard evaluation data;
S104: comparing and analyzing the standard voice data and the standard evaluation data, and then judging whether the semantics and the trigger timing of the voice communication information are correct;
S105: if both the semantics and the trigger-timing judgment are correct, comparing and analyzing the association between the keywords extracted from the voice communication information and the standard keywords in the standard evaluation data, and obtaining the semantic-, device- and professional-term-related keyword membership functions through a fuzzy control function;
S106: establishing a comprehensive membership function from the weight coefficients of the semantic-related, device-related and professional-term-related keyword membership functions, and computing the autonomous evaluation score of the voice communication information from the comprehensive membership function.
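Read as a computation, steps S105–S106 amount to a per-category membership followed by a weighted sum. The sketch below assumes, purely for brevity, the same c/(|y − b| + c) form for all three categories; the claims specify this form only for the semantic one, and all names are illustrative.

```python
def autonomous_score(counts, targets, widths, weights):
    """S105: membership per keyword category; S106: weighted composite * 100.
    counts  - keywords found in the trainee utterance (semantic, device, term)
    targets - keyword counts in the matched evaluation entry (b, d, f analogues)
    widths  - system-characteristic constants (c, e, g analogues)
    weights - (epsilon, lambda, tau), summing to 1"""
    memberships = [w / (abs(n - t) + w)
                   for n, t, w in zip(counts, targets, widths)]
    return 100 * sum(cf * m for cf, m in zip(weights, memberships))

# A trainee utterance matching the evaluation entry exactly in every
# category has membership 1 everywhere, hence a full score of 100.
score = autonomous_score((3, 2, 1), (3, 2, 1), (1.2, 1.2, 0.5), (0.5, 0.3, 0.2))
```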
2. The method for assessing and evaluating the voice behavior of rail transit simulation training as claimed in claim 1, wherein the recognition and conversion of the voice communication information specifically comprises: establishing, under the MFC framework, an application program unit for converting voice into text using the TTS engine, SAPI interface and Win32 API in the Windows Speech SDK development kit; after the voice communication information is input into the application program unit, it is automatically converted into text information.
3. The method for evaluating the voice behavior assessment of the rail transit simulation training as claimed in claim 1, wherein the keyword recognition and extraction specifically comprises:
comparing the voice communication information with the keyword database one by one to respectively obtain semantic related keywords, equipment related keywords and professional term related keywords contained in the voice communication information;
the keyword database is constructed by reading keywords related to semantics, equipment and professional terms in the voice evaluation database, and the structure specifically comprises the following steps:
the device-related keywords are used for identifying the accuracy with which trainees describe device information, the semantic-related keywords are used for judging the semantics of the trainees' descriptions, and the professional-term keywords are used for identifying how professionally trainees describe professional information.
4. The method for assessing and evaluating the voice behavior of the rail transit simulation training as claimed in claim 1, wherein the standard voice data recognition specifically comprises:
converting the voice communication information into numbers by GBK coding, and taking each number as one element x_i of the input matrix x_t, so as to obtain the input matrix x_t;
The current hidden-layer value is calculated from the voice communication information at the current moment and the hidden-layer value at the previous moment, specifically:
wherein the matrix U represents the weight coefficient matrix of the input matrix x_t, with dimension n × m, its values being switched according to the work type being trained; s represents the hidden-layer value vector, of dimension n; W represents the weight coefficient matrix of the hidden-layer value, with dimension n × n;
calculating through an output equation to obtain an output function matrix, specifically:
O_t = g(V s_t) × ξ
wherein O_t represents the output function matrix; g is the algorithm; V represents the hidden-layer weight coefficient matrix; s_t represents the current hidden-layer value; ξ represents the trigger-timing judgment coefficient: if the trigger timing is correct, ξ is assigned 1; when the trigger timing is wrong, ξ is assigned 0;
the matrix O_t represents the GBK codes of the standard voice signal; converting these GBK codes into Chinese characters yields the standard voice data of the voice communication information.
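Claim 4's recurrence can be sketched in plain Python, assuming the standard simple-RNN form s_t = f(U·x_t + W·s_{t−1}) with f = tanh (the claim does not name f); the weight values here are random placeholders, not the trained, work-type-dependent matrices of the patent.

```python
import math
import random

random.seed(0)
n, m = 4, 3  # hidden size n, input (GBK-coded) vector size m

# Illustrative random weights; in the patent these are switched per work type.
U = [[random.uniform(-1, 1) for _ in range(m)] for _ in range(n)]  # n x m
W = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]  # n x n
V = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]  # m x n

def matvec(A, v):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def step(x_t, s_prev, xi):
    # s_t = tanh(U x_t + W s_prev); O_t = g(V s_t) * xi, with g = identity here.
    s_t = [math.tanh(u + w) for u, w in zip(matvec(U, x_t), matvec(W, s_prev))]
    o_t = [xi * o for o in matvec(V, s_t)]
    return s_t, o_t

s, o = step([1.0] * m, [0.0] * n, xi=0)  # wrong trigger time: output is zeroed
```

Note how the trigger coefficient ξ gates the whole output matrix, so a mistimed utterance produces an all-zero output regardless of its content.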
5. The method for assessing and evaluating the voice behavior of the rail transit simulation training as claimed in claim 1, wherein the voice evaluation database specifically comprises:
the voice evaluation database contains all standard expressions used by every profession, i pieces of voice data in total, and the corresponding voice evaluation database has the following structure:
matching the standard voice data with the voice evaluation database to obtain the uniquely matched standard semantics and trigger time i_x, where i_x represents a standard voice data signal that serves as the input signal for autonomous objective evaluation; the specific structure is as follows:
[number i_x, semantic relevance i_x, device relevance i_x, professional-term relevance i_x, statement i_x]
The number is used for managing the serial number of each piece of data in the voice evaluation database; the semantic relevance, device relevance and professional-term relevance serve as the key factors for autonomous objective evaluation; the statement gives the standard expression of the sentence and serves as the key factor for matching the standard voice data input.
6. The method for assessing and evaluating the voice behavior of the rail transit simulation training as claimed in claim 1, wherein the triggering time judgment specifically comprises:
judging the voice signal triggered by the trainee according to the system state feedback signal and the current environment, and obtaining the sentence membership function through the trigger-timing judgment, specifically:
wherein a represents the critical value of the trigger-timing membership; when the trigger timing of the voice signal is correct, x > a; when the trigger timing is wrong, x ≤ a.
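The equation of the sentence membership function is not reproduced in this text; the simplest form consistent with the claim's description is a step function, sketched here as an assumption:

```python
def y_sen(x, a):
    # Hypothesised step form of the sentence membership function: the claim
    # only states that membership is 1 when x > a (correct trigger timing)
    # and 0 when x <= a (wrong timing).
    return 1 if x > a else 0
```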
7. The voice behavior assessment and evaluation method for rail transit simulation training according to any one of claims 1 to 6, wherein the semantic-related keyword membership function is specifically:
YDC(y,b,c)=c/(|y-b|+c)
wherein b represents the optimal semantic membership value, at which the semantic judgment is completely consistent with the voice evaluation database; its value is the number of semantic-related keywords in the voice evaluation database, and c = 0.4b is calculated from the system characteristics. As the semantic expression becomes over-simplified or over-complicated, the membership value decreases; when y reaches b − 2c or b + 2c, the membership drops to 1/3 of the maximum membership of 1.
8. The method for assessing and evaluating the voice behavior of the rail transit simulation training as claimed in claim 7, wherein the device-related keyword membership functions are specifically:
wherein d represents the saturation point of the device-related keyword membership, at which the device membership function takes the value 1; d − e represents the device-relevance point, with e = 0.6d calculated from the system characteristics; when z ∈ (0, d − e), the input voice signal uses no device-related keywords; when z ∈ (d − e, d), the input voice signal's use of device-related keywords gradually increases; when z ∈ (d, +∞), the input voice signal uses other device-related keywords in addition to all the device-related keywords of the standard sentence.
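The device-membership equation itself is likewise elided in this text; a piecewise ramp matching claim 8's verbal description (zero below d − e, rising on (d − e, d), saturated at 1 from d onward) would look like the following. The linear shape of the ramp is an assumption.

```python
def y_sc(z, d, e):
    """Hypothesised device-related keyword membership (claim 8's description)."""
    if z <= d - e:
        return 0.0                 # no device-related keywords used
    if z < d:
        return (z - (d - e)) / e   # usage gradually increasing
    return 1.0                     # saturated: all standard device keywords present

# Embodiment values: d = 2 device keywords, e = 0.6 * d = 1.2, input z = 2.
value = y_sc(2, 2, 1.2)  # saturated, so membership is 1
```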
9. The method for assessing and evaluating the voice behavior of the rail transit simulation training as claimed in claim 8, wherein the membership function of the related keywords of the professional terms is specifically as follows:
wherein f represents a completely accurate professional-term expression, at which the membership is 1; the membership decreases continuously as the professional-term expression degrades, and g = 0.5f is calculated from the system characteristics.
10. The method for assessing and evaluating the voice behavior of the rail transit simulation training as claimed in claim 9, wherein the comprehensive membership function is specifically:
U=(εYDC+λYSC+τYRT)×100
wherein U represents the autonomous objective evaluation score of the trainee's speech recognition and reflects how standard the trainee's speech is during the training process; ε, λ and τ are respectively the weight coefficients of the semantic-related, device-related and professional-term-related keywords; ε ∈ [0, 1], λ ∈ [0, 1], τ ∈ [0, 1], and ε + λ + τ = 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011261988.9A CN112435512B (en) | 2020-11-12 | 2020-11-12 | Voice behavior assessment and evaluation method for rail transit simulation training |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112435512A true CN112435512A (en) | 2021-03-02 |
CN112435512B CN112435512B (en) | 2023-01-24 |
Family
ID=74701312
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011261988.9A Active CN112435512B (en) | 2020-11-12 | 2020-11-12 | Voice behavior assessment and evaluation method for rail transit simulation training |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112435512B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114373373A (en) * | 2022-01-10 | 2022-04-19 | 北京易优联科技有限公司 | Examination method and system for pulmonary function examiner |
CN115547299A (en) * | 2022-11-22 | 2022-12-30 | 中国民用航空飞行学院 | Quantitative evaluation and classification method and device for controlled voice quality division |
CN115953931A (en) * | 2023-03-14 | 2023-04-11 | 成都运达科技股份有限公司 | Rail transit practical training examination objective evaluation system and method |
Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101620853A (en) * | 2008-07-01 | 2010-01-06 | 邹采荣 | Speech-emotion recognition method based on improved fuzzy vector quantization |
CN102831506A (en) * | 2012-08-28 | 2012-12-19 | 焦玉刚 | Method and system for carrying out service procedure management through voice |
US20140032574A1 (en) * | 2012-07-23 | 2014-01-30 | Emdadur R. Khan | Natural language understanding using brain-like approach: semantic engine using brain-like approach (sebla) derives semantics of words and sentences |
CN107093431A (en) * | 2016-02-18 | 2017-08-25 | 中国移动通信集团辽宁有限公司 | A kind of method and device that quality inspection is carried out to service quality |
CN107273350A (en) * | 2017-05-16 | 2017-10-20 | 广东电网有限责任公司江门供电局 | A kind of information processing method and its device for realizing intelligent answer |
CN108428382A (en) * | 2018-02-14 | 2018-08-21 | 广东外语外贸大学 | It is a kind of spoken to repeat methods of marking and system |
CN108804661A (en) * | 2018-06-06 | 2018-11-13 | 湘潭大学 | Data de-duplication method based on fuzzy clustering in a kind of cloud storage system |
CN109817201A (en) * | 2019-03-29 | 2019-05-28 | 北京金山安全软件有限公司 | Language learning method and device, electronic equipment and readable storage medium |
CN110222183A (en) * | 2019-06-12 | 2019-09-10 | 云南电网有限责任公司大理供电局 | A kind of construction method for appraisal model of customer satisfaction of powering |
CN110322895A (en) * | 2018-03-27 | 2019-10-11 | 亿度慧达教育科技(北京)有限公司 | Speech evaluating method and computer storage medium |
CN110717018A (en) * | 2019-04-15 | 2020-01-21 | 中国石油大学(华东) | Industrial equipment fault maintenance question-answering system based on knowledge graph |
CN110807970A (en) * | 2019-12-02 | 2020-02-18 | 成都运达科技股份有限公司 | CTC (China traffic control) scheduling simulation examination and evaluation system based on scene control |
CN110827807A (en) * | 2019-11-29 | 2020-02-21 | 恒信东方文化股份有限公司 | Voice recognition method and system |
CN111326140A (en) * | 2020-03-12 | 2020-06-23 | 科大讯飞股份有限公司 | Speech recognition result discrimination method, correction method, device, equipment and storage medium |
CN111430044A (en) * | 2020-03-19 | 2020-07-17 | 郑州大学第一附属医院 | Natural language processing system and method of nursing robot |
CN111540380A (en) * | 2020-04-20 | 2020-08-14 | 深圳妙创医学技术有限公司 | Clinical training system and method |
CN111597308A (en) * | 2020-05-19 | 2020-08-28 | 中国电子科技集团公司第二十八研究所 | Knowledge graph-based voice question-answering system and application method thereof |
CN111696557A (en) * | 2020-06-23 | 2020-09-22 | 深圳壹账通智能科技有限公司 | Method, device and equipment for calibrating voice recognition result and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||