CN117423260A - Auxiliary teaching method based on classroom speech recognition and related equipment - Google Patents


Info

Publication number: CN117423260A (granted publication: CN117423260B)
Application number: CN202311748488.1A
Authority: CN (China)
Prior art keywords: teaching, information, audio, target, audio data
Legal status: Granted; Active
Inventor: 郝磊
Current and original assignee: Hangzhou Smart Ear Technology Co., Ltd.
Other languages: Chinese (zh)


Classifications

    • G09B 5/04: Electrically-operated educational appliances with audible presentation of the material to be studied
    • G10L 15/02: Feature extraction for speech recognition; selection of recognition unit
    • G10L 15/04: Segmentation; word boundary detection
    • G10L 15/06: Creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • Y02P 90/02: Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Abstract

The invention relates to the technical field of speech recognition, and discloses an auxiliary teaching method based on classroom speech recognition and related equipment. The method comprises the following steps: acquiring teaching audio data of a lecture given in a target classroom, and extracting at least one type of audio marking information from the teaching audio data based on a preset audio storage strategy; extracting a plurality of audio accent features of the teaching audio data, and determining the target lecturer of the lecture in the target classroom based on the audio accent features; extracting teaching voice fragments corresponding to the teaching audio data based on the target lecturer, and performing language conversion on the teaching voice fragments based on the audio marking information to obtain target teaching information; and generating control instruction information for preset teaching equipment and auxiliary teaching information based on the target teaching information to obtain an auxiliary teaching result. The method and the device improve the accuracy of classroom speech recognition and conversion for the relevant course types.

Description

Auxiliary teaching method based on classroom speech recognition and related equipment
Technical Field
The invention relates to the technical field of voice recognition, in particular to an auxiliary teaching method based on classroom voice recognition and related equipment.
Background
With the progress of science and technology and the modernization of teaching, the amount and breadth of knowledge that students need to learn keep growing, and combined online and offline classroom modes have been adopted to meet the needs of classroom teaching. To meet the teaching needs of some courses, effective teaching can be achieved through modes such as foreign-language teachers or remotely connected classes taught from other countries, which relieves the dilemma of schools that lack the relevant courses and teachers. However, because students in different regions have different foundations, and the teaching equipment in different regions is controlled in different ways, the lecture speech of the relevant teachers and the remote operation instructions for the classroom equipment need to undergo speech recognition and conversion before classroom teaching of the relevant courses can be completed.
At present, the relevant online and offline classroom teaching helps students understand the teaching content of the lecturing teacher by assigning a classroom interpreter or providing online translated subtitles. However, this way of recognizing and converting lecture speech is sometimes not timely enough and mistranslates the proper nouns of different courses, and because of the insufficient accuracy of translation, the lecturing teacher makes errors when controlling the corresponding classroom equipment. That is, the accuracy of existing classroom speech recognition and conversion for the relevant course types is low.
Disclosure of Invention
The main purpose of the invention is to solve the problem that the accuracy of existing classroom speech recognition and conversion for the relevant course types is low.
The first aspect of the invention provides an auxiliary teaching method based on classroom speech recognition, which comprises the following steps: acquiring teaching audio data of a teaching corresponding to a target class, and extracting at least one type of audio marking information in the teaching audio data based on a preset audio storage strategy; extracting a plurality of audio accent features of the teaching audio data, and determining a target lecturer corresponding to the lecture in the target classroom based on the audio accent features; extracting teaching voice fragments corresponding to the teaching audio data based on the target teaching personnel, and performing language conversion on the teaching voice fragments based on the audio marking information to obtain target teaching information; and generating control instruction information and auxiliary teaching information of preset teaching equipment based on the target teaching information to obtain an auxiliary teaching result.
Optionally, in a first implementation manner of the first aspect of the present invention, the audio marking information includes longitude and latitude marking information, time marking information and language family marking information, and the extracting at least one type of audio marking information in the teaching audio data based on a preset audio storage strategy includes: based on a preset audio storage strategy, extracting classroom position information and time marking information corresponding to the teaching audio data; calculating the right-ascension time and declination degrees of the target classroom based on the classroom position information, and generating the longitude and latitude marking information of the teaching audio clip based on the right-ascension time and declination degrees; and determining a language family distribution area of the target classroom based on the audio position information, and generating language family marking information of the teaching audio data based on the language family distribution area.
Optionally, in a second implementation manner of the first aspect of the present invention, the extracting a plurality of audio accent features of the teaching audio data includes: preprocessing the teaching audio data, and carrying out multi-frame windowing calculation on the preprocessed teaching audio data based on a preset audio frame number to obtain windowed teaching audio data; performing time-frequency conversion on the windowed teaching audio data, and performing frequency spectrum filtering and coefficient operation on the time-frequency converted teaching audio data based on the human voice perception frequency to obtain teaching audio coefficients; and carrying out cepstrum transformation on the teaching audio coefficient, and selecting a plurality of audio accent features meeting the preset frequency spectrum orders from the cepstrum transformation result.
Optionally, in a third implementation manner of the first aspect of the present invention, the determining, based on the audio accent feature, a target lecturer corresponding to the lecture in the target classroom includes: respectively calculating the feature similarity between the audio accent features by adopting a preset voice recognition model; and selecting feature similarity larger than a preset similarity threshold, and determining a corresponding number of target lecturer in the target class based on the selected result.
Optionally, in a fourth implementation manner of the first aspect of the present invention, the target lecturer includes a first target lecturer and a second target lecturer, the lecture voice segment includes a first lecture voice segment and a second lecture voice segment, and the extracting, based on the target lecturer, the lecture voice segment corresponding to the teaching audio data includes: performing audio cutting on the teaching audio data based on the target teaching personnel to obtain a plurality of audio cutting fragments, and identifying audio endpoints of each audio cutting fragment; based on the audio endpoint, the audio cutting segment is subjected to personnel secondary segmentation, a first teaching personnel marking is carried out on the secondary segmentation result to obtain a first teaching voice segment corresponding to a first target teaching personnel, and a second teaching personnel marking is carried out on the secondary segmentation result based on equipment authority to obtain a second teaching voice segment corresponding to a second target teaching personnel.
Optionally, in a fifth implementation manner of the first aspect of the present invention, the performing language conversion on the lecture speech segment based on the audio markup information to obtain target teaching information includes: based on the language family mark information, determining at least one teaching mark instruction in the teaching voice fragment, and judging whether teaching information corresponding to the teaching mark instruction belongs to a preset course naming system or not; if the teaching information corresponding to the teaching marking instruction belongs to a preset course naming system, taking the teaching marking instruction as target teaching information according to the course naming system; if the teaching information corresponding to the teaching marking instruction does not belong to a preset course naming system, carrying out naming language conversion on the teaching marking instruction according to the course naming system to obtain target teaching information.
Optionally, in a sixth implementation manner of the first aspect of the present invention, generating control instruction information and auxiliary teaching information of a preset teaching device based on the target teaching information, to obtain an auxiliary teaching result includes: determining a target operation object of a corresponding equipment type and initial state information of preset teaching equipment in the target teaching information; operating parameter calculation is carried out on the longitude and latitude marking information and the time marking information, and control instruction information of the teaching equipment is generated; and generating auxiliary teaching information of the target class in sequence based on the target teaching information and the control instruction information to obtain an auxiliary teaching result.
The second aspect of the present invention provides an auxiliary teaching device based on classroom speech recognition, the auxiliary teaching device based on classroom speech recognition comprising: the mark extraction module is used for acquiring teaching audio data corresponding to a target class and extracting at least one type of audio mark information in the teaching audio data based on a preset audio storage strategy; the personnel determining module is used for extracting a plurality of audio accent characteristics of the teaching audio data and determining target teaching personnel corresponding to teaching in the target class based on the audio accent characteristics; the language conversion module is used for extracting teaching voice fragments corresponding to the teaching audio data based on the target teaching personnel, and carrying out language conversion on the teaching voice fragments based on the audio mark information to obtain target teaching information; the instruction generation module is used for generating control instruction information and auxiliary teaching information of preset teaching equipment based on the target teaching information to obtain an auxiliary teaching result.
Optionally, in a first implementation manner of the second aspect of the present invention, the tag extraction module includes: the first marking unit is used for extracting classroom position information and time marking information corresponding to the teaching audio data based on a preset audio storage strategy; the second marking unit is used for calculating the right-ascension time and declination degrees of the target classroom based on the classroom position information and generating the longitude and latitude marking information of the teaching audio clip based on the right-ascension time and declination degrees; and the third marking unit is used for determining the language family distribution area of the target classroom based on the audio position information and generating the language family marking information of the teaching audio data based on the language family distribution area.
Optionally, in a second implementation manner of the second aspect of the present invention, the person determining module includes: the windowing calculation unit is used for preprocessing the teaching audio data, and carrying out multi-frame windowing calculation on the preprocessed teaching audio data based on a preset audio frame number to obtain windowed teaching audio data; the time-frequency conversion unit is used for performing time-frequency conversion on the windowed teaching audio data, and performing frequency spectrum filtering and coefficient operation on the time-frequency converted teaching audio data based on the human voice perception frequency to obtain teaching audio coefficients; and the cepstrum transformation unit is used for performing cepstrum transformation on the teaching audio coefficient and selecting a plurality of audio accent features meeting the preset frequency spectrum orders from the cepstrum transformation result.
Optionally, in a third implementation manner of the second aspect of the present invention, the person determining module further includes: the similarity calculation unit is used for respectively calculating the feature similarity between the audio accent features by adopting a preset voice recognition model; the similarity selecting unit is used for selecting feature similarity larger than a preset similarity threshold value and determining a corresponding number of target teaching staff in the target class based on a selected result.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the language conversion module includes: the audio cutting unit is used for carrying out audio cutting on the teaching audio data based on the target teaching personnel to obtain a plurality of audio cutting fragments, and identifying audio endpoints of each audio cutting fragment; and the personnel marking unit is used for carrying out personnel secondary segmentation on the audio cutting segment based on the audio endpoint, carrying out first teaching personnel marking on the secondary segmentation result to obtain a first teaching voice segment corresponding to a first target teaching personnel, and carrying out second teaching personnel marking on the secondary segmentation result based on the equipment authority to obtain a second teaching voice segment corresponding to a second target teaching personnel.
Optionally, in a fifth implementation manner of the second aspect of the present invention, the language conversion module further includes: the naming judging unit is used for determining at least one teaching marking instruction in the teaching voice fragment based on the language family marking information and judging whether the teaching information corresponding to the teaching marking instruction belongs to a preset course naming system or not; the first conversion unit is used for taking the teaching marking instruction as target teaching information according to a course naming system if the teaching information corresponding to the teaching marking instruction belongs to a preset course naming system; and the second conversion unit is used for carrying out naming language conversion on the teaching marking instruction according to the course naming system if the teaching information corresponding to the teaching marking instruction does not belong to the preset course naming system, so as to obtain target teaching information.
Optionally, in a sixth implementation manner of the second aspect of the present invention, the instruction generating module includes: the state determining unit is used for determining a target operation object of a corresponding equipment type in the target teaching information and initial state information of preset teaching equipment; the parameter calculation unit is used for calculating operation parameters of the longitude and latitude marking information and the time marking information and generating control instruction information of the teaching equipment; and the instruction generation unit is used for sequentially generating auxiliary teaching information of the target class based on the target teaching information and the control instruction information to obtain an auxiliary teaching result.
A third aspect of the present invention provides an auxiliary teaching apparatus based on classroom speech recognition, comprising: a memory and at least one processor, the memory having instructions stored therein; the at least one processor invokes the instructions in the memory to cause the classroom speech recognition based auxiliary tutorial device to perform the steps of the classroom speech recognition based auxiliary tutorial method described above.
A fourth aspect of the present invention provides a computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the steps of the above-described class speech recognition based assisted tutorial method.
According to the technical scheme provided by the invention, teaching audio data of a lecture in a target classroom are acquired, and at least one type of audio marking information is extracted from the teaching audio data based on a preset audio storage strategy; a plurality of audio accent features of the teaching audio data are extracted, and the target lecturer of the lecture in the target classroom is determined based on the audio accent features; teaching voice fragments corresponding to the teaching audio data are extracted based on the target lecturer, and language conversion is performed on the teaching voice fragments based on the audio marking information to obtain target teaching information; and control instruction information for preset teaching equipment and auxiliary teaching information are generated based on the target teaching information to obtain an auxiliary teaching result. Compared with the prior art, the method and the device extract the corresponding audio marking information from the teaching audio data, determine the target teaching personnel corresponding to the teaching audio data, convert the corresponding teaching voice fragments into the teaching language based on the target teaching personnel and the audio marking information to obtain the target teaching information, and generate the control instruction information and auxiliary teaching information corresponding to the target teaching information to obtain the auxiliary teaching result, thereby improving the accuracy of classroom speech recognition and conversion for the relevant course types and achieving effective teaching control of the classroom-related equipment.
Drawings
FIG. 1 is a schematic diagram of a first embodiment of an auxiliary teaching method based on classroom speech recognition in an embodiment of the present invention;
FIG. 2 is a schematic diagram of a second embodiment of an auxiliary teaching method based on classroom speech recognition according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a third embodiment of an auxiliary teaching method based on classroom speech recognition according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an embodiment of an auxiliary teaching device based on classroom speech recognition according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of another embodiment of an auxiliary teaching device based on classroom speech recognition according to an embodiment of the present invention;
fig. 6 is a schematic diagram of an embodiment of an auxiliary teaching apparatus based on classroom speech recognition according to an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides an auxiliary teaching method and related equipment based on classroom speech recognition, wherein the method comprises the following steps: acquiring teaching audio data of a lecture given in a target classroom, and extracting at least one type of audio marking information from the teaching audio data based on a preset audio storage strategy; extracting a plurality of audio accent features of the teaching audio data, and determining the target lecturer of the lecture in the target classroom based on the audio accent features; extracting teaching voice fragments corresponding to the teaching audio data based on the target teaching personnel, and performing language conversion on the teaching voice fragments based on the audio marking information to obtain target teaching information; and generating control instruction information for preset teaching equipment and auxiliary teaching information based on the target teaching information to obtain an auxiliary teaching result. The method and the device improve the accuracy of classroom speech recognition and conversion for the relevant course types.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, a specific flow of an embodiment of the present invention is described below with reference to fig. 1, where a first embodiment of an auxiliary teaching method based on classroom speech recognition in the embodiment of the present invention includes:
101. acquiring teaching audio data of a teaching corresponding to a target classroom, and extracting at least one type of audio marking information in the teaching audio data based on a preset audio storage strategy;
The embodiment of the application can acquire and process the related data based on artificial intelligence technology. Here, artificial intelligence (AI) is the theory, method, technique and application system that uses a digital computer or a digital-computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
In this embodiment, the teaching audio data refers to all relevant data of the online or offline public course recorded in real time or in history; the audio storage strategy refers to a strategy for storing teaching audio data into a related server and adding related identifiers for data storage.
In practical application, the teaching audio data of the course currently being taught in the target classroom are obtained from a corresponding database by connecting to the server of the online course or the related historical course, and the classroom position information and time marking information corresponding to the teaching audio data are extracted based on a preset audio storage strategy; further, the right-ascension time and declination degrees of the target classroom are calculated based on the classroom position information, and the longitude and latitude marking information of the teaching audio clip is generated based on the right-ascension time and declination degrees; finally, the language family distribution area of the target classroom is determined based on the audio position information, and the language family marking information of the teaching audio data is generated based on the language family distribution area.
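Purely for illustration, a minimal sketch of what the extracted marking information might look like as a data structure; the field names and example values are assumptions and are not defined by the patent.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class AudioMarkingInfo:
    classroom_longitude_deg: float   # degrees east, from the classroom position info
    classroom_latitude_deg: float    # degrees north
    time_mark: datetime              # time marking information from the storage strategy
    right_ascension_time_deg: float  # derived time/longitude reference for the telescope
    declination_deg: float           # derived latitude reference for the telescope
    language_family: str             # e.g. "zh" or "en", from the distribution area

# Hypothetical example values for a classroom near Hangzhou.
mark = AudioMarkingInfo(120.15, 30.25, datetime(2023, 12, 18, 19, 30),
                        right_ascension_time_deg=300.0, declination_deg=38.78,
                        language_family="zh")
print(mark)
```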
102. Extracting a plurality of audio accent features of teaching audio data, and determining a target lecturer corresponding to the lecture in the target classroom based on the audio accent features;
in this embodiment, the audio accent features reflect the fact that the shape and size of each person's lips, tongue, vocal cords and other vocal organs differ, which gives each speaker unique acoustic characteristics when speaking; accent features for identifying a speaker can therefore be obtained by analyzing properties such as the spectrum and formants of the speech. The target teaching personnel refer to the teacher giving the lesson, teachers helping to maintain classroom order, students interacting in the lesson, and so on.
In practical application, the teaching audio data are preprocessed, and multi-frame windowing calculation is carried out on the preprocessed teaching audio data based on a preset audio frame number to obtain windowed teaching audio data; further, time-frequency conversion is carried out on the windowed teaching audio data, and spectral filtering and coefficient operations are carried out on the time-frequency-converted teaching audio data based on the human voice perception frequency to obtain teaching audio coefficients; cepstrum transformation is performed on the teaching audio coefficients, and a plurality of audio accent features meeting the preset spectral orders are selected from the cepstrum transformation result; further, the feature similarities between the audio accent features are calculated with a preset voice recognition model; and the feature similarities larger than a preset similarity threshold are selected, and a corresponding number of target lecturers in the target classroom are determined based on the selection result.
103. Based on the target teaching personnel, extracting teaching voice fragments corresponding to teaching audio data, and carrying out language conversion on the teaching voice fragments based on audio mark information to obtain target teaching information;
in this embodiment, the lecture voice segments refer to at least one speech segment of the effective speech uttered by the different target lecturers in the current target course; the language conversion refers to converting the lecture speech of different courses into the language of the region where the current course is held, and converting the specialized-vocabulary speech fragments in some course speech into the specialized vocabulary of the corresponding language system, for example converting a Western astronomical star name into the corresponding star name used in the local language system.
In practical application, based on a target lecturer, audio cutting is carried out on teaching audio data to obtain a plurality of audio cutting fragments, and audio endpoints of all the audio cutting fragments are identified; performing personnel secondary segmentation on the audio cutting segment based on the audio endpoint, marking a first teaching personnel on the secondary segmentation result to obtain a first teaching voice segment corresponding to a first target teaching personnel, and marking a second teaching personnel on the secondary segmentation result based on the equipment authority to obtain a second teaching voice segment corresponding to a second target teaching personnel; further, based on the language series marking information, determining at least one teaching marking instruction in the teaching voice fragment, and judging whether the teaching information corresponding to the teaching marking instruction belongs to a preset course naming system or not; if the teaching information corresponding to the teaching marking instruction belongs to a preset course naming system, the teaching marking instruction is used as target teaching information according to the course naming system; if the teaching information corresponding to the teaching marking instruction does not belong to the preset course naming system, carrying out naming language conversion on the teaching marking instruction according to the course naming system to obtain the target teaching information.
104. Based on the target teaching information, control instruction information and auxiliary teaching information of preset teaching equipment are generated, and an auxiliary teaching result is obtained.
In this embodiment, the control instruction information refers to control instructions for the teaching equipment required by the classroom, such as the azimuth adjustment of an astronomical telescope in an astronomy class and the pointing adjustment for the observed celestial body according to longitude, latitude and time during use; the auxiliary teaching information refers to the information formed by real-time translation of the classroom teacher's speech and translation of the related courseware.
In practical application, determining a target operation object of a corresponding equipment type in target teaching information and preset initial state information of the teaching equipment; further, operation parameter calculation is carried out on the longitude and latitude marking information and the time marking information, and control instruction information of the teaching equipment is generated; and the auxiliary teaching information of the target class is sequentially generated based on the target teaching information and the control instruction information, so that an auxiliary teaching result is obtained.
According to the embodiment of the invention, teaching audio data of the lecture in the target classroom are acquired, and at least one type of audio marking information is extracted from the teaching audio data based on a preset audio storage strategy; a plurality of audio accent features of the teaching audio data are extracted, and the target lecturer of the lecture in the target classroom is determined based on the audio accent features; teaching voice fragments corresponding to the teaching audio data are extracted based on the target teaching personnel, and language conversion is performed on the teaching voice fragments based on the audio marking information to obtain target teaching information; and control instruction information for preset teaching equipment and auxiliary teaching information are generated based on the target teaching information to obtain an auxiliary teaching result. Compared with the prior art, the method and the device extract the corresponding audio marking information from the teaching audio data, determine the target teaching personnel corresponding to the teaching audio data, convert the corresponding teaching voice fragments into the teaching language based on the target teaching personnel and the audio marking information to obtain the target teaching information, and generate the control instruction information and auxiliary teaching information corresponding to the target teaching information to obtain the auxiliary teaching result, thereby improving the accuracy of classroom speech recognition and conversion for the relevant course types and achieving effective teaching control of the classroom-related equipment.
Referring to fig. 2, a second embodiment of the auxiliary teaching method based on classroom speech recognition according to the present invention includes:
201. based on a preset audio storage strategy, class position information and time mark information corresponding to teaching audio data are extracted;
in this embodiment, the classroom location information refers to longitude and latitude, altitude and other location information of the classroom; the time stamp information refers to corresponding time information when teaching in the current class.
In practical application, after teaching audio data of teaching corresponding to a target class are obtained, class position information and time stamp information corresponding to the teaching audio data are extracted from a play data stream based on a preset audio storage strategy.
202. Calculating the right-ascension time and declination degrees of the target classroom based on the classroom position information, and generating the longitude and latitude marking information of the teaching audio clip based on the right-ascension time and declination degrees;
in this embodiment, the right-ascension time refers to the precise time corresponding to the longitude of the current classroom position, and the declination degrees refer to the latitude information of the current classroom position; the right-ascension time and declination degrees can be adapted to the requirements of the teaching equipment, and the parameters required by an astronomical telescope in an astronomy class are taken as an example for illustration.
In practical application, the time corresponding to the current longitude of the target classroom is calculated (accurate to the second) based on the classroom position information, which guarantees the accuracy with which the astronomical telescope later positions the target star and keeps the target star in the center of the field of view; the corresponding declination degrees are then converted, and the longitude and latitude marking information required for adjusting the astronomical telescope is generated from the combination of the right-ascension time and the declination degrees, as sketched below.
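As a hedged sketch only, the following shows one common way to obtain the time reference needed here: an approximate local sidereal time computed from the classroom longitude and a UTC timestamp, from which the hour angle of a star with known right ascension follows. The function name and the low-precision GMST approximation are assumptions, not taken from the patent.

```python
from datetime import datetime, timezone

def local_sidereal_time_deg(utc: datetime, east_longitude_deg: float) -> float:
    """Approximate local sidereal time in degrees (assumed helper, not from the patent)."""
    # Days since the J2000.0 epoch (2000-01-01 12:00 UTC).
    j2000 = datetime(2000, 1, 1, 12, tzinfo=timezone.utc)
    d = (utc - j2000).total_seconds() / 86400.0
    # Greenwich mean sidereal time, standard low-precision approximation.
    gmst = 280.46061837 + 360.98564736629 * d
    return (gmst + east_longitude_deg) % 360.0

# Example: classroom at 120.15 deg E; hour angle of a star with known right ascension.
lst = local_sidereal_time_deg(datetime.now(timezone.utc), 120.15)
ra_deg = 279.23  # right ascension of the observed star, supplied by the lesson plan
hour_angle = (lst - ra_deg) % 360.0
print(f"LST = {lst:.3f} deg, hour angle = {hour_angle:.3f} deg")
```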
203. Determining a language system distribution area of a target class based on the audio position information, and generating language system marking information of teaching audio data based on the language system distribution area;
in this embodiment, the language family distribution area refers to an area divided according to the language mainly used there (such as Chinese, English, Arabic, etc.); for specialized classes, different specialized terminologies are likewise divided into areas according to teaching requirements.
In practical application, the main teaching language of the location of the target classroom is determined based on the audio position information to obtain the corresponding language family distribution area, and the language family marking information of the teaching audio data is generated based on the language family distribution area.
204. Preprocessing teaching audio data, and based on the preset audio frame number, performing multi-frame windowing calculation on the preprocessed teaching audio data to obtain windowed teaching audio data;
in this embodiment, preprocessing steps such as noise reduction and silence-segment removal are performed on the teaching audio data; the preprocessed teaching audio data are then divided into frames based on the preset audio frame number, each frame lasting 20-40 milliseconds, and the framing can be realized by segmenting the signal with a sliding window; further, a window function is applied to the speech signal of each frame, commonly used window functions including the Hamming window, the Hanning window and the like. Windowing reduces spectral leakage and makes the spectral analysis more accurate, and the windowed teaching audio data are thus obtained.
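A minimal sketch of the framing and windowing step, assuming a 25 ms frame length and 10 ms hop; these values and the helper name are not specified by the patent.

```python
import numpy as np

def frame_and_window(signal: np.ndarray, sample_rate: int,
                     frame_ms: float = 25.0, hop_ms: float = 10.0) -> np.ndarray:
    """Split a 1-D signal into overlapping frames and apply a Hamming window."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop_len)
    window = np.hamming(frame_len)
    frames = np.stack([
        signal[i * hop_len: i * hop_len + frame_len] * window
        for i in range(n_frames)
    ])
    return frames  # shape: (n_frames, frame_len)

# Example with a synthetic 1-second signal at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
frames = frame_and_window(np.sin(2 * np.pi * 220 * t), sr)
print(frames.shape)
```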
205. Performing time-frequency conversion on the windowed teaching audio data, and performing frequency spectrum filtering and coefficient operation on the time-frequency converted teaching audio data based on the human voice perception frequency to obtain teaching audio coefficients;
in this embodiment, the human voice perception frequency refers to the range of sound frequencies that humans can hear (i.e. it is closely related to how the human ear perceives sound).
In practical application, a Fast Fourier Transform (FFT) is performed on each frame of the windowed teaching audio data to obtain a spectral representation of each frame of the speech signal; a corresponding Mel scale is then set based on the human voice perception frequency, the Mel-scale points are used as the center frequencies of the corresponding Mel filters to filter the spectrum of the time-frequency-converted teaching audio data, and a logarithm is taken of the filtered result to obtain the teaching audio coefficients.
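Continuing the framing sketch above (reusing its `frames` and `sr`), a hedged illustration of the Mel filtering and logarithm step; the filter count and FFT size are assumed values.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_energies(frames: np.ndarray, sample_rate: int,
                     n_filters: int = 26, n_fft: int = 512) -> np.ndarray:
    """Power spectrum per frame -> triangular Mel filterbank -> log energies."""
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2             # (n_frames, n_fft//2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return np.log(power @ fbank.T + 1e-10)                      # (n_frames, n_filters)

log_mel = log_mel_energies(frames, sr)   # `frames` and `sr` from the framing sketch above
print(log_mel.shape)
```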
206. Performing cepstrum transformation on the teaching audio coefficient, and selecting a plurality of audio accent features meeting preset frequency spectrum orders from cepstrum transformation results;
in this embodiment, discrete Cosine Transform (DCT) is performed on the teaching audio coefficients to obtain cepstrum coefficients. Typically, only a portion of the low-order cepstrum coefficients are retained, and the high-order cepstrum coefficients are ignored, so that a plurality of audio accent features satisfying a predetermined spectral order (such as the low-order cepstrum coefficients) are selected from the result of the cepstrum transformation.
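And the cepstral step, again as an assumption-laden sketch: a type-II DCT of the log Mel energies from the previous block, keeping only the low-order coefficients (13 here, an assumed value) as the audio accent features.

```python
import numpy as np
from scipy.fftpack import dct

def cepstral_features(log_mel: np.ndarray, n_coeffs: int = 13) -> np.ndarray:
    """DCT of the log Mel energies; keep only the low-order cepstral coefficients."""
    return dct(log_mel, type=2, axis=1, norm='ortho')[:, :n_coeffs]

accent_features = cepstral_features(log_mel)   # `log_mel` from the previous sketch
print(accent_features.shape)                   # (n_frames, 13)
```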
207. Respectively calculating the feature similarity among the audio accent features by adopting a preset voice recognition model;
in this embodiment, the voice recognition model refers to a Gaussian mixture model (GMM): a GMM corresponding to each enrolled speaker is obtained by training on that speaker's voice samples, so that different speakers can be represented by a plurality of GMMs.
In practical application, the feature similarity between the audio accent features is calculated using the number K of Gaussian components in the voice recognition model and the mean vector μ, covariance matrix Σ and mixing coefficient π of each Gaussian component.
208. Selecting the feature similarities larger than a preset similarity threshold, and determining a corresponding number of target lecturers in the target classroom based on the selection result;
in this embodiment, based on the calculated feature similarities, the number of target lecturers corresponding to the lecture in the target classroom is determined by a maximum-likelihood decision (for example, selecting the speaker with the highest likelihood).
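For illustration, a sketch of the GMM-based maximum-likelihood speaker decision described above, assuming one Gaussian mixture model has been trained per enrolled speaker on that speaker's cepstral features; the component count, helper names and the random stand-in data are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_speaker_gmm(features: np.ndarray, n_components: int = 8) -> GaussianMixture:
    """Fit a diagonal-covariance GMM to one speaker's (n_frames, n_coeffs) features."""
    gmm = GaussianMixture(n_components=n_components, covariance_type='diag', random_state=0)
    gmm.fit(features)
    return gmm

def identify_speaker(segment_features: np.ndarray, speaker_gmms: dict) -> str:
    """Maximum-likelihood decision: pick the enrolled speaker whose GMM scores highest."""
    scores = {name: gmm.score(segment_features)  # mean log-likelihood per frame
              for name, gmm in speaker_gmms.items()}
    return max(scores, key=scores.get)

# Example with random stand-in features (real use would pass MFCC-style features).
rng = np.random.default_rng(0)
gmms = {'teacher': train_speaker_gmm(rng.normal(0, 1, (200, 13))),
        'student': train_speaker_gmm(rng.normal(2, 1, (200, 13)))}
print(identify_speaker(rng.normal(2, 1, (50, 13)), gmms))
```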
209. Based on the target teaching personnel, extracting teaching voice fragments corresponding to teaching audio data, and carrying out language conversion on the teaching voice fragments based on audio mark information to obtain target teaching information;
210. based on the target teaching information, control instruction information and auxiliary teaching information of preset teaching equipment are generated, and an auxiliary teaching result is obtained.
According to the embodiment of the invention, the corresponding audio marking information is extracted from the teaching audio data, the target teaching personnel corresponding to the teaching audio data are determined, the corresponding teaching voice fragments are converted into target teaching information based on the target teaching personnel and the audio marking information, and the control instruction information and auxiliary teaching information corresponding to the target teaching information are generated to obtain the auxiliary teaching result, so that the accuracy of classroom speech recognition and conversion for the relevant course types is improved and effective teaching control of the classroom-related equipment is realized.
Referring to fig. 3, a third embodiment of the auxiliary teaching method based on classroom speech recognition according to the present invention includes:
301. acquiring teaching audio data of a teaching corresponding to a target classroom, and extracting at least one type of audio marking information in the teaching audio data based on a preset audio storage strategy;
302. extracting a plurality of audio accent features of teaching audio data, and determining a target lecturer corresponding to the lecture in the target classroom based on the audio accent features;
303. based on a target teaching person, performing audio cutting on teaching audio data to obtain a plurality of audio cutting fragments, and identifying audio endpoints of each audio cutting fragment;
in this embodiment, based on the target lecturers and their corresponding audio accent features and voiceprint features, the teaching audio data are cut per target lecturer to obtain a plurality of audio cut segments, and the starting and ending positions of each audio cut segment are then determined by methods such as an energy threshold or the short-time zero-crossing rate, which yields the audio endpoints of each audio cut segment.
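A minimal sketch of endpoint detection on one audio cut segment using short-time energy and zero-crossing rate; the thresholds and frame length are assumed values, not taken from the patent.

```python
import numpy as np

def detect_endpoints(signal: np.ndarray, sample_rate: int,
                     frame_ms: float = 25.0,
                     energy_ratio: float = 0.1, zcr_max: float = 0.25):
    """Return (start, end) sample indices of the voiced region of one segment."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.sum(frames ** 2, axis=1)                       # short-time energy
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)  # zero-crossing rate
    voiced = (energy > energy_ratio * energy.max()) & (zcr < zcr_max)
    idx = np.flatnonzero(voiced)
    if idx.size == 0:
        return None
    return idx[0] * frame_len, (idx[-1] + 1) * frame_len

# Example: half a second of silence, one second of tone, half a second of silence.
sr = 16000
sig = np.concatenate([np.zeros(sr // 2),
                      np.sin(2 * np.pi * 200 * np.arange(sr) / sr),
                      np.zeros(sr // 2)])
print(detect_endpoints(sig, sr))
```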
304. Based on an audio endpoint, performing personnel secondary segmentation on the audio cutting segment, performing first teaching personnel marking on the secondary segmentation result to obtain a first teaching voice segment corresponding to a first target teaching personnel, and performing second teaching personnel marking on the secondary segmentation result based on equipment authority to obtain a second teaching voice segment corresponding to a second target teaching personnel;
In this embodiment, the first target lecturer refers to the lecturing teacher, and the second target lecturer refers to the students who interact in class.
In practical application, based on the endpoint detection result, the speech of each target lecturer is segmented into different audio segments so that each segment contains the voice of only one person, and a feature extraction algorithm such as MFCC (Mel-frequency cepstral coefficients) is then applied to each audio segment. The feature extraction generates a vector representation for training or recognition, and a corresponding label is added to each audio segment to indicate which person the segment belongs to, which yields the first lecture voice segments corresponding to the first target lecturer; the result of the secondary segmentation is additionally marked for the second lecturer based on the equipment operation authority, which yields the second lecture voice segments corresponding to the second target lecturer.
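Building on the speaker GMM sketch above, a hedged illustration of the secondary segmentation labelling: each segment's features are scored against every enrolled speaker and tagged accordingly, and a simple authority flag decides whether the segment is treated as a first (teacher) or second (student) lecture voice segment. The names and the authority mapping are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class LabelledSegment:
    start: int                 # sample index where the segment starts
    end: int                   # sample index where the segment ends
    speaker: str               # speaker chosen by the GMM maximum-likelihood decision
    has_device_authority: bool # True for the first (teacher) lecturer

def label_segments(segments, speaker_gmms, authority: Dict[str, bool]) -> List[LabelledSegment]:
    """segments: (start, end, features) triples; speaker_gmms as in the GMM sketch above."""
    labelled = []
    for start, end, feats in segments:
        who = identify_speaker(feats, speaker_gmms)  # maximum-likelihood helper from above
        labelled.append(LabelledSegment(start, end, who, authority.get(who, False)))
    return labelled

# First lecture voice segments are those whose speaker has device authority (the teacher);
# the remaining segments are treated as second lecture voice segments (students).
```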
305. Based on the language series marking information, determining at least one teaching marking instruction in the teaching voice fragment, and judging whether the teaching information corresponding to the teaching marking instruction belongs to a preset course naming system or not;
in this embodiment, the lecture marking instruction refers to the lecture speech and the lecture vocabulary corresponding to the lecturing teacher.
In practical application, at least one lecture marking instruction in the lecture voice segment is determined based on the language family marking information (i.e. the lecture speech text of the lecturing teacher is extracted), and it is then judged whether the teaching information corresponding to the lecture marking instruction belongs to a preset course naming system (i.e. whether the current lecture speech text conforms to the teaching language and vocabulary used in the current area).
306. If the teaching information corresponding to the teaching marking instruction belongs to a preset course naming system, the teaching marking instruction is used as target teaching information according to the course naming system;
in this embodiment, if the teaching information corresponding to the lecture marking instruction belongs to the preset course naming system (i.e. the lecturing teacher speaks the local language family), the language text corresponding to the lecture marking instruction is translated, adjusted and polished according to the course naming system and used as the target teaching information.
307. If the teaching information corresponding to the teaching marking instruction does not belong to the preset course naming system, carrying out naming language conversion on the teaching marking instruction according to the course naming system to obtain target teaching information;
in this embodiment, if the teaching information corresponding to the lecture marking instruction does not belong to the preset course naming system, naming-language conversion is performed on the lecture marking instruction according to the course naming system (i.e. the language text corresponding to the lecture marking instruction and the corresponding specialized vocabulary are converted into the wording of the local language system) to obtain the target teaching information.
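A purely illustrative sketch of the naming-system check and conversion; the dictionary contents and function names are hypothetical, and a real system would hold the full course naming system for each language family.

```python
# Hypothetical course naming system: maps recognised terms from other language
# families to the term used by the local course naming system.
COURSE_NAMING_SYSTEM = {
    "zh": {"Vega": "织女星", "Altair": "牛郎星"},
}

def convert_instruction(term: str, language_family: str) -> str:
    """Return the term unchanged if it already belongs to the local naming
    system, otherwise convert it via the naming-system dictionary."""
    mapping = COURSE_NAMING_SYSTEM.get(language_family, {})
    if term in set(mapping.values()):
        return term                      # already expressed in the local naming system
    return mapping.get(term, term)       # convert if a mapping exists, else keep as-is

print(convert_instruction("Vega", "zh"))    # -> 织女星
print(convert_instruction("织女星", "zh"))   # unchanged
```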
308. Determining a target operation object of a corresponding equipment type in target teaching information and initial state information of preset teaching equipment;
in this embodiment, the target operation object of the corresponding equipment type for the current target classroom and the initial state information of the preset teaching equipment (such as the initial pointing position of the astronomical telescope and whether it has been calibrated) are determined by analyzing the teaching content of the language-converted target teaching information;
309. operating parameter calculation is carried out on the longitude and latitude marking information and the time marking information, and control instruction information of teaching equipment is generated;
in this embodiment, parameter adjustment calculation for the observation device is performed based on the longitude and latitude marking information and the time marking information according to the initial state information of the teaching equipment (such as an astronomical telescope), for example calculating the adjustment amounts of the telescope eyepiece and the telescope's time, longitude and latitude settings according to the celestial body to be observed, and the control instruction information for controlling the teaching equipment to complete the corresponding teaching operation task is generated based on the result of the calculation.
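For illustration, a hedged sketch of one standard way to turn the longitude/latitude and time marks into telescope pointing parameters: convert the target's equatorial coordinates (right ascension, declination) into the altitude and azimuth the mount must be driven to, using the local sidereal time computed earlier. The formulas are the usual spherical-astronomy ones; the function names and example values are assumptions.

```python
import math

def equatorial_to_horizontal(ra_deg: float, dec_deg: float,
                             latitude_deg: float, lst_deg: float):
    """Convert right ascension/declination to altitude/azimuth (degrees)."""
    ha = math.radians((lst_deg - ra_deg) % 360.0)        # hour angle
    dec = math.radians(dec_deg)
    lat = math.radians(latitude_deg)
    sin_alt = math.sin(dec) * math.sin(lat) + math.cos(dec) * math.cos(lat) * math.cos(ha)
    alt = math.asin(sin_alt)
    cos_az = (math.sin(dec) - sin_alt * math.sin(lat)) / (math.cos(alt) * math.cos(lat))
    az = math.acos(max(-1.0, min(1.0, cos_az)))
    if math.sin(ha) > 0:                                  # object west of the meridian
        az = 2 * math.pi - az
    return math.degrees(alt), math.degrees(az)

# Example: Vega as seen from a classroom at 30.25 N, with an assumed local sidereal time.
alt, az = equatorial_to_horizontal(ra_deg=279.23, dec_deg=38.78,
                                   latitude_deg=30.25, lst_deg=300.0)
print(f"altitude = {alt:.2f} deg, azimuth = {az:.2f} deg")
```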
310. And generating auxiliary teaching information of the target class in sequence based on the target teaching information and the control instruction information to obtain an auxiliary teaching result.
In this embodiment, based on the target teaching information and the control instruction information, the auxiliary teaching information of the target classroom after overall language conversion (including real-time translated language information, equipment operation information and the like) is generated according to the schedule and progress of the classroom teaching, and the auxiliary teaching result is obtained.
According to the embodiment of the invention, the corresponding audio marking information is extracted from the teaching audio data, the target teaching personnel corresponding to the teaching audio data are determined, the corresponding teaching voice fragments are converted into target teaching information based on the target teaching personnel and the audio marking information, and the control instruction information and auxiliary teaching information corresponding to the target teaching information are generated to obtain the auxiliary teaching result, so that the accuracy of classroom speech recognition and conversion for the relevant course types is improved and effective teaching control of the classroom-related equipment is realized.
The above describes the auxiliary teaching method based on classroom speech recognition in the embodiment of the present invention, and the following describes the auxiliary teaching device based on classroom speech recognition in the embodiment of the present invention, please refer to fig. 4, and one embodiment of the auxiliary teaching device based on classroom speech recognition in the embodiment of the present invention includes:
The mark extraction module 401 is configured to obtain teaching audio data corresponding to a teaching in a target class, and extract at least one type of audio mark information in the teaching audio data based on a preset audio storage policy;
a person determining module 402, configured to extract a plurality of audio accent features of the teaching audio data, and determine a target lecturer corresponding to a lecture in the target classroom based on the audio accent features;
the language conversion module 403 is configured to extract a lecture voice segment corresponding to the teaching audio data based on the target lecturer, and perform language conversion on the lecture voice segment based on the audio markup information, so as to obtain target teaching information;
the instruction generating module 404 is configured to generate control instruction information and auxiliary teaching information of a preset teaching device based on the target teaching information, so as to obtain an auxiliary teaching result.
According to the embodiment of the invention, teaching audio data of the lecture in the target classroom are acquired, and at least one type of audio marking information is extracted from the teaching audio data based on a preset audio storage strategy; a plurality of audio accent features of the teaching audio data are extracted, and the target lecturer of the lecture in the target classroom is determined based on the audio accent features; teaching voice fragments corresponding to the teaching audio data are extracted based on the target teaching personnel, and language conversion is performed on the teaching voice fragments based on the audio marking information to obtain target teaching information; and control instruction information for preset teaching equipment and auxiliary teaching information are generated based on the target teaching information to obtain an auxiliary teaching result. Compared with the prior art, the method and the device extract the corresponding audio marking information from the teaching audio data, determine the target teaching personnel corresponding to the teaching audio data, convert the corresponding teaching voice fragments into the teaching language based on the target teaching personnel and the audio marking information to obtain the target teaching information, and generate the control instruction information and auxiliary teaching information corresponding to the target teaching information to obtain the auxiliary teaching result, thereby improving the accuracy of classroom speech recognition and conversion for the relevant course types and achieving effective teaching control of the classroom-related equipment.
Referring to fig. 5, another embodiment of the auxiliary teaching device based on classroom speech recognition according to the present invention includes:
the mark extraction module 401 is configured to obtain teaching audio data corresponding to a teaching in a target class, and extract at least one type of audio mark information in the teaching audio data based on a preset audio storage policy;
a person determining module 402, configured to extract a plurality of audio accent features of the teaching audio data, and determine a target lecturer corresponding to a lecture in the target classroom based on the audio accent features;
the language conversion module 403 is configured to extract a lecture voice segment corresponding to the teaching audio data based on the target lecturer, and perform language conversion on the lecture voice segment based on the audio markup information, so as to obtain target teaching information;
the instruction generating module 404 is configured to generate control instruction information and auxiliary teaching information of a preset teaching device based on the target teaching information, so as to obtain an auxiliary teaching result.
Further, the tag extraction module 401 includes:
a first marking unit 4011, configured to extract classroom location information and time stamp information corresponding to the teaching audio data based on a preset audio storage policy;
A second marking unit 4012, configured to calculate, based on the classroom position information, the right-ascension time and declination degrees of the target classroom, and generate, based on the right-ascension time and declination degrees, the longitude and latitude marking information of the teaching audio clip;
the third marking unit 4013 is configured to determine a language family distribution area of the target class based on the audio position information, and generate language family marking information of the teaching audio data based on the language family distribution area.
Further, the person determination module 402 includes:
the windowing calculation unit 4021 is configured to preprocess the teaching audio data, and perform multi-frame windowing calculation on the preprocessed teaching audio data based on a preset audio frame number, so as to obtain windowed teaching audio data;
the time-frequency conversion unit 4022 is configured to perform time-frequency conversion on the windowed teaching audio data, and perform spectral filtering and coefficient operation on the time-frequency converted teaching audio data based on the human voice perception frequency to obtain a teaching audio coefficient;
the cepstrum transformation unit 4023 is configured to perform cepstrum transformation on the teaching audio coefficient, and select a plurality of audio accent features that satisfy a preset spectrum order from the result of the cepstrum transformation.
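The processing in units 4021 to 4023 (framing and windowing, time-frequency conversion, filtering on a perceptual frequency scale, then a cepstral transform truncated at a preset order) matches a standard mel-cepstral feature pipeline. A minimal sketch under that assumption follows; the frame, filterbank, and order parameters are illustrative defaults, not values specified by the patent.

```python
import numpy as np
import librosa
import scipy.fftpack

def accent_features(audio_path, n_fft=512, hop_length=160, n_mels=40, order=13):
    # Preprocess: load as 16 kHz mono.
    y, sr = librosa.load(audio_path, sr=16000, mono=True)
    # Multi-frame windowing + time-frequency conversion (STFT with a Hann window).
    spec = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length, window="hann")) ** 2
    # Spectral filtering on a perceptual (mel) scale, then log-energy coefficient operation.
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    log_mel = np.log(mel_fb @ spec + 1e-10)
    # Cepstral transform (DCT) truncated at the preset spectral order.
    cepstra = scipy.fftpack.dct(log_mel, axis=0, type=2, norm="ortho")[:order]
    return cepstra.T  # one accent feature vector per frame
```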
Further, the person determination module 402 further includes:
a similarity calculating unit 4024, configured to calculate the feature similarity between each pair of audio accent features using a preset voice recognition model;
the similarity selecting unit 4025 is configured to select the feature similarities greater than a preset similarity threshold, and determine, based on the selection result, the corresponding number of target lecturers in the target classroom.
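As a rough illustration of this step, the sketch below greedily groups per-segment accent feature vectors by cosine similarity and counts one lecturer per group; cosine similarity and the threshold value are assumed stand-ins for the patent's preset voice recognition model and similarity threshold.

```python
import numpy as np

def count_target_lecturers(segment_features, threshold=0.85):
    """Count distinct lecturers from per-segment accent feature vectors.

    segment_features: array of shape (n_segments, n_dims), one averaged accent
    feature vector per voiced segment. A segment is merged into an existing
    lecturer group when its cosine similarity to that group's centroid exceeds
    `threshold`; otherwise a new lecturer is assumed.
    """
    centroids = []
    for vec in segment_features:
        vec = vec / (np.linalg.norm(vec) + 1e-10)
        if centroids and max(float(c @ vec) for c in centroids) > threshold:
            continue  # similar enough to an already-identified lecturer
        centroids.append(vec)
    return len(centroids)
```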
Further, the language conversion module 403 includes:
an audio cutting unit 4031, configured to perform audio cutting on the teaching audio data based on the target lecturer, obtain a plurality of audio cutting segments, and identify audio endpoints of each of the audio cutting segments;
the personnel marking unit 4032 is configured to perform secondary speaker segmentation on the audio cutting segments based on the audio endpoints, mark the segmentation results with a first lecturer label to obtain a first lecture voice segment corresponding to a first target lecturer, and mark the segmentation results with a second lecturer label based on equipment authority to obtain a second lecture voice segment corresponding to a second target lecturer.
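One simple way to obtain the audio endpoints mentioned above is short-time-energy thresholding; the following sketch is such an assumed implementation and is not the endpoint detector specified by the patent. Frame sizes and the energy ratio are illustrative defaults.

```python
import numpy as np

def detect_endpoints(y, sr, frame_ms=25, hop_ms=10, energy_ratio=0.1):
    """Return (start_sample, end_sample) pairs of voiced regions in signal y."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    energies = np.array([
        np.sum(y[i:i + frame] ** 2) for i in range(0, len(y) - frame, hop)
    ])
    if energies.size == 0:
        return []
    # A frame is "active" when its energy exceeds a fraction of the peak frame energy.
    active = energies > energy_ratio * energies.max()
    segments, start = [], None
    for idx, flag in enumerate(active):
        if flag and start is None:
            start = idx * hop
        elif not flag and start is not None:
            segments.append((start, idx * hop + frame))
            start = None
    if start is not None:
        segments.append((start, len(y)))
    return segments
```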
Further, the language conversion module 403 further includes:
the naming judging unit 4033 is configured to determine at least one lecture marking instruction in the lecture voice segment based on the language family marking information, and judge whether the lecture information corresponding to the lecture marking instruction belongs to a preset course naming system;
the first conversion unit 4034 is configured to, if the lecture information corresponding to the lecture marking instruction belongs to the preset course naming system, use the lecture marking instruction directly as the target teaching information according to the course naming system;
the second conversion unit 4035 is configured to, if the lecture information corresponding to the lecture marking instruction does not belong to the preset course naming system, perform naming language conversion on the lecture marking instruction according to the course naming system to obtain the target teaching information.
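The sketch below illustrates the branch logic of units 4033 to 4035 with an assumed toy course naming system and synonym table; the actual naming system and conversion rules are not disclosed at this level of detail, so every phrase here is hypothetical.

```python
# Illustrative preset course naming system and synonym map (not taken from the patent).
COURSE_NAMING_SYSTEM = {"open projector", "dim lights", "next slide", "start recording"}
SYNONYMS = {
    "turn on the projector": "open projector",
    "lights down": "dim lights",
    "go to the next page": "next slide",
}

def to_target_teaching_information(instruction: str) -> str:
    phrase = instruction.strip().lower()
    if phrase in COURSE_NAMING_SYSTEM:
        # Already expressed in the course naming system: use it directly.
        return phrase
    # Otherwise perform naming-language conversion via the synonym map;
    # unknown phrases are passed through unchanged in this sketch.
    return SYNONYMS.get(phrase, phrase)
```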
Further, the instruction generating module 404 includes:
a state determining unit 4041, configured to determine a target operation object of a corresponding device type and initial state information of a preset teaching device in the target teaching information;
a parameter calculation unit 4042, configured to perform operation parameter calculation on the longitude and latitude marking information and the time marking information, and generate control instruction information of the teaching device;
the instruction generating unit 4043 is configured to sequentially generate auxiliary teaching information of the target class based on the target teaching information and the control instruction information, so as to obtain an auxiliary teaching result.
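The patent does not spell out how operation parameters are derived from the longitude and latitude marking information and the time marking information. One plausible reading, sketched below purely as an assumption, is to approximate the classroom's local solar time from the longitude mark and the timestamp and use it to set a lighting command.

```python
from datetime import datetime, timezone

def build_control_instruction(longitude_deg, timestamp_utc, device="classroom_lights"):
    """Derive a simple control instruction from the longitude mark and time mark.

    Assumption (not stated in the patent): local solar hour is approximated as
    UTC hour plus longitude/15, and lights are switched on outside 07:00-17:00.
    """
    solar_hour = (timestamp_utc.hour + timestamp_utc.minute / 60 + longitude_deg / 15.0) % 24
    action = "on" if solar_hour < 7 or solar_hour >= 17 else "off"
    return {"device": device, "action": action, "solar_hour": round(solar_hour, 2)}

# Example: 01:30 UTC at 120.15 degrees east corresponds to roughly 09:30 local solar time.
instr = build_control_instruction(120.15, datetime(2023, 12, 19, 1, 30, tzinfo=timezone.utc))
```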
The auxiliary teaching device based on classroom speech recognition in the embodiment of the present invention has been described in detail above from the perspective of the modularized functional entities in fig. 4 and fig. 5; the auxiliary teaching device based on classroom speech recognition in the embodiment of the present invention is described in detail below from the perspective of hardware processing.
Fig. 6 is a schematic structural diagram of an auxiliary teaching device based on classroom speech recognition according to an embodiment of the present invention. The auxiliary teaching device 600 based on classroom speech recognition may vary considerably in configuration or performance, and may include one or more processors (central processing units, CPU) 610 (for example, one or more processors), a memory 620, and one or more storage media 630 (for example, one or more mass storage devices) storing application programs 633 or data 632. The memory 620 and the storage medium 630 may be transitory or persistent storage. The program stored on the storage medium 630 may include one or more modules (not shown), each of which may include a series of instruction operations for the auxiliary teaching device 600 based on classroom speech recognition. Still further, the processor 610 may be configured to communicate with the storage medium 630 to execute the series of instruction operations in the storage medium 630 on the auxiliary teaching device 600 based on classroom speech recognition.
The auxiliary teaching device 600 based on classroom speech recognition may also include one or more power supplies 640, one or more wired or wireless network interfaces 650, one or more input/output interfaces 660, and/or one or more operating systems 631, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. It will be appreciated by those skilled in the art that the structure of the auxiliary teaching device based on classroom speech recognition shown in fig. 6 is not limiting, and the device may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
The invention also provides auxiliary teaching equipment based on classroom speech recognition, which comprises a memory and a processor, wherein the memory stores computer readable instructions which, when executed by the processor, cause the processor to execute the steps of the auxiliary teaching method based on classroom speech recognition in the above embodiments.
The present invention also provides a computer readable storage medium, which may be a non-volatile computer readable storage medium or a volatile computer readable storage medium, where instructions are stored in the computer readable storage medium, and when the instructions are run on a computer, they cause the computer to perform the steps of the auxiliary teaching method based on classroom speech recognition.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The subject application is operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An auxiliary teaching method based on classroom speech recognition, characterized by comprising the following steps:
acquiring teaching audio data corresponding to a lecture in a target classroom, and extracting at least one type of audio marking information from the teaching audio data based on a preset audio storage strategy;
extracting a plurality of audio accent features of the teaching audio data, and determining a target lecturer corresponding to the lecture in the target classroom based on the audio accent features;
extracting lecture voice segments corresponding to the teaching audio data based on the target lecturer, and performing language conversion on the lecture voice segments based on the audio marking information to obtain target teaching information;
and generating control instruction information and auxiliary teaching information of preset teaching equipment based on the target teaching information to obtain an auxiliary teaching result.
2. The auxiliary teaching method based on classroom speech recognition according to claim 1, wherein the audio marking information includes longitude and latitude marking information, time marking information and language family marking information, and wherein the extracting at least one type of audio marking information from the teaching audio data based on a preset audio storage strategy includes:
extracting, based on a preset audio storage strategy, classroom position information and time marking information corresponding to the teaching audio data;
calculating the longitude and latitude coordinates of the target classroom based on the classroom position information, and generating the longitude and latitude marking information of the teaching audio data based on the longitude and latitude coordinates;
and determining a language family distribution area of the target classroom based on the audio position information, and generating the language family marking information of the teaching audio data based on the language family distribution area.
3. The auxiliary teaching method based on classroom speech recognition according to claim 1, wherein the extracting a plurality of audio accent features of the teaching audio data includes:
preprocessing the teaching audio data, and carrying out multi-frame windowing calculation on the preprocessed teaching audio data based on a preset audio frame number to obtain windowed teaching audio data;
performing time-frequency conversion on the windowed teaching audio data, and performing frequency spectrum filtering and coefficient operation on the time-frequency converted teaching audio data based on the human voice perception frequency to obtain teaching audio coefficients;
and carrying out cepstrum transformation on the teaching audio coefficient, and selecting a plurality of audio accent features meeting the preset frequency spectrum orders from the cepstrum transformation result.
4. The auxiliary teaching method based on classroom speech recognition according to claim 3, wherein the determining a target lecturer corresponding to the lecture in the target classroom based on the audio accent features comprises:
calculating the feature similarity between each pair of audio accent features using a preset voice recognition model;
and selecting the feature similarities greater than a preset similarity threshold, and determining a corresponding number of target lecturers in the target classroom based on the selection result.
5. The auxiliary teaching method based on classroom speech recognition according to claim 1, wherein the target lecturer includes a first target lecturer and a second target lecturer, the lecture voice segments include a first lecture voice segment and a second lecture voice segment, and the extracting the lecture voice segments corresponding to the teaching audio data based on the target lecturer includes:
performing audio cutting on the teaching audio data based on the target lecturer to obtain a plurality of audio cutting segments, and identifying audio endpoints of each audio cutting segment;
and performing secondary speaker segmentation on the audio cutting segments based on the audio endpoints, marking the segmentation results with a first lecturer label to obtain the first lecture voice segment corresponding to the first target lecturer, and marking the segmentation results with a second lecturer label based on equipment authority to obtain the second lecture voice segment corresponding to the second target lecturer.
6. The auxiliary teaching method based on classroom speech recognition according to claim 2, wherein the performing language conversion on the lecture voice segments based on the audio marking information to obtain target teaching information includes:
determining, based on the language family marking information, at least one lecture marking instruction in the lecture voice segments, and judging whether the lecture information corresponding to the lecture marking instruction belongs to a preset course naming system;
if the lecture information corresponding to the lecture marking instruction belongs to the preset course naming system, taking the lecture marking instruction as the target teaching information according to the course naming system;
if the lecture information corresponding to the lecture marking instruction does not belong to the preset course naming system, performing naming language conversion on the lecture marking instruction according to the course naming system to obtain the target teaching information.
7. The auxiliary teaching method based on classroom speech recognition according to claim 6, wherein the generating control instruction information and auxiliary teaching information of preset teaching equipment based on the target teaching information to obtain an auxiliary teaching result comprises:
determining a target operation object of a corresponding equipment type and initial state information of preset teaching equipment in the target teaching information;
performing operation parameter calculation on the longitude and latitude marking information and the time marking information to generate the control instruction information of the teaching equipment;
and generating auxiliary teaching information of the target class in sequence based on the target teaching information and the control instruction information to obtain an auxiliary teaching result.
8. An auxiliary teaching device based on classroom speech recognition, characterized in that the auxiliary teaching device based on classroom speech recognition comprises:
a mark extraction module, configured to obtain teaching audio data corresponding to a lecture in a target classroom, and extract at least one type of audio marking information from the teaching audio data based on a preset audio storage strategy;
a person determining module, configured to extract a plurality of audio accent features of the teaching audio data, and determine the target lecturers corresponding to the lecture in the target classroom based on the audio accent features;
a language conversion module, configured to extract lecture voice segments corresponding to the teaching audio data based on the target lecturers, and perform language conversion on the lecture voice segments based on the audio marking information to obtain target teaching information;
an instruction generation module, configured to generate control instruction information and auxiliary teaching information of preset teaching equipment based on the target teaching information to obtain an auxiliary teaching result.
9. Auxiliary teaching equipment based on classroom speech recognition, characterized in that the auxiliary teaching equipment based on classroom speech recognition comprises: a memory and at least one processor, the memory having instructions stored therein;
the at least one processor invokes the instructions in the memory to cause the auxiliary teaching equipment based on classroom speech recognition to perform the steps of the auxiliary teaching method based on classroom speech recognition according to any one of claims 1-7.
10. A computer readable storage medium having instructions stored thereon which, when executed by a processor, implement the steps of the auxiliary teaching method based on classroom speech recognition according to any one of claims 1 to 7.
CN202311748488.1A 2023-12-19 2023-12-19 Auxiliary teaching method based on classroom speech recognition and related equipment Active CN117423260B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311748488.1A CN117423260B (en) 2023-12-19 2023-12-19 Auxiliary teaching method based on classroom speech recognition and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311748488.1A CN117423260B (en) 2023-12-19 2023-12-19 Auxiliary teaching method based on classroom speech recognition and related equipment

Publications (2)

Publication Number Publication Date
CN117423260A true CN117423260A (en) 2024-01-19
CN117423260B CN117423260B (en) 2024-03-12

Family

ID=89523411

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311748488.1A Active CN117423260B (en) 2023-12-19 2023-12-19 Auxiliary teaching method based on classroom speech recognition and related equipment

Country Status (1)

Country Link
CN (1) CN117423260B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140272891A1 (en) * 2013-03-15 2014-09-18 Joseph Saladino System and method for remote fitness training
CN110619776A (en) * 2019-09-24 2019-12-27 中国人民解放军火箭军工程大学 Virtual telescope system based on miniature display screen
CN110705907A (en) * 2019-10-16 2020-01-17 江苏网进科技股份有限公司 Classroom teaching auxiliary supervision method and system based on audio voice processing technology
CN111191472A (en) * 2019-12-31 2020-05-22 湖南师语信息科技有限公司 Teaching auxiliary translation learning system and method
CN112529748A (en) * 2020-12-10 2021-03-19 成都农业科技职业学院 Intelligent education platform based on time node mark feedback learning state
CN112805779A (en) * 2018-09-04 2021-05-14 谷歌有限责任公司 Reading progress estimation based on speech fuzzy matching and confidence interval
US11062615B1 (en) * 2011-03-01 2021-07-13 Intelligibility Training LLC Methods and systems for remote language learning in a pandemic-aware world
CN116092492A (en) * 2023-01-17 2023-05-09 科大讯飞股份有限公司 Mixed multilingual navigation voice instruction processing method and device and electronic equipment

Also Published As

Publication number Publication date
CN117423260B (en) 2024-03-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant