CN109213970B

CN109213970B - Method and device for generating a record

Info

Publication number: CN109213970B
Application number: CN201710525292.4A
Authority: CN
Inventors: 石鹏; 梁文波
Original assignee: Beijing Gridsum Technology Co Ltd
Current assignee: Beijing Gridsum Technology Co Ltd
Priority date: 2017-06-30
Filing date: 2017-06-30
Publication date: 2022-07-29
Anticipated expiration: 2037-06-30
Also published as: CN109213970A

Abstract

The application discloses a method and a device for generating a record. The method comprises the following steps: collecting sound through a multi-channel sound card to obtain a sound signal, wherein each channel on the sound card corresponds to one sound collector, and each sound collector is used for collecting the sound of a user; analyzing the sound signal to obtain text information corresponding to the sound signal; and generating a target record based on the text information. The method and the device solve the problem in the related art that court trial records are recorded inefficiently during court trials.

Description

Method and device for generating a record

Technical Field

The application relates to the technical field of information processing, and in particular to a method and a device for generating a record.

Background

At present, during a court trial, the statements of each role, such as the judge and the parties, need to be recorded. After the trial ends, the recorded statements are organized into a court trial record, which is filed after the parties sign to confirm it. In a traditional court trial, a single court clerk records the statements of all roles throughout the trial and, once recording is finished, must organize them into a court trial record in a fixed format. The whole process places demands on the clerk's typing skill and also requires some understanding of the case being tried and a degree of professional knowledge of laws and regulations. Because clerks differ in typing speed and in their understanding of the case, problems such as low recording efficiency, inaccurate records, and omissions often occur.

No effective solution has yet been proposed for the problem in the related art that court trial records are recorded inefficiently during court trials.

Disclosure of Invention

The main purpose of the application is to provide a method and a device for generating a record, so as to solve the problem in the related art that court trial records are recorded inefficiently during court trials.

To achieve the above object, according to one aspect of the present application, a record generation method is provided. The method comprises the following steps: collecting sound through a multi-channel sound card to obtain a sound signal, wherein each channel on the sound card corresponds to one sound collector, and each sound collector is used for collecting the sound of a user; analyzing the sound signal to obtain text information corresponding to the sound signal; and generating a target record based on the text information.

Further, analyzing the sound signal to obtain text information corresponding to the sound signal includes: dividing the speech frames in the sound signal according to a preset condition to obtain a plurality of speech frame groups; determining a target state corresponding to each speech frame group to obtain a plurality of target states; looking up the corresponding phoneme for each first preset number of consecutive target states to obtain a plurality of phonemes; generating a corresponding word from each second preset number of consecutive phonemes to obtain a plurality of words; and generating the text information from the plurality of words.

Further, determining a target state corresponding to each speech frame group to obtain a plurality of target states includes: calculating the probability of each speech frame group on each of a plurality of states; acquiring the state whose probability satisfies a preset condition; and taking that state as the target state corresponding to the speech frame group, thereby obtaining a plurality of target states.

Further, the sound signal is a sound signal of a court trial case collected by a sound collector on the court trial site, and generating the target record based on the text information comprises: carrying out error correction processing on the text information; determining the type of a court trial case, and acquiring a corresponding record template according to the type of the court trial case; adding corrected text information to the target position in the record template to generate a record to be processed; and processing the record to be processed to obtain the target record.

Further, the target record is a court trial record, and before sound is collected through the multi-channel sound card to obtain a sound signal, the method further includes: configuring the correspondence between each channel on the sound card and each court trial role; and connecting the sound collector of each court trial role to its channel according to the correspondence.

To achieve the above object, according to another aspect of the present application, a record generation device is provided. The device includes: a collecting unit, configured to collect sound through a multi-channel sound card to obtain a sound signal, wherein each channel on the sound card corresponds to one sound collector, and each sound collector is used for collecting the sound of a user; an analyzing unit, configured to analyze the sound signal to obtain text information corresponding to the sound signal; and a generating unit, configured to generate a target record based on the text information.

Further, the analyzing unit includes: a dividing module, configured to divide the speech frames in the sound signal according to a preset condition to obtain a plurality of speech frame groups; a searching module, configured to determine the target state corresponding to each speech frame group to obtain a plurality of target states; a matching module, configured to look up the corresponding phoneme for each first preset number of consecutive target states to obtain a plurality of phonemes; a first generating module, configured to generate a corresponding word from each second preset number of consecutive phonemes to obtain a plurality of words; and a second generating module, configured to generate the text information from the plurality of words.

Further, the searching module includes: a calculating submodule, configured to calculate the probability of each speech frame group on each of a plurality of states; an obtaining submodule, configured to obtain the state whose probability satisfies a preset condition; and a determining submodule, configured to take that state as the target state corresponding to the speech frame group, thereby obtaining a plurality of target states.

To achieve the above object, according to another aspect of the present application, a storage medium is provided. The storage medium includes a stored program, wherein the program, when run, executes any one of the record generation methods described above.

To achieve the above object, according to another aspect of the present application, a processor is provided. The processor is used for running a program, wherein the program, when run, executes any one of the record generation methods described above.

In the present application, the following steps are adopted: collecting sound through a multi-channel sound card to obtain a sound signal, wherein each channel on the sound card corresponds to one sound collector, and each sound collector is used for collecting the sound of a user; analyzing the sound signal to obtain text information corresponding to the sound signal; and generating a target record based on the text information. This solves the problem in the related art that court trial records are recorded inefficiently during court trials. Because the collected sound signal is converted into the corresponding text information and the record is generated from that text information, the efficiency of generating court trial records is improved.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:

Fig. 1 is a flowchart of a record generation method according to an embodiment of the present application;

Fig. 2 is a schematic diagram of a sound signal in a record generation method according to an embodiment of the present application;

Fig. 3 is a schematic diagram of the relationship between a character's phonemes and their states in a record generation method according to an embodiment of the present application;

Fig. 4 is a schematic diagram of the framing operation in a record generation method according to an embodiment of the present application;

Fig. 5 is a schematic diagram of the states corresponding to speech frames in a record generation method according to an embodiment of the present application;

Fig. 6 is a schematic diagram of a court trial record template in a record generation method according to an embodiment of the present application;

Fig. 7 is a schematic diagram of a record generation device according to an embodiment of the present application.

Detailed Description

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be used. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

For convenience of description, some terms or expressions referred to in the embodiments of the present application are explained below:

Court trial record: also called the court record or trial record, it is an indispensable written material in the trial of cases by a court. It is a written account, made by the court clerk, that synchronously reflects the actual course of all trial activities during a court trial. It documents the entire process of trying the case, is an important basis for the court to render judgment according to law, and is important material for later trial supervision, so its function and significance are considerable. The court trial record should reflect all court trial activities objectively, truthfully, promptly, and accurately.

Article 147 of the Civil Procedure Law provides that the court clerk shall enter all court trial activities in the record, which is signed by the judges and the clerk. The court record shall be read out in court, or the parties and other litigation participants may be informed that they can read it in court or within five days. Parties and other litigation participants who believe that the record of their statements contains omissions or errors are entitled to apply for correction. If the correction is not made, the application shall be placed on record. The court record is signed or sealed by the parties and other litigation participants. If anyone refuses to sign or seal, the circumstances are noted and attached to the file. Article 201 of the Criminal Procedure Law provides that all court trial activities shall be written into a record by the court clerk and, after review by the presiding judge, signed by the presiding judge and the clerk. The witness testimony in the court record shall be read out in court or handed to the witness to read. After acknowledging that it contains no errors, the witness shall sign or seal it. The court record shall be handed to the parties to read or read out to them. A party who believes that the record contains omissions or errors may request a supplement or correction. After acknowledging that it contains no errors, the party shall sign or seal it.

Sound card: also called an audio card, it is the most basic component in multimedia technology and is hardware that converts between sound waves and digital signals. Its basic function is to convert the original sound signal from a microphone, magnetic tape, or optical disc and output it to audio equipment such as earphones, loudspeakers, or recorders, or to make a musical instrument sound through the Musical Instrument Digital Interface (MIDI). A sound card consists of various electronic components and connectors. The electronic components perform the specific functions. The connectors, generally of two kinds (sockets and round jacks), carry the input and output signals.

Phoneme: the smallest unit of speech. Phonemes are analyzed according to the articulatory actions within a syllable, with one action constituting one phoneme. Phonemes fall into two major categories, vowels and consonants. For example, the Chinese syllable ā has only one phoneme, ai has two phonemes, and dāi has three phonemes. Marking speech with the International Phonetic Alphabet is called phonetic transcription, which may be broad or narrow. Broad transcription marks only the phonemes that need to be distinguished, while narrow transcription marks the differences between sounds as finely as possible. Broad transcription therefore uses a limited set of symbols and narrow transcription uses many, but each has its own uses. A phoneme is the smallest unit or smallest speech segment constituting a syllable, and is the smallest linear speech unit divided from the viewpoint of sound quality. Phonemes are concrete physical phenomena. The symbols of the International Phonetic Alphabet (letters established by the International Phonetic Association for uniformly transcribing the speech sounds of all languages, also called the "international phonetic letters" or "universal phonetic letters") correspond one-to-one to the phonemes of all human languages.

According to an embodiment of the present application, a record generation method is provided.

Fig. 1 is a flowchart of a record generation method according to an embodiment of the present application. As shown in Fig. 1, the method comprises the following steps:

Step S101: sound is collected through a multi-channel sound card to obtain a sound signal, wherein each channel on the sound card corresponds to one sound collector, and each sound collector is used for collecting the sound of a user.

In the present application, sound collection in a courtroom is described as an example. The courtroom is equipped with a multi-channel sound card connected to the court clerk's computer. Each channel on the sound card corresponds to one microphone (the sound collector described above), and each channel corresponds to one court trial role; the microphone of each role is connected to the sound card according to this correspondence. During a court trial, the multi-channel sound card collects the sound of the roles speaking into their microphones to obtain a sound signal. The collected sound signal is a segment of waveform data, for example as shown in Fig. 2.

Optionally, to ensure that the sound signal can be acquired through the multi-channel sound card, in the record generation method provided in the embodiment of the present application, the target record is a court trial record, and before sound is collected through the multi-channel sound card to obtain a sound signal, the method further includes: configuring the correspondence between each channel on the sound card and each court trial role; and connecting the sound collector of each court trial role to its channel according to the correspondence.

Alternatively, the correspondence between each court trial role and each sound collector is configured, and each sound collector is then connected to the channel of its court trial role according to the correspondence.

The record generation method of the embodiment of the present application may be applied to court trial software, that is, embedded in it. Before the court trial begins, the court trial software is installed and the correspondence between roles and channels is set in the software. Information such as the type of the court trial case, the type of the record, and the trial level may also be set in the software before the trial. The information set in the court trial software is referred to when the record is subsequently generated.
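The channel-to-role configuration described above can be illustrated with a minimal Python sketch. The role names, channel numbers, and the `Utterance` and `label_utterances` names are assumptions made for illustration; the application does not specify how the court trial software stores the correspondence.

```python
from dataclasses import dataclass

# Hypothetical configuration: sound card channel number -> court trial role.
CHANNEL_ROLES = {0: "presiding judge", 1: "plaintiff", 2: "defendant", 3: "witness"}


@dataclass
class Utterance:
    channel: int  # channel of the sound card on which the speech was captured
    text: str     # text information obtained for that speech


def label_utterances(utterances, channel_roles):
    """Prefix each utterance with the role configured for its channel."""
    labelled = []
    for utterance in utterances:
        # Channels without a configured role are flagged instead of being dropped.
        role = channel_roles.get(utterance.channel, f"unassigned channel {utterance.channel}")
        labelled.append(f"{role}: {utterance.text}")
    return labelled
```

Because every microphone is wired to a fixed channel, the speaker of each passage follows from the channel number alone, without any speaker identification step.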

Step S102: the sound signal is analyzed to obtain text information corresponding to the sound signal.

The sound signal may be analyzed at the sound card end to obtain the corresponding text information. Alternatively, the sound signal may be sent to a speech analysis server, which analyzes it and returns the resulting text information.

Before the sound signal is analyzed, the analysis server needs to be trained on a large corpus and to store a large amount of phoneme information in advance. Phonemes are the pronunciation information of characters; for Chinese, for example, the full set of initials and finals is generally used directly as the phoneme set. Each phoneme is generally divided into several states, where a state, as the term is used in this application, is a component of a phoneme, as shown in Fig. 3.

Optionally, in the record generation method provided in the embodiment of the present application, analyzing the sound signal to obtain text information corresponding to the sound signal includes: dividing the speech frames in the sound signal according to a preset condition to obtain a plurality of speech frame groups; determining a target state corresponding to each speech frame group to obtain a plurality of target states, wherein one speech frame group corresponds to one target state; looking up the corresponding phoneme for each first preset number of consecutive target states to obtain a plurality of phonemes; generating a corresponding word from each second preset number of consecutive phonemes to obtain a plurality of words; and generating the text information from the plurality of words.
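The last three steps (target states to phonemes, phonemes to words, words to text) can be sketched as two table lookups. This is only an illustration: the lookup tables, the state names, and the choice of 3 states per phoneme and 2 phonemes per word are assumptions, and a real recognizer would derive such tables from training data.

```python
# Hypothetical lookup tables, for illustration only.
STATES_TO_PHONEME = {
    ("n_1", "n_2", "n_3"): "n",
    ("i_1", "i_2", "i_3"): "i",
    ("h_1", "h_2", "h_3"): "h",
    ("ao_1", "ao_2", "ao_3"): "ao",
}
PHONEMES_TO_WORD = {("n", "i"): "ni", ("h", "ao"): "hao"}


def chunk(items, size):
    """Split a sequence into consecutive, non-overlapping tuples of `size` items."""
    return [tuple(items[i:i + size]) for i in range(0, len(items) - size + 1, size)]


def states_to_text(target_states, states_per_phoneme=3, phonemes_per_word=2):
    """Map target states to phonemes, phonemes to words, and join the words."""
    phonemes = [STATES_TO_PHONEME[c] for c in chunk(target_states, states_per_phoneme)]
    words = [PHONEMES_TO_WORD[c] for c in chunk(phonemes, phonemes_per_word)]
    return " ".join(words)
```

Here `states_per_phoneme` plays the part of the first preset number and `phonemes_per_word` the part of the second preset number.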

The preset condition in the record generation method provided in the embodiment of the present application may be that the probability of a plurality of speech frames on a certain state satisfies a probability threshold. When the sound signal is analyzed into the corresponding text information, the speech frames in the sound signal are matched against a plurality of states, and the matching may proceed by traversal. For example, 3 consecutive speech frames are matched against the states one by one; if no probability satisfies the probability threshold, 4 consecutive speech frames are matched against the states one by one, and so on, until the probability of the speech frames on some state satisfies the probability threshold. Those speech frames are then taken as a speech frame group, and that state is the target state corresponding to the group. States are the components of phonemes: for example, the corresponding phoneme is looked up for every 3 consecutive target states, the phonemes found are combined into words, and the text information is generated from the words.
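The traversal described above can be sketched as follows, under stated assumptions. The function name `group_frames`, the `score` callback, and the toy scorer are hypothetical; the application does not say how the probability of a group of frames on a state is computed, so the scorer here is deliberately contrived to show only the control flow.

```python
def group_frames(frames, states, score, threshold, min_size=3):
    """Greedily grow a window of consecutive frames until some state's
    probability for the window reaches `threshold`, then start a new window.

    `score(window, state)` is an assumed callback returning a probability.
    Returns a list of (frame_group, target_state) pairs.
    """
    groups = []
    start = 0
    while start < len(frames):
        size = min_size
        matched = None
        while start + size <= len(frames):
            window = frames[start:start + size]
            best_state = max(states, key=lambda s: score(window, s))
            if score(window, best_state) >= threshold:
                matched = (window, best_state)
                break
            size += 1  # no state met the threshold: try one more frame
        if matched is None:
            break  # the remaining frames never reach the threshold
        groups.append(matched)
        start += size
    return groups


def toy_score(window, state):
    """Contrived scorer: fraction of a 4-frame quota already matching `state`."""
    return min(1.0, sum(1 for frame in window if frame == state) / 4)
```

With `toy_score` and a threshold of 1.0, three matching frames score 0.75 and are rejected, while four score 1.0 and are accepted as one group.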

The waveform of the sound signal is divided into frames, as shown in Fig. 4. In Fig. 4, for example, each frame is 25 ms long and adjacent frames overlap by 25 - 10 = 15 ms; this is referred to as framing with a frame length of 25 ms and a frame shift of 10 ms. As shown in Fig. 5, each small vertical bar represents one frame, and several speech frames correspond to one state (equivalent to dividing the speech frames in the sound signal according to the preset condition to obtain a plurality of speech frame groups). Every three states (the first preset number) are combined into a phoneme, and several phonemes (the second preset number) are combined into a word. The text information is generated from the words.
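The framing operation can be illustrated with a short sketch. The function name and the 16 kHz sample rate used in the usage note are assumptions; only the 25 ms frame length and the 10 ms frame shift come from the description.

```python
def split_into_frames(samples, sample_rate, frame_ms=25, shift_ms=10):
    """Cut a sample sequence into overlapping frames.

    Each frame covers `frame_ms` milliseconds and starts `shift_ms`
    milliseconds after the previous one, so adjacent frames overlap by
    frame_ms - shift_ms milliseconds. A trailing partial frame is dropped.
    """
    frame_len = sample_rate * frame_ms // 1000
    shift = sample_rate * shift_ms // 1000
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += shift
    return frames
```

At an assumed sample rate of 16 kHz, a frame holds 400 samples and the shift is 160 samples, so consecutive frames share 240 samples, which is the 15 ms overlap.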

Optionally, in the record generation method provided in the embodiment of the present application, determining a target state corresponding to each speech frame group to obtain a plurality of target states includes: calculating the probability of each speech frame group on each of a plurality of states; acquiring the state whose probability satisfies a preset condition; and taking that state as the target state corresponding to the speech frame group, thereby obtaining a plurality of target states.

Speech recognition amounts to searching the state network for an optimal path, that is, the path along which the speech frames have the highest probability of corresponding to the target states. The path search uses a dynamic programming algorithm with pruning, known as the Viterbi algorithm, to find the globally optimal path. In the present application, path search may be used to obtain, among the plurality of states, the state on which the speech frames have the highest probability, and that state is taken as the target state corresponding to the speech frame group. Once the target states have been obtained, the matching phonemes are found through a preset algorithm and the corresponding text is derived from them, which completes the process of obtaining text information through speech recognition.
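For readers unfamiliar with the Viterbi algorithm, the following is a minimal textbook version over a small hidden Markov model. It is not the recognizer of the application: it omits pruning, and the state names, observation labels, and probability tables in the usage note are made up for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for `observations`.

    best[t][s] holds (probability of the best path that ends in state s
    at time t, the previous state on that path).
    """
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Choose the predecessor that maximizes the path probability into s.
            prob, prev = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
            column[s] = (prob, prev)
        best.append(column)

    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path
```

A small usage example with two hypothetical states and three observation labels:

```python
states = ("s1", "s2")
start_p = {"s1": 0.6, "s2": 0.4}
trans_p = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
emit_p = {"s1": {"x": 0.5, "y": 0.4, "z": 0.1}, "s2": {"x": 0.1, "y": 0.3, "z": 0.6}}
print(viterbi(["x", "y", "z"], states, start_p, trans_p, emit_p))  # ['s1', 's1', 's2']
```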

Step S103, generating a target record based on the text information.

Different records have different format requirements. The target record therefore needs to be generated from the analyzed text information according to the template format of the target record.

Optionally, in order to generate an accurate target record, in the record generating method provided in the embodiment of the present application, the sound signal is a sound signal of a court trial case collected by a sound collector at a court trial site, and generating the target record based on the text information includes: carrying out error correction processing on the text information; determining the type of a court trial case, and acquiring a corresponding record template according to the type of the court trial case; adding corrected text information to a target position in a record template to generate a record to be processed; and processing the record to be processed to obtain the target record.

Because the text information is obtained by analyzing the sound signal, some characters may be wrong. Error correction is therefore applied to the text information when the target record is generated, which ensures the accuracy of the characters in the text information and avoids errors in the target record generated from it. The type of the court trial case may be received as external input. Alternatively, the characters of the text information obtained by analysis may be input into a case type judgment model, which determines the case type of the case. The case type judgment model is trained in advance on judgment documents of the various case types, so that it can determine the corresponding case type for the input characters. For example, the characters of the text information obtained by analysis are input into the case type judgment model, which determines that the case is a civil case; a record template corresponding to civil cases is stored in a database in advance, and the record template for that case type is retrieved. As shown in Fig. 6, a record template is generally divided into three parts: the first part is the record header, the second part is the content area, and the third part is the signature area. The corrected text information is added to the content area. The corresponding content is then added to the record header and the signature area of the record template to obtain the complete target record.

The record generation method of the embodiment of the present application is applied to court trial software. The record template is retrieved automatically according to the case information set before the trial; the microphone sound collected by the sound card is converted into a digital signal and sent to the speech analysis server; the analysis result returned by the server is received and laid out according to the template format; and the court trial record is generated. These steps streamline a necessary part of the court trial process: the original practice of manually selecting a template and typing the text is replaced by automatic recognition and generation, which improves working efficiency and makes court trial records faster to produce and more accurate and complete in content.
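Filling a three-part record template can be sketched with Python's standard `string.Template`. The template text, the field names, and the `build_record` function are assumptions made for illustration; the application only states that a template has a record header, a content area, and a signature area, and that templates are looked up by case type.

```python
from string import Template

# Hypothetical template store keyed by case type: header, content area, signature area.
RECORD_TEMPLATES = {
    "civil": Template(
        "COURT TRIAL RECORD (civil case)\n"
        "Case: $case_name\n"
        "\n"
        "$content\n"
        "\n"
        "Signatures: $signatures\n"
    ),
}


def build_record(case_type, case_name, corrected_lines, signatories):
    """Select the template for `case_type` and fill its three parts."""
    template = RECORD_TEMPLATES[case_type]
    return template.substitute(
        case_name=case_name,
        content="\n".join(corrected_lines),
        signatures=", ".join(signatories),
    )
```

In this sketch the error-corrected text goes into the content area, while the header and signature fields are supplied from the case information set before the trial.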

According to the record generation method provided by the embodiment of the present application, sound is collected through a multi-channel sound card to obtain a sound signal, wherein each channel on the sound card corresponds to one sound collector, and each sound collector is used for collecting the sound of a user; the sound signal is analyzed to obtain text information corresponding to the sound signal; and the target record is generated based on the text information. This solves the problem in the related art that court trial records are recorded inefficiently during court trials. Because the collected sound signal is converted into the corresponding text information and the record is generated from that text information, the efficiency of generating court trial records is improved.

It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.

The embodiment of the present application further provides a record generating device, and it should be noted that the record generating device of the embodiment of the present application may be used to execute the record generating method provided in the embodiment of the present application. The following describes a record generation apparatus provided in an embodiment of the present application.

Fig. 7 is a schematic diagram of a record generation device according to an embodiment of the present application. As shown in Fig. 7, the device includes: a collecting unit 10, an analyzing unit 20, and a generating unit 30.

Specifically, the collecting unit 10 is configured to collect sound through a multi-channel sound card to obtain a sound signal, wherein each channel on the sound card corresponds to one sound collector, and each sound collector is used for collecting the sound of a user.

The analyzing unit 20 is configured to analyze the sound signal to obtain text information corresponding to the sound signal.

The generating unit 30 is configured to generate a target record based on the text information.

In the record generation device provided by the embodiment of the present application, the collecting unit 10 collects sound through a multi-channel sound card to obtain a sound signal, wherein each channel on the sound card corresponds to one sound collector, and each sound collector is used for collecting the sound of a user; the analyzing unit 20 analyzes the sound signal to obtain text information corresponding to the sound signal; and the generating unit 30 generates the target record based on the text information. This solves the problem in the related art that court trial records are recorded inefficiently during court trials. Because the collected sound signal is converted into the corresponding text information and the record is generated from that text information, the efficiency of generating court trial records is improved.
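The division of the device into three units can be pictured as three callables chained in order. The class name `RecordGenerator` and its attributes are hypothetical and stand in for the collecting unit 10, the analyzing unit 20, and the generating unit 30; the units themselves are passed in, so the sketch says nothing about how each one works internally.

```python
class RecordGenerator:
    """Chains a collecting unit, an analyzing unit, and a generating unit."""

    def __init__(self, collect, analyze, generate):
        self.collect = collect    # () -> sound signal
        self.analyze = analyze    # sound signal -> text information
        self.generate = generate  # text information -> target record

    def run(self):
        sound_signal = self.collect()
        text_information = self.analyze(sound_signal)
        return self.generate(text_information)
```

Keeping the units separate in this way matches the description, in which the analysis may run either locally or on a speech analysis server without affecting the other two units.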

Optionally, in the apparatus for generating a record provided in the embodiment of the present application, the parsing unit 20 includes: the dividing module is used for dividing the voice frames in the voice signals according to preset conditions to obtain a plurality of voice frame groups; the searching module is used for determining the target state corresponding to each voice frame group to obtain a plurality of target states; the matching module is used for searching corresponding phonemes for the continuous first preset number of target states to obtain a plurality of phonemes; the first generation module is used for generating corresponding words based on continuous second preset number of phonemes to obtain a plurality of words; and the second generation module is used for generating text information through a plurality of words.

Optionally, in the record generating device provided in the embodiment of the present application, the searching module includes: a calculation submodule, configured to calculate the probability of each voice frame group for each of a plurality of states; an obtaining submodule, configured to obtain, for each voice frame group, the state whose probability meets a preset condition; and a determining submodule, configured to take the state whose probability meets the preset condition as the target state corresponding to the voice frame group, to obtain a plurality of target states.
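The state selection performed by these submodules can be sketched as follows. The sketch assumes that the preset condition is "highest probability"; the passage above does not fix the condition, and the probability values and the name pick_target_states are illustrative only.

```python
def pick_target_states(group_probabilities):
    """Return, for each voice frame group, the state with the highest probability.

    `group_probabilities` holds one dict per voice frame group, mapping a state
    label to that group's probability for the state.
    Assumption: the preset condition is "maximum probability".
    """
    return [max(probs, key=probs.get) for probs in group_probabilities]


# Hypothetical probabilities for two voice frame groups over three states.
probabilities = [
    {"s1": 0.1, "s2": 0.7, "s3": 0.2},
    {"s1": 0.6, "s2": 0.3, "s3": 0.1},
]
target_states = pick_target_states(probabilities)
```

A different preset condition, such as a minimum probability threshold, could be substituted by replacing the max call with a filter.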

The record generating device comprises a processor and a memory. The collecting unit 10, the analyzing unit 20, the generating unit 30 and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to implement the corresponding functions.

The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided, and the record is generated by adjusting kernel parameters.

The memory may include forms of computer-readable media such as volatile memory, Random Access Memory (RAM), and/or non-volatile memory such as Read Only Memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.

An embodiment of the present application provides a storage medium on which a program is stored, and the program implements the record generation method when executed by a processor.

The embodiment of the application provides a processor, wherein the processor is used for running a program, and the method for generating the record is executed when the program runs.

The embodiment of the application provides a device comprising a processor, a memory, and a program that is stored on the memory and can run on the processor, where the processor implements the following steps when executing the program: collecting sound through a multi-channel sound card to obtain a sound signal, wherein each channel on the sound card corresponds to one sound collector, and each sound collector is used for collecting the sound of one user; analyzing the sound signal to obtain text information corresponding to the sound signal; and generating a target record based on the text information.

Analyzing the sound signal to obtain text information corresponding to the sound signal includes: dividing the voice frames in the voice signals according to preset conditions to obtain a plurality of voice frame groups; determining a target state corresponding to each voice frame group to obtain a plurality of target states; searching corresponding phonemes for a first preset number of continuous target states to obtain a plurality of phonemes; generating corresponding words based on a second preset number of continuous phonemes to obtain a plurality of words; generating the text information by the plurality of words.

Determining a target state corresponding to each voice frame group, and obtaining a plurality of target states comprises: calculating the corresponding probability of each voice frame group on a plurality of states; acquiring a state corresponding to the probability that the voice frame group meets a preset condition; and taking the state corresponding to the probability meeting the preset condition as the target state corresponding to the voice frame group to obtain a plurality of target states.

Generating a target record based on the text information includes: carrying out error correction processing on the text information; determining the type of the court trial case, and acquiring a corresponding record template according to the type of the court trial case; adding the corrected text information to the target position in the record template to generate a record to be processed; and processing the record to be processed to obtain the target record.
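The error correction and template steps can be illustrated with a minimal sketch. The case types, the template wording, the correction table, and the names correct and build_pending_record are all hypothetical; the application does not specify how error correction is performed or what the templates contain, and the final processing of the record to be processed is left out.

```python
import re
from string import Template

# Hypothetical record templates, keyed by court trial case type.
TEMPLATES = {
    "civil": Template("Civil Court Trial Record\nCase: $case_no\n\n$body\n"),
    "criminal": Template("Criminal Court Trial Record\nCase: $case_no\n\n$body\n"),
}

# Hypothetical correction table standing in for the error correction step.
CORRECTIONS = {"defendent": "defendant"}


def correct(text):
    """Replace known misrecognized words, matching whole words only."""
    for wrong, right in CORRECTIONS.items():
        text = re.sub(r"\b" + re.escape(wrong) + r"\b", right, text)
    return text


def build_pending_record(case_type, case_no, utterances):
    """Fill the template for `case_type` with corrected (role, text) utterances."""
    template = TEMPLATES[case_type]  # raises KeyError for an unknown case type
    body = "\n".join(f"{role}: {correct(text)}" for role, text in utterances)
    return template.substitute(case_no=case_no, body=body)


pending = build_pending_record(
    "civil", "DEMO-001", [("Judge", "The defendent may speak.")]
)
```

Selecting the template by case type keeps the fixed layout of each record kind separate from the recognized text, so only the body position is filled at run time.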

The target record is a court trial record, and before sound acquisition is performed through a multi-channel sound card to obtain a sound signal, the method further comprises the following steps: configuring the corresponding relation between each sound channel on the sound card and each court trial object role; and connecting the sound collector corresponding to each court trial object role with each sound channel according to the corresponding relation. The device herein may be a server, a PC, a PAD, a mobile phone, etc.

The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps: collecting sound through a multi-channel sound card to obtain a sound signal, wherein each channel on the sound card corresponds to one sound collector, and each sound collector is used for collecting the sound of one user; analyzing the sound signal to obtain text information corresponding to the sound signal; and generating a target record based on the text information.

The target record is a court trial record, and before sound acquisition is performed through a multi-channel sound card to obtain a sound signal, the method further comprises the following steps: configuring the corresponding relation between each sound channel on the sound card and each court trial object role; and connecting the sound collector corresponding to each court trial object role with each sound channel according to the corresponding relation.
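The correspondence between sound channels and court trial roles can be sketched as a simple lookup table. The role names, channel numbers, and the name label_utterances are hypothetical; the sketch assumes each recognized utterance carries its start time and the channel it was captured on.

```python
# Hypothetical correspondence between sound card channels and court trial roles.
CHANNEL_ROLES = {0: "Judge", 1: "Plaintiff", 2: "Defendant"}


def label_utterances(utterances, channel_roles):
    """Attribute each utterance to a role and return the lines in time order.

    `utterances` is a list of (start_seconds, channel, text) tuples.
    A channel missing from the table is labeled by its number instead.
    """
    lines = []
    for _start, channel, text in sorted(utterances):
        role = channel_roles.get(channel, f"Channel {channel}")
        lines.append(f"{role}: {text}")
    return lines


lines = label_utterances(
    [(4.2, 1, "I object."), (0.0, 0, "Court is in session.")],
    CHANNEL_ROLES,
)
```

Because the mapping is configured before collection starts, the speaker of each utterance is known from the channel alone and no speaker identification is needed.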

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.

Computer-readable media, including permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer readable media do not include transitory computer readable media such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims

1. A method for generating a record, comprising:

collecting sound through a multi-channel sound card to obtain a sound signal, wherein each channel on the sound card corresponds to one sound collector, and each sound collector is used for collecting the sound of one user;

analyzing the sound signal to obtain text information corresponding to the sound signal; and

generating a target record based on the text information;

wherein analyzing the sound signal to obtain text information corresponding to the sound signal comprises:

dividing the voice frames in the voice signals according to preset conditions to obtain a plurality of voice frame groups;

determining a target state corresponding to each voice frame group to obtain a plurality of target states;

searching corresponding phonemes for a first preset number of continuous target states to obtain a plurality of phonemes;

generating corresponding words based on a second preset number of continuous phonemes to obtain a plurality of words;

generating the text information by the plurality of words;

wherein the sound signal is a sound signal of a court trial case collected by the sound collectors at the court trial site, and generating the target record based on the text information comprises:

carrying out error correction processing on the text information;

determining the type of a court trial case, and acquiring a corresponding record template according to the type of the court trial case;

adding corrected text information to the target position in the record template to generate a record to be processed;

and processing the record to be processed to obtain the target record.

2. The method of claim 1, wherein determining the target state corresponding to each voice frame group to obtain a plurality of target states comprises:

calculating the corresponding probability of each voice frame group on a plurality of states;

acquiring a state corresponding to the probability that the voice frame group meets a preset condition;

and taking the state corresponding to the probability meeting the preset condition as the target state corresponding to the voice frame group to obtain a plurality of target states.

3. The method of claim 1, wherein the target record is a court trial record, and wherein, before collecting sound through the multi-channel sound card to obtain the sound signal, the method further comprises:

configuring the corresponding relation between each sound channel on the sound card and each court trial object role;

and connecting the sound collector corresponding to each court trial object role with each sound channel according to the corresponding relation.

4. A record generation apparatus, comprising:

a collecting unit, configured to collect sound through a multi-channel sound card to obtain a sound signal, wherein each channel on the sound card corresponds to one sound collector, and each sound collector is used for collecting the sound of one user;

the analysis unit is used for analyzing the sound signals to obtain text information corresponding to the sound signals; and

a generating unit configured to generate a target record based on the text information;

wherein the analysis unit includes:

the dividing module is used for dividing the voice frames in the voice signals according to preset conditions to obtain a plurality of voice frame groups;

the searching module is used for determining the target state corresponding to each voice frame group to obtain a plurality of target states;

the matching module is used for searching corresponding phonemes for the continuous first preset number of target states to obtain a plurality of phonemes;

the first generation module is used for generating corresponding words based on continuous second preset number of phonemes to obtain a plurality of words;

a second generating module, configured to generate the text information through the plurality of words;

the sound signal is a sound signal of a court trial case collected by a sound collector on a court trial site, and the generating unit is further used for carrying out error correction processing on the text information; determining the type of a court trial case, and acquiring a corresponding record template according to the type of the court trial case; adding corrected text information to the target position in the record template to generate a record to be processed; and processing the record to be processed to obtain the target record.

5. The apparatus of claim 4, wherein the searching module comprises:

the calculation submodule is used for calculating the corresponding probability of each voice frame group on a plurality of states;

the obtaining submodule is used for obtaining the state corresponding to the probability that the voice frame group meets the preset condition;

and the determining submodule is used for taking the state corresponding to the probability meeting the preset condition as the target state corresponding to the voice frame group to obtain a plurality of target states.

6. A storage medium characterized by comprising a stored program, wherein the program executes the record generation method of any one of claims 1 to 3.

7. A processor, configured to execute a program, wherein the program executes the record generation method of any one of claims 1 to 3.