CN109213970B - Method and device for generating notes - Google Patents

Method and device for generating notes

Info

Publication number
CN109213970B
Authority
CN
China
Prior art keywords
sound
record
target
text information
generating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710525292.4A
Other languages
Chinese (zh)
Other versions
CN109213970A (en
Inventor
石鹏
梁文波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201710525292.4A
Publication of CN109213970A
Application granted
Publication of CN109213970B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Tourism & Hospitality (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Artificial Intelligence (AREA)
  • Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The application discloses a method and a device for generating a record. The method comprises the following steps: collecting sound through a multi-channel sound card to obtain a sound signal, wherein each channel on the sound card corresponds to one sound collector, and each sound collector is used for collecting the sound of a user; analyzing the sound signal to obtain text information corresponding to the sound signal; and generating a target record based on the text information. The method and the device address the low efficiency of recording court trial records during court trials in the related art.

Description

Method and device for generating notes
Technical Field
The application relates to the technical field of information processing, and in particular to a method and a device for generating a record.
Background
At present, during a court trial, the statements of each role, such as the judge and the parties, need to be recorded. After the trial ends, the recorded statements are organized into a court trial record, which is filed after the parties sign to confirm it. In a traditional court trial, a single court clerk records the statements of all roles throughout the trial and afterwards organizes them into a court trial record in a fixed format. The whole process places demands on the clerk's typing skill, and also requires some understanding of the case being tried and professional knowledge of laws and regulations. Because clerks differ in typing speed and in their understanding of the case, problems such as low recording efficiency, inaccurate recording, and omissions often occur.
No effective solution has yet been proposed for the low efficiency of recording court trial records during court trials in the related art.
Disclosure of Invention
The main purpose of the application is to provide a method and a device for generating a record, so as to solve the problem in the related art that recording court trial records during court trials is inefficient.
To achieve the above object, according to one aspect of the present application, there is provided a record generation method. The method comprises the following steps: collecting sound through a multi-channel sound card to obtain a sound signal, wherein each channel on the sound card corresponds to one sound collector, and each sound collector is used for collecting the sound of a user; analyzing the sound signal to obtain text information corresponding to the sound signal; and generating a target record based on the text information.
Further, analyzing the sound signal to obtain text information corresponding to the sound signal includes: dividing the voice frames in the voice signals according to preset conditions to obtain a plurality of voice frame groups; determining a target state corresponding to each voice frame group to obtain a plurality of target states; searching corresponding phonemes for a first preset number of continuous target states to obtain a plurality of phonemes; generating corresponding words based on a second preset number of continuous phonemes to obtain a plurality of words; generating the text information by the plurality of words.
Further, determining a target state corresponding to each voice frame group to obtain a plurality of target states includes: calculating the probability of each voice frame group for each of a plurality of states; acquiring the state whose probability for the voice frame group satisfies a preset condition; and taking that state as the target state corresponding to the voice frame group, thereby obtaining a plurality of target states.
Further, the sound signal is a sound signal of a court trial case collected by a sound collector on the court trial site, and generating the target record based on the text information comprises: carrying out error correction processing on the text information; determining the type of a court trial case, and acquiring a corresponding record template according to the type of the court trial case; adding corrected text information to the target position in the record template to generate a record to be processed; and processing the record to be processed to obtain the target record.
Further, the target record is a court trial record, and before sound acquisition is performed through a multi-channel sound card to obtain a sound signal, the method further includes: configuring the corresponding relation between each sound channel on the sound card and each court trial object role; and connecting the sound collector corresponding to each court trial object role with each sound channel according to the corresponding relation.
To achieve the above object, according to another aspect of the present application, there is provided a record generation apparatus. The device includes: a collecting unit, used for collecting sound through a multi-channel sound card to obtain a sound signal, wherein each channel on the sound card corresponds to one sound collector, and each sound collector is used for collecting the sound of a user; an analysis unit, used for analyzing the sound signal to obtain text information corresponding to the sound signal; and a generating unit, configured to generate a target record based on the text information.
Further, the parsing unit includes: the dividing module is used for dividing the voice frames in the voice signals according to preset conditions to obtain a plurality of voice frame groups; the searching module is used for determining the target state corresponding to each voice frame group to obtain a plurality of target states; the matching module is used for searching corresponding phonemes for the continuous first preset number of target states to obtain a plurality of phonemes; the first generation module is used for generating corresponding words based on continuous second preset number of phonemes to obtain a plurality of words; a second generating module for generating the text information by the plurality of words.
Further, the lookup module includes: a calculation submodule, used for calculating the probability of each voice frame group for each of a plurality of states; an obtaining submodule, used for obtaining the state whose probability for the voice frame group satisfies a preset condition; and a determining submodule, used for taking that state as the target state corresponding to the voice frame group, thereby obtaining a plurality of target states.
In order to achieve the above object, according to another aspect of the present application, there is provided a storage medium including a stored program, wherein the program executes the record generation method of any one of the above.
In order to achieve the above object, according to another aspect of the present application, there is provided a processor for running a program, wherein the program, when run, performs the record generation method described above.
Through the application, the following steps are adopted: sound collection is carried out through a multi-channel sound card to obtain a sound signal, wherein each channel on the sound card corresponds to one sound collector, and each sound collector is used for collecting the sound of a using object; analyzing the sound signal to obtain text information corresponding to the sound signal; and generating a target record based on the text information, thereby solving the problem of low efficiency of recording the court trial record in the court trial process in the related technology. The collected sound signals are converted into corresponding text information, and the record is generated based on the text information, so that the effect of improving the efficiency of generating the court trial record is achieved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:
fig. 1 is a flowchart of a method for generating a record according to an embodiment of the present application;
fig. 2 is a schematic diagram of a sound signal in a method for generating a record according to an embodiment of the present application;
fig. 3 is a schematic diagram of the state relationship of the phonemes of a character in a method for generating a record according to an embodiment of the present application;
fig. 4 is a schematic diagram of a framing operation in a method for generating a record according to an embodiment of the present application;
fig. 5 is a schematic diagram of the states corresponding to speech frames in a method for generating a record according to an embodiment of the present application;
fig. 6 is a schematic diagram of a court trial record template in a method for generating a record according to an embodiment of the present application;
fig. 7 is a schematic diagram of a record generation apparatus according to an embodiment of the present application.
Detailed Description
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be used. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For convenience of description, some terms or expressions referred to in the embodiments of the present application are explained below:
The court trial record, also called a court record or trial record, is an indispensable written document in the court's handling of a case. It is a written account, made by the court clerk, that synchronously reflects the actual course of all trial activities during the court trial. It documents the whole process of trying the case, is an important basis for rendering a judgment according to law, and is important material for later trial supervision, so its function and significance are considerable. The court trial record should objectively, truthfully, promptly and accurately reflect all the activities of the court trial.
Article 147 of the Civil Procedure Law provides that the court clerk shall enter all the activities of the court trial into the record, which shall be signed by the judges and the clerk. The court record shall be read out in court, or the parties and other litigation participants may be informed that they can read it in court or within five days. Parties and other litigation participants who believe that the record of their statements contains omissions or errors are entitled to apply for correction. If no correction is made, the application shall be noted in the record. The court record is signed or sealed by the parties and other litigation participants. If anyone refuses to sign or seal, the circumstances shall be noted and attached to the file. Article 201 of the Criminal Procedure Law provides that all the activities of the court trial shall be written into a record by the court clerk and, after review by the presiding judge, signed by the presiding judge and the clerk. The witness testimony in the court record shall be read out in court or handed to the witness to read. After acknowledging that it contains no errors, the witness shall sign or seal it. The court record shall be handed to the parties to read or be read out to them. A party who considers that the record contains omissions or errors may request a supplement or correction. After acknowledging that it contains no errors, the party shall sign or seal it.
A sound card, also called an audio card, is the most basic component in multimedia technology and is hardware that converts between sound waves and digital signals. Its basic function is to convert original sound signals from a microphone, a magnetic tape, or an optical disc and output the converted signals to sound equipment such as earphones, speakers, loudspeakers, or recorders, or to make a musical instrument produce sound through a Musical Instrument Digital Interface (MIDI). The sound card is composed of various electronic devices and connectors. The electronic devices perform a variety of specific functions. The connectors are generally of two types, sockets and circular jacks, and are used for connecting input and output signals.
A phoneme is the smallest unit of speech. Phonemes are identified by analyzing the articulatory actions within a syllable, with one action constituting one phoneme. Phonemes are divided into two major categories, vowels and consonants. For example, the Chinese syllable ā has only one phoneme, ai has two phonemes, and dāi has three phonemes. The method of marking speech with international phonetic symbols is called phonetic transcription, and comes in broad and narrow forms. Broad transcription marks only the phonemes that distinguish meaning, whereas narrow transcription marks the differences between sounds as fully as possible. Broad transcription therefore uses a limited set of symbols and narrow transcription uses many, but each has its own purpose. A phoneme is the smallest unit or smallest speech segment constituting a syllable, and is the smallest linear speech unit divided from the viewpoint of sound quality. Phonemes are concrete physical phenomena. The phonetic symbols of the International Phonetic Alphabet (letters designated by the International Phonetic Association to uniformly transcribe the speech sounds of all languages, also called the "universal phonetic alphabet") correspond one-to-one to the phonemes of all human languages.
According to an embodiment of the present application, there is provided a method of generating a record.
Fig. 1 is a flowchart of a method for generating a record according to an embodiment of the present application. As shown in fig. 1, the method comprises the steps of:
Step S101, sound collection is carried out through a multi-channel sound card to obtain a sound signal, wherein each channel on the sound card corresponds to a sound collector, and each sound collector is used for collecting the sound of a user.
In the present application, sound collection in a court is described as an example. The courtroom has a multi-channel sound card connected to the court clerk's computer. Each channel on the sound card corresponds to one microphone (the sound collector described above), each channel corresponds to a court trial role, and the microphone of each role is connected to the sound card according to this correspondence. During a court trial, the multi-channel sound card collects the sound of the roles speaking into their microphones to obtain a sound signal. The collected sound signal is a piece of waveform data, for example as shown in fig. 2.
Optionally, to ensure that the sound signal can be acquired through a multi-channel sound card, in the record generating method provided in the embodiment of the present application, the target record is a court trial record, and before acquiring the sound signal through the multi-channel sound card, the method further includes: configuring the corresponding relation between each sound channel on the sound card and each court trial object role; and connecting the sound collector corresponding to each court trial object role with each sound channel according to the corresponding relation.
Or, configuring the corresponding relation between each court trial object role and each sound collector; and then connecting each sound collector with the sound channel corresponding to each court trial object role according to the corresponding relation.
The record generation method of the embodiment of the present application may be applied to court trial software; that is, the method is embedded in the court trial software. Before the court trial begins, the court trial software is installed and the correspondence between roles and sound channels is set in the software. Information such as the type of the court trial case, the type of record, and the trial level can also be set in the software before the trial. The information set in the court trial software is consulted when the record is subsequently generated.
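The channel-to-role correspondence described above can be pictured as a small configuration table. The following is a minimal sketch, not the software's actual configuration format: the role names, channel numbers, and the `label_utterance` helper are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical correspondence between sound card channels and court trial roles.
# A real deployment would set this in the court trial software before the trial.
CHANNEL_ROLES = {
    0: "judge",
    1: "plaintiff",
    2: "defendant",
    3: "witness",
}


@dataclass
class Utterance:
    channel: int
    role: str
    text: str


def label_utterance(channel: int, text: str, channel_roles: dict) -> Utterance:
    """Attach the configured court trial role to text recognized on one channel."""
    if channel not in channel_roles:
        raise KeyError(f"channel {channel} has no configured role")
    return Utterance(channel=channel, role=channel_roles[channel], text=text)


example = label_utterance(1, "I request compensation.", CHANNEL_ROLES)
print(example.role, ":", example.text)
```

Because each microphone is wired to a fixed channel, the role of a speaker follows directly from the channel on which the sound arrived, without any speaker identification step.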
Step S102, analyzing the sound signal to obtain text information corresponding to the sound signal.
The sound signal may be analyzed at the sound card end to obtain the corresponding text information. Alternatively, the sound signal may be sent to a voice analysis server, which analyzes it and returns the resulting text information.
Before sound signals are analyzed, the analysis server needs to be trained on a large corpus and to store a large amount of phoneme information in advance. Phonemes are the pronunciation information of characters; for Chinese, for example, the full set of initials and finals is generally used directly as the phoneme set. Each phoneme is generally divided into a plurality of states, where the states mentioned in this application are components of a phoneme, as shown in fig. 3.
Optionally, in the method for generating a record provided in the embodiment of the present application, analyzing the sound signal to obtain text information corresponding to the sound signal includes: dividing voice frames in the sound signal according to preset conditions to obtain a plurality of voice frame groups; determining a target state corresponding to each voice frame group to obtain a plurality of target states, wherein one voice frame group corresponds to one target state; searching corresponding phonemes for a first preset number of continuous target states to obtain a plurality of phonemes; generating corresponding words based on a second preset number of continuous phonemes to obtain a plurality of words; and generating the text information from the plurality of words.
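The last three steps (target states to phonemes, phonemes to words, words to text) amount to two table lookups over fixed-size chunks. The sketch below assumes toy lookup tables and fixed chunk sizes; the state names, the tables, and the helper names are hypothetical, and a real recognizer would learn these mappings from a corpus rather than hard-code them.

```python
# Hypothetical lookup tables, for illustration only.
STATES_PER_PHONEME = 3   # stands in for the "first preset number"
PHONEMES_PER_WORD = 2    # stands in for the "second preset number" (fixed here for simplicity)

PHONEME_OF_STATES = {
    ("s1", "s2", "s3"): "n",
    ("s4", "s5", "s6"): "i",
    ("s7", "s8", "s9"): "h",
    ("s10", "s11", "s12"): "ao",
}
WORD_OF_PHONEMES = {
    ("n", "i"): "ni",
    ("h", "ao"): "hao",
}


def chunk(seq, size):
    """Split seq into consecutive, non-overlapping tuples of the given size."""
    return [tuple(seq[i:i + size]) for i in range(0, len(seq) - size + 1, size)]


def states_to_text(states):
    """Map target states to phonemes, phonemes to words, and join the words."""
    phonemes = [PHONEME_OF_STATES[c] for c in chunk(states, STATES_PER_PHONEME)]
    words = [WORD_OF_PHONEMES[c] for c in chunk(phonemes, PHONEMES_PER_WORD)]
    return " ".join(words)


target_states = [f"s{i}" for i in range(1, 13)]
print(states_to_text(target_states))
```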
The preset condition in the record generation method provided in the embodiment of the present application may be that the probability of a plurality of speech frames being in a certain state satisfies a probability threshold. In the process of converting the sound signal into the corresponding text information, the speech frames in the sound signal are matched against a plurality of states, and the matching can be performed by traversal. For example, 3 consecutive speech frames are matched against the states one by one; if no probability satisfies the probability threshold, 4 consecutive speech frames are matched against the states one by one, and so on, until the probability of the speech frames being in some state satisfies the probability threshold. Those speech frames then form a speech frame group, and that state is the target state corresponding to the speech frame group. The states are components of phonemes: for example, a corresponding phoneme is looked up for every 3 consecutive target states, the phonemes found are combined into words, and the text information is generated from the words.
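The traversal described above can be sketched as a loop that grows a window of frames until some state scores above the threshold. This is a minimal sketch under stated assumptions: the `score` function is a toy stand-in for an acoustic model, `group_frames` is a hypothetical name, and closing the final group at the end of the signal even when the threshold is not reached is a choice made here, not something the text specifies.

```python
def group_frames(frames, states, score, threshold, min_len=3):
    """Greedily split frames into groups, each paired with its best-scoring state.

    A window starts at min_len frames and grows by one frame at a time until
    the best state's score reaches the threshold (or the signal ends).
    """
    groups = []
    start = 0
    while start < len(frames):
        end = min(start + min_len, len(frames))
        while True:
            window = frames[start:end]
            best_state = max(states, key=lambda s: score(window, s))
            if score(window, best_state) >= threshold or end == len(frames):
                groups.append((window, best_state))
                break
            end += 1  # widen the window by one frame and try again
        start = end
    return groups


def toy_score(window, state):
    """Toy probability: reaches 1.0 only once four frames carry the state's label."""
    return min(1.0, window.count(state) / 4)


frames = list("aaaabbbb")  # each character stands in for one speech frame
for window, state in group_frames(frames, ["a", "b"], toy_score, threshold=1.0):
    print(state, window)
```

In this toy run the first window of three frames scores 0.75, so it is widened to four frames before it is accepted, which mirrors the "3 frames, then 4 frames" example in the text.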
A framing operation is performed on the waveform of the sound signal, as shown in fig. 4. For example, each frame in fig. 4 has a length of 25 ms, and every two adjacent frames overlap by 25 - 10 = 15 ms. This is referred to as framing with a frame length of 25 ms and a frame shift of 10 ms. As shown in fig. 5, each small vertical bar represents a frame, and several speech frames correspond to one state (which is equivalent to dividing the speech frames in the sound signal according to a preset condition to obtain a plurality of speech frame groups). Every three states (the first preset number) are combined into a phoneme, and several phonemes (the second preset number) are combined into a word. Text information is generated from a plurality of words.
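The framing arithmetic can be checked with a short sketch. It assumes a 16 kHz sample rate, which the text does not state, and `split_into_frames` is a hypothetical helper name; at 16 kHz a 25 ms frame is 400 samples and a 10 ms shift is 160 samples, so adjacent frames share 240 samples (15 ms).

```python
def split_into_frames(samples, sample_rate, frame_ms=25, shift_ms=10):
    """Split a waveform into overlapping frames of frame_ms, advancing by shift_ms."""
    frame_len = sample_rate * frame_ms // 1000
    shift = sample_rate * shift_ms // 1000
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, shift)
    ]


sample_rate = 16000                    # assumed sample rate, not given in the text
samples = list(range(sample_rate))     # one second of placeholder samples
frames = split_into_frames(samples, sample_rate)

print(len(frames[0]))                          # samples per 25 ms frame
print(len(frames))                             # full frames in one second
print(frames[0][160:] == frames[1][:240])      # adjacent frames share 15 ms of samples
```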
Optionally, in the method for generating a transcript provided in the embodiment of the present application, determining a target state corresponding to each voice frame group, and obtaining multiple target states includes: calculating the corresponding probability of each voice frame group on a plurality of states; acquiring a state corresponding to the probability that the voice frame group meets a preset condition; and taking the state corresponding to the probability meeting the preset condition as the target state corresponding to the voice frame group to obtain a plurality of target states.
The speech recognition process searches for an optimal path in the state network, that is, the path along which the probability of the speech frames being in the target states is highest. The path search algorithm is a dynamic programming algorithm with pruning, called the Viterbi algorithm, and is used to find a globally optimal path. In the application, path searching can be used to obtain, from a plurality of states, the state in which the speech frames have the highest probability, and that state is taken as the target state corresponding to the voice frame group. After the target state corresponding to each voice frame group is obtained, the matching phonemes are found through a preset algorithm and the corresponding text is derived from them, which completes the process of converting speech into text by speech recognition.
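For readers unfamiliar with the Viterbi algorithm, the following is a textbook sketch of it on a toy hidden Markov model. It is not the patent's implementation: the states, observations, and probabilities are invented for illustration, and the pruning mentioned in the text is omitted, so every state is kept at every step.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state path and its probability."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            column[s] = max(
                (prev[r][0] * trans_p[r][s] * emit_p[s][obs], r) for r in states
            )
        best.append(column)

    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


# Toy model: two states "A" and "B", two observation symbols "x" and "y".
states = ["A", "B"]
start_p = {"A": 0.6, "B": 0.4}
trans_p = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit_p = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.2, "y": 0.8}}

path, prob = viterbi(["x", "y", "y"], states, start_p, trans_p, emit_p)
print(path, prob)
```

For this toy model the best path is A, B, B with probability 0.54 × 0.3 × 0.8 × 0.6 × 0.8 = 0.062208. A production recognizer would typically work in log probabilities to avoid underflow and would prune unlikely paths at each step.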
Step S103, generating a target record based on the text information.
Different records have different format requirements. Therefore, the target record needs to be generated from the parsed text information according to the template format of the target record.
Optionally, in order to generate an accurate target record, in the record generating method provided in the embodiment of the present application, the sound signal is a sound signal of a court trial case collected by a sound collector at a court trial site, and generating the target record based on the text information includes: carrying out error correction processing on the text information; determining the type of a court trial case, and acquiring a corresponding record template according to the type of the court trial case; adding corrected text information to a target position in a record template to generate a record to be processed; and processing the record to be processed to obtain the target record.
Because the text information is obtained by analyzing the sound signal, some characters may be wrong. Error correction is therefore applied to the text information in the process of generating the target record, which improves the accuracy of the text and helps avoid character errors in the target record generated subsequently. The type of the court trial case may be received as external input. Alternatively, the text obtained by analysis may be fed into a case type judgment model, which determines the case type corresponding to the case. The case type judgment model is trained in advance on judgment documents of various case types, so that it can determine the corresponding case type for the input text. For example, the text obtained by analysis is input into the case type judgment model, the model determines that the case is a civil case, and the record template corresponding to civil cases, stored in a database in advance, is retrieved. As shown in fig. 6, a record template is generally divided into three parts: the first part is the record header, the second part is the content area, and the third part is the signature area. The corrected text information is added to the content area. Corresponding content is then added to the record header and the signature area of the template to obtain a complete target record.
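The three-part template (header, content area, signature area) lends itself to a simple fill-in sketch. The example below uses Python's standard `string.Template`; the template wording, the field names `court` and `case_no`, and the `build_record` helper are hypothetical and do not reproduce the template of fig. 6.

```python
from string import Template

# Hypothetical record templates keyed by case type. Each has a header,
# a content area ($content), and a signature area.
RECORD_TEMPLATES = {
    "civil": Template(
        "$court\n"
        "Court Trial Record (Civil Case)\n"
        "Case No.: $case_no\n"
        "----\n"
        "$content\n"
        "----\n"
        "Judge (signature): ______  Clerk (signature): ______\n"
    ),
}


def build_record(case_type, header, utterances):
    """Fill the template for case_type with header fields and corrected text."""
    template = RECORD_TEMPLATES[case_type]
    content = "\n".join(f"{role}: {text}" for role, text in utterances)
    return template.substitute(content=content, **header)


record = build_record(
    "civil",
    {"court": "Example People's Court", "case_no": "0001"},
    [("Judge", "The court is now in session."),
     ("Plaintiff", "I request compensation.")],
)
print(record)
```

`Template.substitute` raises `KeyError` when a header field is missing, which is a convenient way to catch an incompletely configured record before it is produced.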
The record generation method of the embodiment of the application is applied in court trial software. The record template is extracted automatically according to the case information set before the trial, the microphone sound collected by the sound card is converted into a digital signal, the digital signal is sent to the voice analysis server, the analysis result returned by the server is received and displayed according to the template format, and the court trial record is generated. Through these steps, an essential part of the work in a court trial is optimized: manually selecting templates and typing text is replaced by automatic recognition and generation, which improves working efficiency and makes the production of court trial records faster and their content more accurate and complete.
According to the record generation method provided by the embodiment of the application, sound collection is carried out through a multi-channel sound card to obtain sound signals, wherein each channel on the sound card corresponds to one sound collector, and each sound collector is used for collecting the sound of a user object; analyzing the sound signal to obtain text information corresponding to the sound signal; and the target record is generated based on the text information, so that the problem of low efficiency of recording the court trial record in the court trial process in the related technology is solved. The collected sound signals are converted into corresponding text information, and the record is generated based on the text information, so that the effect of improving the efficiency of generating the court trial record is achieved.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from the one presented herein.
The embodiment of the present application further provides a record generating device, and it should be noted that the record generating device of the embodiment of the present application may be used to execute the record generating method provided in the embodiment of the present application. The following describes a record generation apparatus provided in an embodiment of the present application.
Fig. 7 is a schematic diagram of a record generation apparatus according to an embodiment of the present application. As shown in fig. 7, the apparatus includes: a collecting unit 10, an analysis unit 20 and a generation unit 30.
Specifically, the collecting unit 10 is configured to collect sounds through a multi-channel sound card to obtain a sound signal, where each channel on the sound card corresponds to a sound collector, and each sound collector is configured to collect sounds of a user.
The analyzing unit 20 is configured to analyze the sound signal to obtain text information corresponding to the sound signal.
The generating unit 30 is configured to generate a target record based on the text information.
In the record generating device provided by the embodiment of the application, the collecting unit 10 collects sound through a multi-channel sound card to obtain a sound signal, wherein each channel on the sound card corresponds to one sound collector, and each sound collector is used for collecting the sound of a user object; the analysis unit 20 analyzes the sound signal to obtain text information corresponding to the sound signal; and the generation unit 30 generates the target record based on the text information. This solves the problem in the related art that recording the court trial record during a court trial is inefficient: the collected sound signals are converted into corresponding text information and the record is generated from that text information, which improves the efficiency of generating the court trial record.
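The cooperation of the three units of fig. 7 can be sketched as a simple chain. The class name `RecordGenerationDevice` and the stub units below are hypothetical; real units would wrap the sound card, the speech analysis and the template handling.

```python
from typing import Callable


class RecordGenerationDevice:
    """Chains the three units: collect -> analyze -> generate."""

    def __init__(self,
                 collecting_unit: Callable[[], bytes],
                 analysis_unit: Callable[[bytes], str],
                 generating_unit: Callable[[str], str]) -> None:
        self.collecting_unit = collecting_unit
        self.analysis_unit = analysis_unit
        self.generating_unit = generating_unit

    def run(self) -> str:
        sound_signal = self.collecting_unit()                # unit 10
        text_information = self.analysis_unit(sound_signal)  # unit 20
        return self.generating_unit(text_information)        # unit 30


# Stub units, for illustration only.
device = RecordGenerationDevice(
    collecting_unit=lambda: b"\x00\x01",
    analysis_unit=lambda signal: f"{len(signal)} bytes of speech",
    generating_unit=lambda text: f"RECORD: {text}",
)
```

Passing the units in as callables keeps each one replaceable, for example when the analysis runs on a separate voice analysis server.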
Optionally, in the apparatus for generating a record provided in the embodiment of the present application, the parsing unit 20 includes: the dividing module is used for dividing the voice frames in the voice signals according to preset conditions to obtain a plurality of voice frame groups; the searching module is used for determining the target state corresponding to each voice frame group to obtain a plurality of target states; the matching module is used for searching corresponding phonemes for the continuous first preset number of target states to obtain a plurality of phonemes; the first generation module is used for generating corresponding words based on continuous second preset number of phonemes to obtain a plurality of words; and the second generation module is used for generating text information through a plurality of words.
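The chain of modules above (voice frame groups, target states, phonemes, words, text) can be sketched as follows. The lookup tables, the fixed group sizes and the function names are toy assumptions chosen for illustration; the patent leaves the preset condition and the preset numbers open.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def chunk(items: Sequence, size: int) -> List[Tuple]:
    """Split items into consecutive tuples of `size`; a short tail is dropped."""
    return [tuple(items[i:i + size])
            for i in range(0, len(items) - size + 1, size)]


def parse(frames: Sequence[int],
          frames_per_group: int,
          state_of: Callable[[Tuple], int],
          states_per_phoneme: int,
          phoneme_table: Dict[Tuple, str],
          phonemes_per_word: int,
          word_table: Dict[Tuple, str]) -> str:
    groups = chunk(frames, frames_per_group)            # dividing module
    states = [state_of(group) for group in groups]      # searching module
    phonemes = [phoneme_table[s]                        # matching module
                for s in chunk(states, states_per_phoneme)]
    words = [word_table[p]                              # first generation module
             for p in chunk(phonemes, phonemes_per_word)]
    return " ".join(words)                              # second generation module


# Toy data: every frame value already names its state, two frames per group.
frames = [state for state in range(1, 13) for _ in range(2)]
phoneme_table = {(1, 2, 3): "h", (4, 5, 6): "i", (7, 8, 9): "y", (10, 11, 12): "o"}
word_table = {("h", "i"): "hi", ("y", "o"): "yo"}
text = parse(frames, 2, lambda group: group[0], 3, phoneme_table, 2, word_table)
```

In this sketch three consecutive states form one phoneme and two consecutive phonemes form one word; a real recognizer would use variable-length units and a pronunciation dictionary instead of fixed counts.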
Optionally, in the device for generating a record provided in the embodiment of the present application, the searching module includes: the calculation submodule is used for calculating the corresponding probability of each voice frame group on a plurality of states; the obtaining submodule is used for obtaining the state corresponding to the probability that the voice frame group meets the preset condition; and the determining submodule is used for taking the state corresponding to the probability meeting the preset condition as the target state corresponding to the voice frame group to obtain a plurality of target states.
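A minimal sketch of the three submodules follows. It assumes that the preset condition is "highest probability" and that probabilities are obtained by normalizing non-negative scores; both are assumptions for this example, since the passage above only requires that the probability meet a preset condition.

```python
from typing import Dict, List


def state_probabilities(scores: Dict[str, float]) -> Dict[str, float]:
    """Calculation submodule: normalize non-negative scores to probabilities."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return {state: score / total for state, score in scores.items()}


def target_state(probabilities: Dict[str, float]) -> str:
    """Obtaining/determining submodules: pick the most probable state
    (assumed preset condition)."""
    return max(probabilities, key=probabilities.get)


def target_states(group_scores: List[Dict[str, float]]) -> List[str]:
    """One target state per voice frame group."""
    return [target_state(state_probabilities(s)) for s in group_scores]
```

For example, scores of 1, 7 and 2 for states `s1`, `s2` and `s3` give probabilities 0.1, 0.7 and 0.2, so `s2` becomes the target state of that voice frame group.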
The record generating device comprises a processor and a memory. The collecting unit 10, the analyzing unit 20, the generating unit 30 and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize the corresponding functions.
The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided, and the record is generated by adjusting kernel parameters.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
An embodiment of the present application provides a storage medium on which a program is stored, and the program implements the record generation method when executed by a processor.
The embodiment of the application provides a processor, wherein the processor is used for running a program, and the method for generating the record is executed when the program runs.
The embodiment of the application provides a device comprising a processor, a memory and a program that is stored in the memory and executable on the processor; when executing the program, the processor implements the following steps: collecting sound through a multi-channel sound card to obtain a sound signal, wherein each channel on the sound card corresponds to one sound collector, and each sound collector is used for collecting the sound of a user object; analyzing the sound signal to obtain text information corresponding to the sound signal; and generating a target record based on the text information.
Analyzing the sound signal to obtain text information corresponding to the sound signal includes: dividing the voice frames in the voice signals according to preset conditions to obtain a plurality of voice frame groups; determining a target state corresponding to each voice frame group to obtain a plurality of target states; searching corresponding phonemes for a first preset number of continuous target states to obtain a plurality of phonemes; generating corresponding words based on a second preset number of continuous phonemes to obtain a plurality of words; generating the text information by the plurality of words.
Determining a target state corresponding to each voice frame group, and obtaining a plurality of target states comprises: calculating the corresponding probability of each voice frame group on a plurality of states; acquiring a state corresponding to the probability that the voice frame group meets a preset condition; and taking the state corresponding to the probability meeting the preset condition as the target state corresponding to the voice frame group to obtain a plurality of target states.
Generating a target record based on the text information includes: carrying out error correction processing on the text information; determining the type of the court trial case, and acquiring a corresponding record template according to the type of the court trial case; adding the corrected text information to the target position in the record template to generate a record to be processed; and processing the record to be processed to obtain the target record.
The target record is a court trial record, and before sound acquisition is performed through a multi-channel sound card to obtain a sound signal, the method further comprises the following steps: configuring the corresponding relation between each sound channel on the sound card and each court trial object role; and connecting the sound collector corresponding to each court trial object role with each sound channel according to the corresponding relation. The device herein may be a server, a PC, a PAD, a mobile phone, etc.
The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps: collecting sound through a multi-channel sound card to obtain a sound signal, wherein each channel on the sound card corresponds to one sound collector, and each sound collector is used for collecting the sound of a user object; analyzing the sound signal to obtain text information corresponding to the sound signal; and generating a target record based on the text information.
Analyzing the sound signal to obtain text information corresponding to the sound signal includes: dividing the voice frames in the voice signals according to preset conditions to obtain a plurality of voice frame groups; determining a target state corresponding to each voice frame group to obtain a plurality of target states; searching corresponding phonemes for a first preset number of continuous target states to obtain a plurality of phonemes; generating corresponding words based on a second preset number of continuous phonemes to obtain a plurality of words; generating the text information by the plurality of words.
Determining a target state corresponding to each voice frame group, and obtaining a plurality of target states comprises: calculating the corresponding probability of each voice frame group on a plurality of states; acquiring a state corresponding to the probability that the voice frame group meets a preset condition; and taking the state corresponding to the probability meeting the preset condition as the target state corresponding to the voice frame group to obtain a plurality of target states.
Generating a target record based on the text information includes: carrying out error correction processing on the text information; determining the type of the court trial case, and acquiring a corresponding record template according to the type of the court trial case; adding the corrected text information to the target position in the record template to generate a record to be processed; and processing the record to be processed to obtain the target record.
The target record is a court trial record, and before sound acquisition is performed through a multi-channel sound card to obtain a sound signal, the method further comprises the following steps: configuring the corresponding relation between each sound channel on the sound card and each court trial object role; and connecting the sound collector corresponding to each court trial object role with each sound channel according to the corresponding relation.
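The configuration of the correspondence between sound channels and court trial object roles can be sketched as a small lookup table. The role names and both function names are hypothetical; the description only requires that each channel be associated with a role before sound collection starts.

```python
from typing import Dict


def configure_channels(channel_roles: Dict[int, str],
                       channel_count: int) -> Dict[int, str]:
    """Validate a channel -> role mapping against the sound card's channels."""
    for channel in channel_roles:
        if not 0 <= channel < channel_count:
            raise ValueError(f"channel {channel} is not on the sound card")
    return dict(channel_roles)


def attribute(channel: int, text: str, channel_roles: Dict[int, str]) -> str:
    """Prefix recognized text with the role wired to its channel."""
    return f"{channel_roles[channel]}: {text}"


# Hypothetical wiring for a four-channel sound card.
roles = configure_channels({0: "judge", 1: "plaintiff", 2: "defendant"}, 4)
line = attribute(1, "I object.", roles)
```

Because the mapping is fixed before the hearing, every piece of recognized text can be attributed to a role from its channel alone, without speaker identification.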
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, a computer-readable medium does not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (7)

1. A method for generating a record, comprising:
sound collection is carried out through a multi-channel sound card to obtain a sound signal, wherein each channel on the sound card corresponds to one sound collector, and each sound collector is used for collecting the sound of a using object;
analyzing the sound signal to obtain text information corresponding to the sound signal; and
generating a target record based on the text information;
analyzing the sound signal to obtain text information corresponding to the sound signal includes:
dividing the voice frames in the voice signals according to preset conditions to obtain a plurality of voice frame groups;
determining a target state corresponding to each voice frame group to obtain a plurality of target states;
searching corresponding phonemes for a first preset number of continuous target states to obtain a plurality of phonemes;
generating corresponding words based on a second preset number of continuous phonemes to obtain a plurality of words;
generating the text information by the plurality of words;
the sound signal is the sound signal of court trial cases collected through a sound collector on the court trial site, and the generating of the target record based on the text information comprises the following steps:
carrying out error correction processing on the text information;
determining the type of a court trial case, and acquiring a corresponding record template according to the type of the court trial case;
adding corrected text information to the target position in the record template to generate a record to be processed;
and processing the record to be processed to obtain the target record.
2. The method of claim 1, wherein determining the target state corresponding to each voice frame group to obtain a plurality of target states comprises:
calculating the corresponding probability of each voice frame group on a plurality of states;
acquiring a state corresponding to the probability that the voice frame group meets a preset condition;
and taking the state corresponding to the probability meeting the preset condition as the target state corresponding to the voice frame group to obtain a plurality of target states.
3. The method of claim 1, wherein the target record is a trial record, and wherein the method further comprises, before obtaining the sound signal through sound collection by a multi-channel sound card:
configuring the corresponding relation between each sound channel on the sound card and each court trial object role;
and connecting the sound collector corresponding to each court trial object role with each sound channel according to the corresponding relation.
4. A record generation apparatus, comprising:
a collecting unit, configured to collect sound through a multi-channel sound card to obtain a sound signal, wherein each channel on the sound card corresponds to one sound collector, and each sound collector is used for collecting the sound of a using object;
an analysis unit, configured to analyze the sound signal to obtain text information corresponding to the sound signal; and
a generating unit, configured to generate a target record based on the text information;
wherein the parsing unit includes:
the dividing module is used for dividing the voice frames in the voice signals according to preset conditions to obtain a plurality of voice frame groups;
the searching module is used for determining the target state corresponding to each voice frame group to obtain a plurality of target states;
the matching module is used for searching corresponding phonemes for the continuous first preset number of target states to obtain a plurality of phonemes;
the first generation module is used for generating corresponding words based on continuous second preset number of phonemes to obtain a plurality of words;
a second generating module, configured to generate the text information through the plurality of words;
the sound signal is a sound signal of a court trial case collected by a sound collector on a court trial site, and the generating unit is further used for carrying out error correction processing on the text information; determining the type of a court trial case, and acquiring a corresponding record template according to the type of the court trial case; adding corrected text information to the target position in the record template to generate a record to be processed; and processing the record to be processed to obtain the target record.
5. The apparatus of claim 4, wherein the searching module comprises:
the calculation submodule is used for calculating the corresponding probability of each voice frame group on a plurality of states;
the obtaining submodule is used for obtaining the state corresponding to the probability that the voice frame group meets the preset condition;
and the determining submodule is used for taking the state corresponding to the probability meeting the preset condition as the target state corresponding to the voice frame group to obtain a plurality of target states.
6. A storage medium characterized by comprising a stored program, wherein the program executes the record generation method of any one of claims 1 to 3.
7. A processor, configured to execute a program, wherein the program executes the record generation method of any one of claims 1 to 3.
CN201710525292.4A 2017-06-30 2017-06-30 Method and device for generating notes Active CN109213970B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710525292.4A CN109213970B (en) 2017-06-30 2017-06-30 Method and device for generating notes

Publications (2)

Publication Number Publication Date
CN109213970A CN109213970A (en) 2019-01-15
CN109213970B true CN109213970B (en) 2022-07-29

Family

ID=64961294

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710525292.4A Active CN109213970B (en) 2017-06-30 2017-06-30 Method and device for generating notes

Country Status (1)

Country Link
CN (1) CN109213970B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222158B (en) * 2019-06-20 2022-02-01 北京市律典通科技有限公司 Information processing method and system for simultaneously examining and managing multiple cases
CN111863041B (en) * 2020-07-17 2021-08-31 东软集团股份有限公司 Sound signal processing method, device and equipment
CN114357256B (en) * 2021-12-17 2022-11-15 江苏中智系统集成工程有限公司 Information processing method and system based on court paperless court trial

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005010868A1 (en) * 2003-07-29 2005-02-03 Mitsubishi Denki Kabushiki Kaisha Voice recognition system and its terminal and server
CN1585018A (en) * 2004-06-15 2005-02-23 梁国雄 Comptuer recoding information system for court
CN102867512A (en) * 2011-07-04 2013-01-09 余喆 Method and device for recognizing natural speech
CN103677729A (en) * 2013-12-18 2014-03-26 北京搜狗科技发展有限公司 Voice input method and system
CN106157956A (en) * 2015-03-24 2016-11-23 中兴通讯股份有限公司 The method and device of speech recognition

Also Published As

Publication number Publication date
CN109213970A (en) 2019-01-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: BEIJING GRIDSUM TECHNOLOGY Co.,Ltd.

Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing

Applicant before: BEIJING GRIDSUM TECHNOLOGY Co.,Ltd.

GR01 Patent grant