CN108364653B - Voice data processing method and processing device - Google Patents

Voice data processing method and processing device Download PDF

Info

Publication number
CN108364653B
Authority
CN
China
Prior art keywords
information
text information
area
data processing
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810145265.9A
Other languages
Chinese (zh)
Other versions
CN108364653A (en)
Inventor
王磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN201810145265.9A priority Critical patent/CN108364653B/en
Publication of CN108364653A publication Critical patent/CN108364653A/en
Application granted granted Critical
Publication of CN108364653B publication Critical patent/CN108364653B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/18 Artificial neural networks; Connectionist approaches
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/50 Constructional details
    • H04N23/54 Mounting of pick-up tubes, electronic image sensors, deviation or focusing coils

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention provides a voice data processing method and a processing device, wherein the method comprises the following steps: acquiring voice information; synchronously converting the acquired voice information into first text information, and displaying the first text information in a first area of a display device; in response to a first operation, performing text processing on first text information to be edited, and displaying the processed text information in a second area of the display device; in response to a second operation, synchronizing the unmodified first text information to the second area and generating second text information; and in response to a third operation, generating third text information based on the second text information, and displaying the third text information in a third area of the display device. With the voice data processing method provided by the embodiments of the invention, voice information can be accurately converted into text information in real time.

Description

Voice data processing method and processing device
Technical Field
The invention relates to the field of audio and video data processing, and in particular to a method and a device for processing the record produced during case examination and interrogation.
Background
Interrogation at the procuratorate is an indispensable link in case investigation. In the prior art, case handlers often cannot record the whole conversation of an interrogation comprehensively and accurately; they can only take the record down item by item in real time while handling the case. The workload is heavy, the efficiency is low, and deviations occur, so the case record is neither complete nor accurate enough. At the same time, it is difficult to grasp the complete record data of the same case as a whole, which greatly reduces interrogation efficiency and slows down the investigation of the case.
Record taking, which plays a crucial role in current interrogation work, is facing a push toward transformation. With the deepening investment in strengthening procuratorial work through science and technology and the continuous progress of modern technology, procuratorial organs have ever higher requirements for the convenience, accuracy and sophistication of interrogation record taking.
Disclosure of Invention
In view of the above, the present invention provides a method and a device for processing voice data, so as to accurately convert voice information into text information in real time.
In order to solve the above problems, according to one aspect of the present invention, there is provided a voice data processing method comprising: acquiring voice information; synchronously converting the acquired voice information into first text information, and displaying the first text information in a first area of a display device; in response to a first operation, performing text processing on first text information to be edited, and displaying the processed text information in a second area of the display device; in response to a second operation, synchronizing the unmodified first text information to the second area and generating second text information; and in response to a third operation, generating third text information based on the second text information, and displaying the third text information in a third area of the display device.
In order to solve the above problems, according to another aspect of the present invention, there is provided a voice data processing method comprising: acquiring voice information; synchronously converting the acquired voice information into first text information, and displaying the first text information in a first area of a display device; in response to a fourth operation, performing text processing on the first text information to generate second text information, and displaying the second text information in a second area of the display device; and in response to a third operation, generating third text information based on the second text information, and displaying the third text information in a third area of the display device.
In order to solve the above problems, according to still another aspect of the present invention, there is provided a voice data processing apparatus comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform: acquiring voice information; synchronously converting the acquired voice information into first text information, and displaying the first text information in a first area of a display device; in response to a first operation, performing text processing on first text information to be edited, and displaying the processed text information in a second area of the display device; in response to a second operation, synchronizing the unmodified first text information to the second area and generating second text information; and in response to a third operation, generating third text information based on the second text information, and displaying the third text information in a third area of the display device.
The voice data processing method and processing device provided by embodiments of the invention can display the simultaneous voice transcript, the proofread record and the final record on the same display device, and also support same-case retrieval during the interrogation so that related information can be consulted on the same screen, which greatly facilitates producing the on-site record when a procuratorial organ handles a case.
It should be understood that the voice data processing method and device of the invention can be applied not only to case interrogation, but also to any other occasion where voice information has to be converted into text information in real time and where high accuracy of the text information is required.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent from the following description of the embodiments of the present invention with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of a method of processing voice data according to an embodiment of the present invention;
FIG. 2 is a block diagram of a procuratorate case handling system according to an embodiment of the present invention;
FIG. 3A is a diagram of first text information displayed in the first area;
FIG. 3B is a diagram of processed text information displayed in the second area;
FIG. 4 schematically illustrates a flow chart for querying related information in the first, second and/or third text information; and
fig. 5 is a block diagram of a voice data processing apparatus according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described below with reference to the drawings. Elements and features described in one drawing or one embodiment of the invention may be combined with elements and features shown in one or more other drawings or embodiments. It should be noted that the figures and description omit representation and description of components and processes that are not relevant to the present invention and that are known to those of ordinary skill in the art for the sake of clarity.
It will be understood by those skilled in the art that the terms "first", "second", etc. in the present invention are only used for distinguishing different units, modules or steps, etc., and do not represent any specific technical meaning or necessary logical sequence between them, nor represent the importance of the different units, modules or steps defined by them.
In the embodiments, only features different from those of the other embodiments are described with emphasis, and features identical or similar to those of the other embodiments are omitted.
The inventor of the present invention has found that displaying the simultaneous voice transcript, the proofread record and the final on-site record generated from the proofread record on the same display device in a split-screen manner greatly facilitates editing and producing the on-site record.
Referring to fig. 1, there is shown a flow chart of a voice data processing method according to an embodiment of the present invention. The method S100 includes the steps of: acquiring voice information (S110); synchronously converting the acquired voice information into first text information and displaying the first text information in a first area of the display device (S130); in response to a first operation, performing text processing on the first text information to be edited and displaying the processed text information in a second area of the display device (S150); in response to a second operation, synchronizing the unmodified first text information to the second area to generate second text information (S170); and in response to a third operation, generating third text information based on the second text information and displaying it in a third area of the display device (S190).
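For orientation only, the steps listed above can be wired together as in the following sketch. The patent does not prescribe any concrete API; the function names (recognize, generate_final_record) and the representation of the operator's edits as a dictionary keyed by line index are assumptions made purely for illustration.

    from typing import Callable, Dict, List

    def process_session(audio_chunks: List[bytes],
                        recognize: Callable[[bytes], str],
                        edits: Dict[int, str]) -> List[str]:
        """Sketch of S110-S170: recognize each acquired chunk (first text
        information), apply the operator's edits made in the second area,
        and merge the unmodified lines back in to form the second text
        information."""
        first_text = [recognize(chunk) for chunk in audio_chunks]   # S110 / S130
        second_text = [edits.get(i, line)                           # S150 / S170
                       for i, line in enumerate(first_text)]
        return second_text

    def generate_final_record(second_text: List[str]) -> str:
        """S190: produce the third text information shown in the third area."""
        return "\n".join(second_text)

    # Example with a stand-in recognizer:
    # process_session([b"..."], lambda chunk: "recognized line", {0: "edited line"})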
The voice data processing method S100 according to the embodiment of the present invention is described below in the scenario of taking the record during an interrogation at a procuratorate. It should be understood that the voice data processing method of the embodiment of the present invention is not limited to the interrogation scenario and can be applied in any situation where voice input is converted into editable text information.
Fig. 2 is a block diagram of a procuratorate case handling system according to an embodiment of the present invention. As shown in fig. 2, the case handling system may include a plurality of acquisition devices disposed in the interrogation room, used respectively for acquiring image and voice information of the case handlers and the suspect. In particular, the acquisition devices may be a plurality of cameras placed at different locations in the interrogation room.
The interrogators can switch the individual cameras on or off through an interrogation host acting as controller, and the acquired voice signals are sent to the voice data processing device.
The voice data processing device converts the acquired voice signals, performs text processing, and finally displays the required record information on the display device.
In another possible example, a dedicated voice server may be configured to process the captured voice signals.
Referring to fig. 1, in step S110, in response to a start operation by the operator, the voice data processing apparatus starts receiving input voice data. In the present embodiment, the voice data may be the conversation between the examiner and the criminal suspect, acquired in real time by an acquisition device such as a camera installed in the interrogation room.
In one possible example, the start operation may be the operator selecting, by clicking or touching, a start virtual button provided on the control device. The control device may be a dedicated control device of a specially developed system, or may be implemented by installing software on an existing device such as a smart mobile terminal or a tablet computer.
After the operator clicks the start button, the acquisition devices arranged in the interrogation room are activated and begin to acquire image and voice information in the room, and the acquired voice information is then transmitted to the voice data processing device.
In step S130, the voice data processing apparatus may synchronously convert the acquired voice information into text information using one of various speech recognition engines stored in it. Hereinafter, the text information converted by the speech recognition engine is referred to as first text information for convenience.
For example, the various speech recognition engines may be built, with reference to the related art, on algorithm models such as Markov speech recognition models, neural network algorithms or support vector machines.
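As a purely illustrative sketch, several recognition engines of the kinds mentioned above could be placed behind a common interface so that the processing device can switch between them; the class and method names below are assumptions, not an interface defined by the patent.

    from abc import ABC, abstractmethod

    class RecognitionEngine(ABC):
        """Common interface behind which different engine types can be swapped."""

        @abstractmethod
        def transcribe(self, audio_chunk: bytes) -> str:
            ...

    class DummyEngine(RecognitionEngine):
        """Stand-in that returns canned text; a real engine could wrap a Markov
        model, a neural network or a support vector machine based recognizer."""

        def transcribe(self, audio_chunk: bytes) -> str:
            return "<recognized text>"

    def convert_stream(engine: RecognitionEngine, chunks):
        # Synchronously convert each acquired chunk into first text information.
        return [engine.transcribe(c) for c in chunks]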
Then, the first text information is displayed in real time in a specific area of the display device; hereinafter this area is referred to as the first area for convenience. At the same time, a dedicated Identification (ID) and a storage space for the first text information may be allocated in the processing device.
In one possible example, a dedicated window may be provided for displaying the converted text information in real time, and the operator may place the window where needed by dragging it. In another possible example, the display area may be divided into a plurality of areas as needed, with different information displayed in different areas. For example, the display area may be fixedly divided into three areas (a left, a middle and a right area), and the recognized text information may be displayed in one of them, such as the left area.
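One possible realization of the fixed three-way split mentioned above is sketched below using the standard tkinter toolkit. The widget layout is an assumption; the patent does not tie the display to any particular GUI framework.

    import tkinter as tk

    def build_three_area_display():
        root = tk.Tk()
        root.title("Interrogation record")

        paned = tk.PanedWindow(root, orient=tk.HORIZONTAL)
        paned.pack(fill=tk.BOTH, expand=True)

        first_area = tk.Text(paned, width=40)    # left: first text information
        second_area = tk.Text(paned, width=40)   # middle: second text information
        third_area = tk.Text(paned, width=40)    # right: third text information
        for area in (first_area, second_area, third_area):
            paned.add(area)

        return root, first_area, second_area, third_area

    if __name__ == "__main__":
        build_three_area_display()[0].mainloop()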
In this embodiment, the interrogation record may be displayed according to the identity of the persons taking part in the interrogation. For example, when the examiner speaks, the speech recognition engine starts recognizing the speaker's speech, and the processing device displays the recognized speech information together with the speaker identification (person A) in the first area; when the speaker changes from the examiner to the criminal suspect, the engine detects the change of role from the change of voice, and the display switches to a new block showing the speech information of the newly recognized person (person B).
In one possible example, the identity information of the recognized speakers may be set by the operator. For example, the operator may change "person A" to "examiner" and "person B" to "criminal suspect" in the displayed first text information. In yet another possible example, different speakers may be distinguished according to the voice information acquired by different audio acquisition devices.
In one possible embodiment, the first text information includes a time stamp of the acquired voice information. Fig. 3A is a schematic diagram of the first text information displayed in the first area. As shown in fig. 3A, the voice data processing apparatus may record the time at which the voice information was acquired and display that time stamp in the first area together with the identification of the recognized speaker and the recognized text.
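As an illustration of how each recognized utterance might be rendered in the first area with its time stamp and speaker label, and how the operator's relabeling of "person A"/"person B" could be applied, a small sketch follows; the field names and the relabeling map are assumptions.

    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass
    class Utterance:
        timestamp: str   # time the voice information was acquired, e.g. "10:50:55"
        speaker: str     # identity assigned by the engine, e.g. "person A"
        text: str

    def render_first_area(utterances: List[Utterance],
                          relabel: Dict[str, str]) -> str:
        # relabel maps engine identities to operator-assigned ones,
        # e.g. {"person A": "examiner", "person B": "criminal suspect"}.
        lines = []
        for u in utterances:
            speaker = relabel.get(u.speaker, u.speaker)
            lines.append(f"[{u.timestamp}] {speaker}: {u.text}")
        return "\n".join(lines)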
Next, in step S150, in response to a first operation by the operator, the first text information is subjected to text processing.
In one possible example, the first operation may be a double click after the first text information to be edited has been selected. Alternatively, the editing operation may be triggered by clicking a preset button after the information to be edited has been selected. When the user selects the first text information to be edited, the information to be edited can be presented in a specific area of the display device; hereinafter this area is referred to as the second area for convenience.
As described above, a dedicated window may be provided for displaying the selected text to be edited in real time. Likewise, in the three-way split described above, the text information to be edited may be displayed in the middle area of the display device.
The operator may then add to or delete from the text information in the second area, and the modified text information is displayed there. In one possible example, the modification may be written back to the text stored in the processing device by having the processing device invoke a background interface with the Identification (ID) of the first text information.
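The "background interface" referred to above is not specified further in the patent; a minimal in-memory stand-in, with hypothetical names, might look like this:

    class TranscriptStore:
        """Keeps the first text information keyed by its Identification (ID) so
        that edits made in the second area can be written back by ID."""

        def __init__(self):
            self._segments = {}          # seg_id -> text

        def add(self, seg_id: str, text: str) -> None:
            self._segments[seg_id] = text

        def update(self, seg_id: str, new_text: str) -> None:
            # Called when the operator modifies a selected segment.
            if seg_id not in self._segments:
                raise KeyError(f"unknown segment id: {seg_id}")
            self._segments[seg_id] = new_text

        def get(self, seg_id: str) -> str:
            return self._segments[seg_id]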
Fig. 3B is a schematic diagram of the processed text information displayed in the second area. The preprocessing of the recorded interrogation information is described in detail below with reference to figs. 3A and 3B.
As shown in fig. 3A, the first area of the display device shows the acquired exchange between the examiner and the suspect. In the example shown in figs. 3A and 3B, the suspect's answer needs to be initially processed to remove the information irrelevant to the content of the interrogation, namely the filler word "haha". To modify the text information, the operator selects the corresponding text, i.e. the text corresponding to the time stamp "10:50:55" in the figure, for example by touch or mouse click. After the operator has selected the information to be edited, the information, including its time stamp, can be extracted into the second area by a first operation such as double-clicking it. The operator may then delete "haha" in the second area. As a result, the modified text information is displayed in the second area, as shown in fig. 3B.
As described above, the operator performs the first operation to preprocess the text information to be edited, and the preprocessed text information is displayed in the second area; at this point, the information that was not selected for modification is not yet shown there. In step S170, in response to a second operation, the unmodified first text information is synchronized to the second area, producing the complete preprocessed record. Hereinafter, the complete preprocessed text information presented in the second area is referred to as the second text information.
The second operation may be performed, for example, by the operator clicking a preset virtual button on the control device.
In one possible embodiment, the second text information is editable. After the second text information has been generated, the operator can edit it further in detail.
Next, in step S190, in response to a third operation by the operator, third text information is generated based on the second text information, and the third text information is displayed in a third area of the display device.
As described above, the third operation may be implemented by the operator clicking a preset virtual button on the control device.
In one possible embodiment, the third text information is editable.
For example, after the operator triggers the third operation, a window may be opened in a dedicated area, i.e. the third area, to display the generated final record, i.e. the third text information. Similarly, the three-way split described above may be used, with the first text information displayed in the left area of the display device, the second text information in the middle area, and the third text information in the right area.
It should be understood that in this embodiment the third area is not dedicated to displaying the third text information. For example, after the third text information has been generated and saved, the operator may close the window displaying it and use the third area for other information.
A voice data processing method according to an embodiment of the present invention has been described above with reference to figs. 1 to 3B. With this method, the simultaneous voice transcript, the proofread record and the final record can be displayed on the same display device, which greatly facilitates producing the on-site record when a procuratorial organ handles a case. It should be understood that the method is applicable not only to case interrogation but also to any other occasion where voice information must be converted into text information in real time and high accuracy of the text information is required.
In another embodiment of the invention, related information can be queried within the case record information displayed in split-screen mode. Fig. 4 schematically shows a flow chart for querying related information in the first, second and/or third text information. In this embodiment, any sentence in the first, second or third text information may be selected, so that related information can be looked up across the whole record. Below, taking the selection of a sentence from the second text information as an example, the operation of querying related information in the case record is described in detail with reference to fig. 4.
The operator can select, in units of sentences, any text he wishes to look into from the second text information displayed in the second area. For example, in step S410, when the operator selects by clicking the sentence "Zhang San transferred 1,000,000 into my Bank of Communications account three years ago" in the second area, then in step S430 the selected sentence is segmented into words, words without practical meaning are removed according to a preset rule, and the keywords "Zhang San", "Bank of Communications", "account" and "1,000,000" are extracted.
Then, in step S450, sentences mentioning "Zhang San", "Bank of Communications", "account" or "1,000,000" are looked up in the first, second and/or third text information, and the associated text information found is displayed. For example, in one possible example, all of the retrieved associated text information may be displayed in a dedicated part of the third area. In another possible example, the related information found may be highlighted in the first, second and/or third text information shown on the same screen.
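The word segmentation, keyword extraction and cross-record lookup described above could, for instance, take the following form. The stop-word list and the whitespace-based tokenization are placeholders (for Chinese records a proper segmenter such as jieba would be used instead); none of this is mandated by the patent.

    import re
    from typing import Dict, List

    STOP_WORDS = {"the", "of", "my", "into", "ago", "a"}   # illustrative only

    def extract_keywords(sentence: str) -> List[str]:
        # Stand-in for real word segmentation; words without practical
        # meaning are removed according to a preset rule.
        tokens = re.findall(r"\w+", sentence.lower())
        return [t for t in tokens if t not in STOP_WORDS]

    def query_records(keywords: List[str],
                      records: Dict[str, List[str]]) -> Dict[str, List[str]]:
        """For each record (first/second/third text information), return the
        sentences that mention any of the extracted keywords."""
        return {name: [s for s in sentences
                       if any(k in s.lower() for k in keywords)]
                for name, sentences in records.items()}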
In this embodiment, same-case retrieval and analysis are presented on the same screen, so that the interrogation process is recorded comprehensively and the case can be cleared up quickly, achieving the best practical effect. It should be understood that, when the voice data processing method of this embodiment is applied to other text processing scenarios, any information the operator wishes to know can likewise be analyzed and located quickly.
In another embodiment of the present invention, the first text information may be preprocessed according to preset text processing rules to generate the second text information.
Specifically, the operator may add different words to a rule base; when the second text information is generated, operations matching the preset rules, such as replacement and deletion, may first be performed directly on the first text information according to those rules.
For example, in one possible example, in response to a fourth operation by the user, such as clicking or touching a virtual button on the operation interface that triggers the preprocessing function, the operations matching the various rules in the rule file may be applied automatically to the first text information, directly generating the second text information. For example, when filler words such as "haha" appear in the selected text, they can be deleted automatically by a preset rule.
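A minimal sketch of such rule-based preprocessing is given below, using regular-expression rules; the rule base shown is illustrative and is not the rule file format of the patent.

    import re

    # Illustrative rule base: each entry is (pattern, replacement);
    # an empty replacement deletes matched filler words such as "haha".
    RULES = [
        (re.compile(r"\bhaha\b"), ""),
        (re.compile(r"\s{2,}"), " "),   # collapse the double spaces left behind
    ]

    def preprocess(first_text: str) -> str:
        """Apply every preset rule to the first text information, directly
        producing a candidate for the second text information."""
        result = first_text
        for pattern, replacement in RULES:
            result = pattern.sub(replacement, result)
        return result.strip()

    # Example: preprocess("I did not do it haha haha") -> "I did not do it"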
In another possible example, as described in the embodiment above, after the operator has selected the sentence to be edited from the first text information, the predetermined processing such as replacement or deletion may first be applied directly to the selected text according to the preset rules and the words in the rule file, and the operator then confirms whether further modification is needed. When the modification is complete, the unmodified first text information is synchronized to the second area in response to the second operation, as described in step S170 of the embodiment above.
In this embodiment, the voice information is acquired and the first text information and the third text information are generated and displayed in a manner similar to that described above, which is not repeated here.
According to the voice data processing method of this embodiment, the first text information can be preprocessed according to the preset rules, which simplifies the generation of the second text information.
The voice data processing method according to the embodiment of the present invention is described above with reference to fig. 1 to 4. In fact, the present invention also provides a voice data processing apparatus for executing the above voice data processing method. Fig. 5 is a block diagram of a voice data processing apparatus according to an embodiment of the present invention. Referring to fig. 5, the voice data processing apparatus includes:
a memory 53 and one or more processors 51;
wherein the memory 53 is communicatively coupled to the one or more processors 51, and the memory 53 stores instructions executable by the one or more processors 51 to cause the one or more processors 51 to perform: acquiring voice information; synchronously converting the acquired voice information into first text information, and displaying the first text information in a first area of a display device; in response to a first operation, performing text processing on first text information to be edited, and displaying the processed text information in a second area of the display device; in response to a second operation, synchronizing the unmodified first text information to the second area and generating second text information; and in response to a third operation, generating third text information based on the second text information, and displaying the third text information in a third area of the display device.
The processor may execute the voice data processing method of the embodiment described with reference to figs. 1 to 3B, or the voice data processing method of the other embodiment described with reference to fig. 4; details are not repeated here.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.
Finally, it should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Furthermore, without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

Claims (5)

1. A method of speech data processing, comprising:
acquiring voice information;
synchronously converting the acquired voice information into first text information, and displaying the first text information in a first area of a display device;
in response to a first operation, performing text processing on first text information to be edited, and displaying the processed text information in a second area of the display device;
wherein, in the text processing of the first text information to be edited, operations matching preset rules are executed directly according to the preset rules;
in response to a second operation, synchronizing the unmodified first text information to the second area and generating second text information, the unmodified first text information being the information in the first text information on which no operation matching a preset rule has been executed; and in response to a third operation, generating third text information based on the second text information, and displaying the third text information in a third area of the display device.
2. The voice data processing method according to claim 1, characterized in that: the first text information comprises a time stamp for acquiring the voice information.
3. The voice data processing method according to claim 1, characterized in that:
the second textual information is editable.
4. The voice data processing method according to claim 3, wherein:
when specific text in the first, second or third text information is selected, word segmentation is performed on the selected text, the first, second and/or third text information is queried with the segmented words, and all the associated text information found is displayed.
5. A speech data processing apparatus comprising:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-4.
CN201810145265.9A 2018-02-12 2018-02-12 Voice data processing method and processing device Active CN108364653B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810145265.9A CN108364653B (en) 2018-02-12 2018-02-12 Voice data processing method and processing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810145265.9A CN108364653B (en) 2018-02-12 2018-02-12 Voice data processing method and processing device

Publications (2)

Publication Number Publication Date
CN108364653A CN108364653A (en) 2018-08-03
CN108364653B true CN108364653B (en) 2021-08-13

Family

ID=63005689

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810145265.9A Active CN108364653B (en) 2018-02-12 2018-02-12 Voice data processing method and processing device

Country Status (1)

Country Link
CN (1) CN108364653B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109634935A (en) * 2018-11-07 2019-04-16 重庆海特科技发展有限公司 Method of speech processing, storage medium and device
CN109599114A (en) * 2018-11-07 2019-04-09 重庆海特科技发展有限公司 Method of speech processing, storage medium and device
CN109597991B (en) * 2018-11-27 2023-04-28 北京巅峰科技有限公司 Automobile remote detection method
CN109788132A (en) * 2018-12-29 2019-05-21 努比亚技术有限公司 A kind of message treatment method, mobile terminal and computer readable storage medium
CN110070873A (en) * 2019-05-07 2019-07-30 上海良相智能化工程有限公司 A kind of supervision digital interrogation system and equipment

Family Cites Families (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1193342C (en) * 2000-09-08 2005-03-16 皇家菲利浦电子有限公司 Speech recognition method with replace command
EP1374224B1 (en) * 2001-03-29 2006-02-08 Koninklijke Philips Electronics N.V. Text editing for recognized speech during synchronous playback
US7526431B2 (en) * 2001-09-05 2009-04-28 Voice Signal Technologies, Inc. Speech recognition using ambiguous or phone key spelling and/or filtering
US7634403B2 (en) * 2001-09-05 2009-12-15 Voice Signal Technologies, Inc. Word recognition using word transformation commands
US6708148B2 (en) * 2001-10-12 2004-03-16 Koninklijke Philips Electronics N.V. Correction device to mark parts of a recognized text
JP2005070645A (en) * 2003-08-27 2005-03-17 Casio Comput Co Ltd Text and voice synchronizing device and text and voice synchronization processing program
CN101763881A (en) * 2008-12-08 2010-06-30 新奥特硅谷视频技术有限责任公司 Method for recording remote court trial data in real time and device thereof
CN101763364A (en) * 2008-12-08 2010-06-30 新奥特硅谷视频技术有限责任公司 Method and device for converting court trial voice
CN101763360A (en) * 2008-12-08 2010-06-30 新奥特硅谷视频技术有限责任公司 Method and device for converting voice message into text message in court trial process
CN101950199A (en) * 2010-09-10 2011-01-19 中国联合网络通信集团有限公司 Method, device and system for word processing
CN102956231B (en) * 2011-08-23 2014-12-31 上海交通大学 Voice key information recording device and method based on semi-automatic correction
JP5967569B2 (en) * 2012-07-09 2016-08-10 国立研究開発法人情報通信研究機構 Speech processing system
CN203117958U (en) * 2013-03-22 2013-08-07 祁磊 Real-time recording device used for court criminal justice
KR20150024188A (en) * 2013-08-26 2015-03-06 삼성전자주식회사 A method for modifiying text data corresponding to voice data and an electronic device therefor
WO2015156443A1 (en) * 2014-04-11 2015-10-15 네무스텍(주) Cartoon-type mobile personal secretary service system
CN104217039B (en) * 2014-10-10 2017-12-29 浙江完美在线网络科技有限公司 A kind of method and system that telephone conversation is recorded in real time and converts declarative sentence
CN105808197B (en) * 2014-12-30 2019-07-26 联想(北京)有限公司 A kind of information processing method and electronic equipment
CN105094833A (en) * 2015-08-03 2015-11-25 联想(北京)有限公司 Data Processing method and system
JP6310950B2 (en) * 2016-01-15 2018-04-11 株式会社リクルートライフスタイル Speech translation device, speech translation method, and speech translation program
CN106067310A (en) * 2016-06-27 2016-11-02 乐视控股(北京)有限公司 Recording data processing method and processing device
CN106409296A (en) * 2016-09-14 2017-02-15 安徽声讯信息技术有限公司 Voice rapid transcription and correction system based on multi-core processing technology
CN106952655A (en) * 2017-02-23 2017-07-14 深圳市金立通信设备有限公司 A kind of input method and terminal
CN107172382A (en) * 2017-06-29 2017-09-15 深圳双猴科技有限公司 A kind of intelligent meeting system and method
CN107644646B (en) * 2017-09-27 2021-02-02 北京搜狗科技发展有限公司 Voice processing method and device for voice processing

Also Published As

Publication number Publication date
CN108364653A (en) 2018-08-03

Similar Documents

Publication Publication Date Title
CN108364653B (en) Voice data processing method and processing device
US11501871B2 (en) Enhanced pathology diagnosis
JP6688340B2 (en) Method and apparatus for entering facial expression icon
CN109034069B (en) Method and apparatus for generating information
JP3895892B2 (en) Multimedia information collection management device and storage medium storing program
KR20060128022A (en) Automated system and method for conducting usability testing
CN109361825A (en) Meeting summary recording method, terminal and computer storage medium
US10347243B2 (en) Apparatus and method for analyzing utterance meaning
WO2013189317A1 (en) Human face information-based multimedia interaction method, device and terminal
JP6105180B1 (en) Work support device, work support method and program
JP2018155831A (en) Information processing system, information processing apparatus, information processing program, and information processing method
JP6488417B1 (en) Workshop support system and workshop support method
KR102287431B1 (en) Apparatus for recording meeting and meeting recording system
CN111522992A (en) Method, device and equipment for putting questions into storage and storage medium
CN115934993A (en) Object positioning method and device, electronic equipment and readable storage medium
Sabic et al. Threshold of spearcon recognition for auditory menus
US20170098033A1 (en) Cell observation information processing system, cell observation information processing method, cell observation information processing program, archive section provided for cell observation information processing system, and apparatuses provided for cell observation information processing system
JP2002288178A (en) Multimedia information collection and management device and program
CN113805977A (en) Test evidence obtaining method, model training method, device, equipment and storage medium
CN115618837B (en) Laboratory instrument data acquisition and analysis method and system
CN103559326A (en) Patient information cuing method and patient information cuing system
EP4131129A1 (en) Report writing assistance system and report writing assistance method
KR102330095B1 (en) Apparatus for recommending business process using mind map
JP2011253348A (en) Usability evaluation support device, method and program for web base system
Sultana et al. Multimodal Emotion Recognition through Deep Fusion of Audio-Visual Data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant