CN110619897A - Conference summary generation method and vehicle-mounted recording system - Google Patents

Conference summary generation method and vehicle-mounted recording system

Info

Publication number
CN110619897A
CN110619897A
Authority
CN
China
Prior art keywords
information
text information
voice information
text
collected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910713072.3A
Other languages
Chinese (zh)
Inventor
高文宝
苏宁
孙家鑫
龚兆业
曾富安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BOE Varitronix Ltd
Original Assignee
Varitronix Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Varitronix Ltd filed Critical Varitronix Ltd
Priority to CN201910713072.3A
Publication of CN110619897A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L25/87 Detection of discrete points within a voice signal
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00 Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/10 Digital recording or reproducing

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The application is suitable for the technical field of data processing, and provides a conference summary generation method and a vehicle-mounted recording system. The conference summary generation method includes: collecting voice information; converting the collected voice information into text information; editing the text information according to the collected voice information and/or the converted text information; and generating a conference summary corresponding to the edited text information. By automatically collecting voice information and then converting and editing it, the application can automatically generate a conference summary that meets requirements.

Description

Conference summary generation method and vehicle-mounted recording system
Technical Field
The present application relates to the field of data processing, and in particular, to a method for generating a conference summary, a vehicle-mounted recording system, and a computer-readable storage medium.
Background
In order to facilitate subsequent review of the content of a conference, a corresponding conference summary is generally produced for each meeting held. This is especially true for mobile-office scenarios such as two-party or multi-party in-vehicle conferences, where the demand for recording conference content continues to grow.
In the prior art, conference summaries are mainly produced by manual note-taking. However, this approach is prone to both omissions and recording errors, making it difficult to produce a conference summary that meets requirements.
Therefore, it is necessary to provide a new technical solution to solve the technical problems.
Disclosure of Invention
In view of this, the embodiments of the present application provide a conference summary generation method and a vehicle-mounted recording system that can automatically generate a conference summary meeting requirements by automatically collecting voice information and then converting and editing it.
A first aspect of an embodiment of the present application provides a method for generating a conference summary, including:
collecting voice information;
converting the collected voice information into text information;
editing the text information according to the collected voice information and/or the converted text information;
generating a meeting summary corresponding to the edited text information.
In one embodiment, before converting the collected voice information into text information, the method further includes:
deleting blank voice information in the voice information;
determining the splicing sequence among the residual voice information;
and splicing the residual voice information according to the determined splicing sequence.
In one embodiment, editing the text information according to the collected voice information and/or the converted text information comprises:
determining the position of punctuation marks needing to be added in the text information according to the position of each pause in the voice information and/or the position of each tone word in the text information;
determining the type of punctuation marks to be added at the position according to the duration and/or tone of each pause in the voice information;
and adding punctuation marks to the text information according to the determined position and type.
In one embodiment, editing the text information according to the collected voice information and/or the converted text information further comprises:
determining the position of a paragraph needing to be divided in the text information according to the converted semantic meaning of the text information;
and segmenting the text information according to the determined position of the paragraph needing to be divided.
In one embodiment, editing the text information according to the collected voice information and/or the converted text information further comprises:
collecting image information;
collecting pressure information on each seat;
determining the sound source position corresponding to each section of collected voice information according to the collected voice information, image information and/or pressure information;
generating an identity of a speaker corresponding to each sound source position according to the conference seat list;
and sequentially adding the identity of the corresponding speaker to the corresponding text information according to the attribute of each section of voice information.
In one embodiment, converting the collected voice information into text information comprises:
acquiring a preset target language;
and converting the collected voice information into target text information according to the acquired target language.
In one embodiment, before editing the text information according to the collected voice information and/or the converted text information, the method further includes:
and extracting key information in the text information.
In one embodiment, before the collecting the voice information, the method further comprises:
the operating state of the vehicle is determined.
A second aspect of the embodiments of the present application provides an in-vehicle sound recording system, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor implements the method mentioned in the first aspect when executing the computer program.
A third aspect of embodiments of the present application provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the method mentioned in the first aspect.
A fourth aspect of the embodiments of the present application provides a computer program product, which, when running on an onboard sound recording system, causes the onboard sound recording system to execute the method for generating a conference summary of any one of the first aspects.
Compared with the prior art, the embodiments of the present application have the following advantages. In this embodiment, voice information is first collected; the collected voice information is then converted into text information; the text information is edited according to the collected voice information and/or the converted text information; and finally a conference summary corresponding to the edited text information is generated. In this way, a conference summary that meets requirements can be generated automatically. Adding punctuation marks to the text information according to the collected voice information and/or the converted text information produces text with clear sentence meaning, which helps generate a qualifying conference summary quickly. Segmenting the text information according to the collected voice information and/or the converted text information likewise produces text with clear semantics. Determining the sound source position corresponding to each segment of voice information, generating the speaker's identity from that position, and adding the identity to the corresponding text information establishes what each speaker said, so that a conference summary meeting requirements can be generated quickly. The embodiments therefore have strong usability and practicality.
It is understood that, for the beneficial effects of the second to fourth aspects, reference may be made to the relevant description of the first aspect, which is not repeated here.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed for the embodiments or the prior-art descriptions are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flowchart of a method for generating a conference summary according to an embodiment of the present disclosure;
Fig. 2 is a schematic flowchart of a conference summary generation method provided in Embodiment Two of the present application;
Fig. 3 is a schematic flowchart of a conference summary generation method provided in Embodiment Three of the present application;
Fig. 4 is a schematic flowchart of a conference summary generation method provided in Embodiment Four of the present application;
Fig. 5 is a schematic structural diagram of a conference summary generation device provided in Embodiment Five of the present application;
Fig. 6 is a schematic structural diagram of a vehicle-mounted recording system provided in Embodiment Six of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to a determination", or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
It should be understood that, the sequence numbers of the steps in this embodiment do not mean the execution sequence, and the execution sequence of each process should be determined by the function and the inherent logic of the process, and should not constitute any limitation to the implementation process of the embodiment of the present application.
It should be noted that, the descriptions of "first" and "second" in this embodiment are used to distinguish different regions, modules, and the like, and do not represent a sequential order, and the descriptions of "first" and "second" are not limited to be of different types.
In order to explain the technical solution described in the present application, the following description will be given by way of specific examples.
Embodiment One
Fig. 1 is a schematic flowchart of a method for generating a conference summary according to an embodiment of the present application, where the method may include the following steps:
S101: Collecting voice information.
It should be noted that the conference summary generation method provided by the application can be applied to in-vehicle conferences, where the conference may be any type of business meeting. A conference summary is a narrative, briefing-style document produced by processing and organizing the original meeting record, and it may take either paper or electronic form.
In one embodiment, when the method is applied to an in-vehicle meeting, the voice information should be in-vehicle voice information. Since a meeting is usually held in a vehicle only when the occupants cannot leave it and an urgent meeting is needed, the operating state of the vehicle may be determined before the in-vehicle voice information is collected, so that collection starts only when the vehicle is running. For example, the state of the engine may be determined through an in-vehicle motion sensor to establish the operating state of the vehicle.
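As an illustrative sketch of this gating step (the patent specifies no sensor API, so engine_is_running below is a hypothetical stand-in that merely simulates an engine start):

    import random
    import time

    def engine_is_running():
        # Hypothetical stand-in for reading the in-vehicle motion/engine
        # sensor; here it simply simulates the engine eventually starting.
        return random.random() > 0.7

    def collect_when_running(start_recording, poll_interval_s=0.1):
        # Poll the (assumed) sensor and begin voice collection only once
        # the vehicle's operating state is "running".
        while not engine_is_running():
            time.sleep(poll_interval_s)
        start_recording()

    collect_when_running(lambda: print("voice collection started"))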
In one embodiment, the voice information may be collected by a voice collection module in the in-vehicle recording system or other voice collection module independent of the in-vehicle recording system.
It should be noted that, because the function of editing text information is usually available only after the in-vehicle CPU has started, the power required to start the voice-collection function in the present application is much lower than the power required to start the text-editing function.
In one embodiment, the voice acquisition module may be a single microphone or multiple microphones. A single microphone may be arranged near the in-vehicle center console, at the center of the cabin ceiling, or in the central area between the front and rear seat rows; multiple microphones may be distributed evenly across the mountable positions of the whole cabin.
In one embodiment, a corresponding time tag is added to each piece of voice information at the same time or after the voice information is collected.
In one embodiment, the collected voice information can be temporarily stored locally so as to be converted into text information when the condition is met, thereby improving the processing speed of the CPU in the vehicle.
In an embodiment, blank voice information may first be deleted from the voice information; a splicing order among the remaining voice segments is then determined from their respective time tags; and finally the remaining segments are spliced in the determined order. Here, blank voice information refers to voice information whose voice intensity is below a preset value.
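A minimal sketch of this preprocessing step, assuming each collected segment carries a time tag and a mean intensity value; the segment structure and the threshold are illustrative, not taken from the patent:

    from dataclasses import dataclass

    @dataclass
    class Segment:
        time_tag: float   # capture time of the segment, in seconds
        intensity: float  # mean voice intensity of the segment
        samples: list     # raw audio samples

    BLANK_THRESHOLD = 0.05  # "blank" means intensity below a preset value

    def splice_segments(segments):
        # 1. Delete blank voice information (intensity below the preset value).
        remaining = [s for s in segments if s.intensity >= BLANK_THRESHOLD]
        # 2. Determine the splicing order from each segment's time tag.
        remaining.sort(key=lambda s: s.time_tag)
        # 3. Splice the remaining segments in the determined order.
        spliced = []
        for s in remaining:
            spliced.extend(s.samples)
        return spliced

    segs = [Segment(2.0, 0.40, [3, 4]), Segment(1.0, 0.01, [9]), Segment(0.0, 0.60, [1, 2])]
    print(splice_segments(segs))  # [1, 2, 3, 4]: blank segment dropped, rest in time order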
In one embodiment, it may be determined whether a voice segment contains interfering speech, and if so, that segment may be deleted, where the interference may be a brief interjection by one or more persons other than the current speaker.
In an embodiment, it may be determined whether the voice information includes a segment whose length exceeds a threshold, and if so, that segment is deleted.
S102: Converting the collected voice information into text information.
In one embodiment, the collected speech information may be converted into text information by different recognition models, such as an RNN model, a BLSTM model, and/or an HMM model.
It should be understood that when the voice message is attached with a time tag, the corresponding text message is also attached with the same time tag.
Since the collected voice information may include multiple languages, in order to unify the collected voice information, in an embodiment, a preset target language may be obtained before the conversion, and then the collected voice information may be converted into target text information according to the obtained target language.
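This conversion step might be sketched as follows; recognize is a placeholder for whichever recognition model (RNN, BLSTM, HMM, or similar) the system uses, since the patent names no concrete library:

    PRESET_TARGET_LANGUAGE = "zh"  # the preset target language, acquired before conversion

    def recognize(audio_samples, language):
        # Placeholder for a recognition model such as the RNN, BLSTM, or HMM
        # models mentioned above; a real system would run inference here.
        return "<recognized text in '%s'>" % language

    def to_text(audio_samples, time_tag):
        # Convert a collected voice segment into target-language text and carry
        # the segment's time tag over to the resulting text information.
        return {"time_tag": time_tag, "text": recognize(audio_samples, PRESET_TARGET_LANGUAGE)}

    print(to_text([0.1, 0.2], time_tag=3.5))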
S103: Editing the text information according to the collected voice information and/or the converted text information.
In one embodiment, the editing comprises adding punctuation marks, paragraph division and/or identity of the speaker to the text information.
In one embodiment, after converting the collected voice information into text information and before editing the text information, extracting key information in the text information so as to determine whether the currently collected voice information is related to a conference according to the extracted key information, thereby determining whether a function of editing the text information needs to be started. In addition, by extracting the key information in the text information, the subject content of the conference can be obtained.
In one embodiment, if a function of editing the text information is activated according to the key information, only the latter half text information including the key information may be edited.
In an embodiment, the identity of the current speaker may also be determined according to the collected voice information, and whether the function of editing the text information needs to be started is determined according to the identity of the speaker, for example, the function of editing the text information may be started when the identity of the speaker is important.
In one embodiment, if the text-editing function is enabled according to the identity of the speaker, the entire text information should be edited to ensure the integrity of the information.
In one embodiment, whether the text-editing function needs to be started may also be determined according to an instruction manually entered by a user.
In one embodiment, if the collected voice information is not related to the conference, the voice information and/or the text information temporarily stored locally is deleted.
It should be noted that the deletion of voice information in step S101 and the activation of text editing in step S103 may occur simultaneously or independently. For example, when the second collected voice segment is found to contain preset key information, and adding it to the previously collected first segment would push the locally stored voice information past the maximum allowed total, the second segment may be sent to the corresponding editing module for processing while the first segment is cleared.
In one embodiment, after the text-editing function has been started, subsequently collected voice information no longer needs to be cached locally and can be sent directly for processing.
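A sketch of the editing-trigger logic discussed above; the keyword and speaker lists are illustrative presets rather than values given in the patent:

    MEETING_KEYWORDS = {"agenda", "decision", "action item"}  # illustrative presets
    IMPORTANT_SPEAKERS = {"chairperson"}                      # illustrative presets

    def extract_key_information(text):
        # Naive keyword extraction: intersect the text with the preset list.
        return {kw for kw in MEETING_KEYWORDS if kw in text.lower()}

    def should_start_editing(text, speaker=None, manual_flag=False):
        # Editing starts if preset key information is found, if the current
        # speaker is deemed important, or if the user requests it manually.
        return bool(extract_key_information(text)) or speaker in IMPORTANT_SPEAKERS or manual_flag

    print(should_start_editing("First agenda item: delivery schedule."))  # True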
S104: generating a meeting summary corresponding to the edited text information.
In one embodiment, the generated text information may be respectively filled into each part of a preset conference summary template, so as to generate a corresponding conference summary.
In one embodiment, the conference summary may also be generated based on the key information extracted from the text information.
In one embodiment, the meeting summary should contain the necessary meeting information, such as the participants, the meeting time, the meeting location, and the time and content of each participant's remarks.
In one embodiment, the generated conference summary may be temporarily stored locally or transmitted to a cloud or other terminal.
In one embodiment, the generated conference summary may be associated with corresponding voice information prior to storage or transmission.
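A minimal sketch of the template-filling variant of S104; the template fields are illustrative, since the patent does not fix a template format:

    def generate_summary(participants, meeting_time, location, edited_texts):
        # Fill each part of a preset conference-summary template with the
        # edited text information (field names are illustrative).
        lines = ["Meeting Summary",
                 "Participants: " + ", ".join(participants),
                 "Time: " + meeting_time,
                 "Location: " + location,
                 "",
                 "Proceedings:"]
        lines.extend(edited_texts)
        return "\n".join(lines)

    print(generate_summary(["front left one", "back left one"],
                           "2019-08-02 10:00", "vehicle cabin",
                           ["front left one: Let's review the delivery schedule."]))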
Therefore, in the embodiment of the application, a conference summary that meets requirements can be automatically generated by automatically collecting voice information and then converting and editing it, which provides strong usability and practicality.
Embodiment Two
Fig. 2 is a schematic flow chart of the conference summary generation method provided in Embodiment Two of the present application, which further refines and describes step S103 in Embodiment One. The method may include the following steps:
S201: Collecting voice information.
S202: Converting the collected voice information into text information.
The steps S201 to S202 are the same as the steps S101 to S102 in the first embodiment, and the specific implementation process thereof can refer to the description of the steps S101 to S102, which is not repeated herein.
S203: Determining the positions where punctuation marks need to be added in the text information according to the position of each pause in the voice information and/or the position of each modal word in the text information; determining the type of punctuation mark to be added at each position according to the duration and/or tone of each pause in the voice information; and adding punctuation marks to the text information according to the determined positions and types.
In one embodiment, the modal words include, but are not limited to, common Chinese tone particles (such as 吗, 呢, 吧, and 哦); types of punctuation include, but are not limited to, commas, periods, question marks, exclamation marks, and semicolons.
In one embodiment, the position and type of punctuation mark to be added in the text information can also be determined by combining the voice information before and after each pause position.
In one embodiment, punctuation marks can be added to the text information according to the semantics of the text information.
In one embodiment, the position of each pause in the voice information corresponds to the position in the text information where a punctuation mark needs to be added.
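A much-simplified sketch of the pause-driven approach: punctuation positions come from pauses and types from pause duration. The thresholds are illustrative, and a real system would also weigh tone and modal words as described above:

    def add_punctuation(words, pauses):
        # pauses[i] is the pause duration, in seconds, heard after words[i]
        # (0.0 when speech continues without a pause).
        pieces = []
        for word, pause in zip(words, pauses):
            pieces.append(word)
            if pause >= 0.8:    # long pause: treat as a sentence-ending period
                pieces.append(". ")
            elif pause >= 0.3:  # short pause: treat as a comma
                pieces.append(", ")
            else:
                pieces.append(" ")
        return "".join(pieces).strip()

    print(add_punctuation(["today", "we", "discuss", "the", "schedule"],
                          [0.0, 0.0, 0.4, 0.0, 1.0]))
    # -> "today we discuss, the schedule."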
S204: Generating a conference summary corresponding to the text information with the added punctuation marks.
The step S204 is the same as the step S104 in the first embodiment, and the specific implementation process thereof can refer to the description of the step S104, which is not repeated herein.
It can be seen from the above that, compared with Embodiment One, Embodiment Two adds punctuation marks to the text information according to the collected voice information and/or the converted text information, so that text with clear sentence meaning can be generated. This helps generate a qualifying conference summary quickly and provides strong usability and practicality.
Embodiment Three
Fig. 3 is a schematic flow chart of the conference summary generation method provided in Embodiment Three of the present application, which is another refinement and description of step S103 in Embodiment One. The method may include the following steps:
S301: Collecting voice information.
S302: Converting the collected voice information into text information.
The steps S301 to S302 are the same as the steps S101 to S102 in the first embodiment, and the specific implementation process thereof can refer to the description of the steps S101 to S102, which is not repeated herein.
S303: Determining the positions of paragraphs to be divided in the text information according to the semantics of the converted text information, and segmenting the text information at the determined positions.
In one embodiment, the segmentation is intended to divide the entire text information into multiple pieces of dialogue text.
In one embodiment, several sentences of text having the same semantics may be grouped into one paragraph, which may consist of one sentence segment or multiple sentence segments.
In one embodiment, the textual information may be segmented in conjunction with a time tag on the textual information.
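A minimal sketch of time-tag-based segmentation; the gap threshold is illustrative, and a production system would combine it with the semantic comparison described above:

    def segment_paragraphs(sentences, max_gap_s=5.0):
        # Each sentence dict carries the time tag inherited from its voice
        # segment; a gap larger than max_gap_s starts a new paragraph.
        paragraphs, current, last_tag = [], [], None
        for s in sentences:
            if last_tag is not None and s["time_tag"] - last_tag > max_gap_s:
                paragraphs.append(current)
                current = []
            current.append(s["text"])
            last_tag = s["time_tag"]
        if current:
            paragraphs.append(current)
        return paragraphs

    sents = [{"time_tag": 0.0, "text": "First topic."},
             {"time_tag": 2.0, "text": "More on it."},
             {"time_tag": 30.0, "text": "Second topic."}]
    print(segment_paragraphs(sents))
    # -> [['First topic.', 'More on it.'], ['Second topic.']]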
S304: generating a conference summary corresponding to the segmented text information.
The step S304 is the same as the step S104 in the first embodiment, and the specific implementation process thereof can refer to the description of the step S104, which is not repeated herein.
It can be seen from the above that, compared with Embodiment One, Embodiment Three segments the text information according to the collected voice information and/or the converted text information, so that text with clear semantics can be generated. This helps generate a qualifying conference summary quickly and provides strong usability and practicality.
Embodiment Four
Fig. 4 is a schematic flow chart of the conference summary generation method provided in Embodiment Four of the present application, which is a further refinement and description of step S103 in Embodiment One. The method may include the following steps:
S401: Collecting voice information.
S402: Converting the collected voice information into text information.
The steps S401 to S402 are the same as the steps S101 to S102 in the first embodiment, and the specific implementation process thereof can refer to the description of the steps S101 to S102, which is not repeated herein.
S403: Collecting image information; collecting pressure information on each seat; determining the sound source position corresponding to each collected voice segment according to the collected voice information, image information, and/or pressure information; generating the identity of the speaker corresponding to each sound source position according to the conference seat list; and sequentially adding each speaker's identity to the corresponding text information according to the attributes of each voice segment.
In one embodiment, when the method is applied to an in-vehicle meeting, the image information should be image information in a vehicle and the pressure information should be pressure information on each seat in the vehicle.
In one embodiment, the identity of the speaker may be front left one, front left two, back left one, back left two, or back left three.
In one embodiment, the attributes of each voice segment include, but are not limited to, its time tag, its order of precedence, and its correlation with the other segments.
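The speaker-labelling step might be sketched as follows, with an illustrative seat chart standing in for the conference seat list and each segment assumed to carry an estimated seat and a time tag:

    # Illustrative seat chart: (row, position) -> speaker identity, built
    # from the conference seat list described above.
    SEAT_CHART = {
        (0, 0): "front left one",
        (0, 1): "front left two",
        (1, 0): "back left one",
        (1, 1): "back left two",
    }

    def label_speech(voice_segments):
        # Each segment is assumed to carry a "seat" estimated from the voice,
        # image, and/or pressure information; segments are processed in
        # time-tag order and the speaker identity is prepended to the text.
        labeled = []
        for seg in sorted(voice_segments, key=lambda s: s["time_tag"]):
            speaker = SEAT_CHART.get(seg["seat"], "unknown speaker")
            labeled.append(speaker + ": " + seg["text"])
        return labeled

    print(label_speech([{"time_tag": 5.0, "seat": (1, 0), "text": "I agree."},
                        {"time_tag": 1.0, "seat": (0, 0), "text": "Shall we start?"}]))
    # -> ['front left one: Shall we start?', 'back left one: I agree.']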
S404: Generating a conference summary corresponding to the text information with the added speaker identities.
The step S404 is the same as the step S104 in the first embodiment, and the specific implementation process thereof can refer to the description of the step S104, which is not repeated herein.
It can be seen from the above that, compared with Embodiment One, Embodiment Four determines the sound source position corresponding to each voice segment, generates the speaker's identity from that position, and adds the identity to the text information corresponding to the voice segment. This establishes the speaking content of each speaker, helps generate a qualifying conference summary quickly, and provides strong usability and practicality.
Embodiment Five
Fig. 5 is a schematic structural diagram of the conference summary generation device provided in Embodiment Five of the present application; for convenience of description, only the portions relevant to this embodiment are shown.
The conference summary generation device may be a software unit, a hardware unit, or a combined software-and-hardware unit built into the vehicle-mounted recording system, or it may be integrated into the vehicle-mounted recording system as an independent add-on.
The conference summary generation apparatus includes:
the acquisition module 51 is used for acquiring voice information;
a conversion module 52, configured to convert the collected voice information into text information;
the editing module 53 is configured to edit the text information according to the collected voice information and/or the converted text information;
a generating module 54, configured to generate a conference summary corresponding to the edited text information.
In one embodiment, the apparatus further comprises:
and the deleting module is used for deleting blank voice information in the voice information, determining the splicing sequence among the rest voice information, and splicing the rest voice information according to the determined splicing sequence.
In an embodiment, the editing module 53 is specifically configured to:
determining the position of punctuation marks needing to be added in the text information according to the position of each pause in the voice information and/or the position of each tone word in the text information;
determining the type of punctuation marks to be added at the position according to the duration and/or tone of each pause in the voice information;
and adding punctuation marks to the text information according to the determined position and type.
In an embodiment, the editing module 53 is specifically configured to:
determining the position of a paragraph needing to be divided in the text information according to the converted semantic meaning of the text information;
and segmenting the text information according to the determined position of the paragraph needing to be divided.
In an embodiment, the editing module 53 is specifically configured to:
collecting image information;
collecting pressure information on each seat;
determining the sound source position corresponding to each section of collected voice information according to the collected voice information, image information and/or pressure information;
generating an identity of a speaker corresponding to each sound source position according to the conference seat list;
and sequentially adding the identity of the corresponding speaker to the corresponding text information according to the attribute of each section of voice information.
In one embodiment, the conversion module 52 is specifically configured to:
acquiring a preset target language;
and converting the collected voice information into target text information according to the acquired target language.
In one embodiment, the apparatus further comprises:
and the extraction module is used for extracting the key information in the text information.
In one embodiment, the apparatus further comprises:
the determination module is used for determining the running state of the vehicle.
Embodiment Six
Fig. 6 is a schematic structural diagram of an in-vehicle recording system provided in Embodiment Six of the present application. As shown in Fig. 6, the in-vehicle recording system 6 of this embodiment includes: a processor 60, a memory 61, and a computer program 62 stored in the memory 61 and executable on the processor 60. When executing the computer program 62, the processor 60 implements the steps of method Embodiment One (for example, steps S101 to S104 shown in Fig. 1), of method Embodiment Two (steps S201 to S204 shown in Fig. 2), of method Embodiment Three (steps S301 to S304 shown in Fig. 3), or of method Embodiment Four (steps S401 to S404 shown in Fig. 4). When executing the computer program 62, the processor 60 also implements the functions of the modules/units in the device embodiments, such as the functions of modules 51 to 54 shown in Fig. 5.
Illustratively, the computer program 62 may be partitioned into one or more modules/units, which are stored in the memory 61 and executed by the processor 60 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution of the computer program 62 in the in-vehicle recording system 6. For example, the computer program 62 may be divided into an acquisition module 621, a conversion module 622, an editing module 623, and a generation module 624, each with the following specific functions:
the acquisition module 621 is used for acquiring voice information;
a conversion module 622, configured to convert the collected voice information into text information;
an editing module 623, configured to edit the text information according to the collected voice information and/or the converted text information;
a generating module 624, configured to generate a conference summary corresponding to the edited text information.
The in-vehicle recording system may include, but is not limited to, a processor 60 and a memory 61. Those skilled in the art will appreciate that Fig. 6 is merely an example of the in-vehicle recording system 6 and does not constitute a limitation of it; the system may include more or fewer components than those shown, combine certain components, or use different components. For example, the in-vehicle recording system may further include a microphone 63, an amplifier 64, a converter 65, various sensors 66, a database 67, and the like.
The processor 60 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The processor 60 may be built into the CID system, the intelligent tachograph, and/or the electronic rearview mirror.
In addition, the processor 60 may also control the sensitivity of the in-vehicle recording system, such as by varying the sensitivity through the gain of the amplifier 64.
The memory 61 may be an internal storage unit of the in-vehicle sound recording system 6, such as a hard disk or a memory of the in-vehicle sound recording system 6. The memory 61 may also be an external storage device of the vehicle-mounted recording system 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the vehicle-mounted recording system 6. Further, the memory 61 may also include both an internal storage unit and an external storage device of the in-vehicle recording system 6. The memory 61 is used for storing the computer program and other programs and data required by the in-vehicle sound recording system. The memory 61 may also be used to temporarily store data that has been output or is to be output.
The microphone 63 is used for collecting the voice information, and may be a single microphone or a plurality of microphones.
The amplifier 64 is used to amplify the collected voice information and may be one or more ordinary audio amplifiers.
The converter 65 is used for converting the collected analog voice signal into a corresponding digital signal, and may be an existing analog/digital converter.
The various sensors 66 include, but are not limited to, image sensors, sound sensors, pressure sensors, and motion sensors. The image sensor is used to collect in-vehicle image information; there is at least one, and it may be a camera or a device equipped with a camera. The sound sensors are used to collect in-vehicle voice information and to locate sound sources; there are at least two, and they may be two single microphones or two microphone arrays. Single microphones may be arranged near the in-vehicle center console, at the center of the cabin ceiling, or in the central area between the front and rear seat rows; microphone arrays may be evenly distributed across the mountable positions of the whole cabin. The pressure sensors are used to collect the pressure exerted on the seats and backrests; their number should at least match the number of seats in the vehicle, and one may be arranged at the bottom center of each seat.
The database 67 is a database running locally or on the cloud, and is used for assisting in converting the voice information into corresponding text information.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or recited in a certain embodiment.
Those of ordinary skill in the art would appreciate that the modules, elements, and/or method steps of the various embodiments described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A method of conference summary generation, comprising:
collecting voice information;
converting the collected voice information into text information;
editing the text information according to the collected voice information and/or the converted text information;
generating a meeting summary corresponding to the edited text information.
2. The method of claim 1, wherein before converting the collected voice information into text information, further comprising:
deleting blank voice information in the voice information;
determining the splicing sequence among the residual voice information;
and splicing the residual voice information according to the determined splicing sequence.
3. The method of claim 1, wherein editing the text information according to the collected voice information and/or the converted text information comprises:
determining the position of punctuation marks needing to be added in the text information according to the position of each pause in the voice information and/or the position of each tone word in the text information;
determining the type of punctuation marks to be added at the position according to the duration and/or tone of each pause in the voice information;
and adding punctuation marks to the text information according to the determined position and type.
4. The method of claim 1, wherein editing the text information according to the collected voice information and/or the converted text information further comprises:
determining the position of a paragraph needing to be divided in the text information according to the converted semantic meaning of the text information;
and segmenting the text information according to the determined position of the paragraph needing to be divided.
5. The method of claim 1, wherein editing the text information according to the collected voice information and/or the converted text information further comprises:
collecting image information;
collecting pressure information on each seat;
determining the sound source position corresponding to each section of collected voice information according to the collected voice information, image information and/or pressure information;
generating an identity of a speaker corresponding to each sound source position according to the conference seat list;
and sequentially adding the identity of the corresponding speaker to the corresponding text information according to the attribute of each section of voice information.
6. The method of claim 1, wherein converting the collected voice information into text information comprises:
acquiring a preset target language;
and converting the collected voice information into target text information according to the acquired target language.
7. The method according to claim 1, before editing the text information according to the collected voice information and/or the converted text information, further comprising:
and extracting key information in the text information.
8. The method according to any one of claims 1 to 7, further comprising, prior to said collecting voice information:
the operating state of the vehicle is determined.
9. An in-vehicle sound recording system comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 8 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
CN201910713072.3A 2019-08-02 2019-08-02 Conference summary generation method and vehicle-mounted recording system Pending CN110619897A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910713072.3A CN110619897A (en) 2019-08-02 2019-08-02 Conference summary generation method and vehicle-mounted recording system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910713072.3A CN110619897A (en) 2019-08-02 2019-08-02 Conference summary generation method and vehicle-mounted recording system

Publications (1)

Publication Number Publication Date
CN110619897A true CN110619897A (en) 2019-12-27

Family

ID=68921586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910713072.3A Pending CN110619897A (en) 2019-08-02 2019-08-02 Conference summary generation method and vehicle-mounted recording system

Country Status (1)

Country Link
CN (1) CN110619897A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107305541A (en) * 2016-04-20 2017-10-31 科大讯飞股份有限公司 Speech recognition text segmentation method and device
CN107451110A (en) * 2017-07-10 2017-12-08 珠海格力电器股份有限公司 A kind of method, apparatus and server for generating meeting summary
CN109361825A (en) * 2018-11-12 2019-02-19 平安科技(深圳)有限公司 Meeting summary recording method, terminal and computer storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020224217A1 (en) * 2019-05-07 2020-11-12 平安科技(深圳)有限公司 Speech processing method and apparatus, computer device, and storage medium
CN111261162A (en) * 2020-03-09 2020-06-09 北京达佳互联信息技术有限公司 Speech recognition method, speech recognition apparatus, and storage medium
CN111261162B (en) * 2020-03-09 2023-04-18 北京达佳互联信息技术有限公司 Speech recognition method, speech recognition apparatus, and storage medium
CN111445929A (en) * 2020-03-12 2020-07-24 维沃移动通信有限公司 Voice information processing method and electronic equipment
CN112634879A (en) * 2020-12-18 2021-04-09 建信金融科技有限责任公司 Voice conference management method, device, equipment and medium
CN112836476A (en) * 2021-02-04 2021-05-25 北京字跳网络技术有限公司 Summary generation method, device, equipment and medium
CN115550075A (en) * 2022-12-01 2022-12-30 中网道科技集团股份有限公司 Anti-counterfeiting processing method and device for public welfare activity data of community correction object

Similar Documents

Publication Publication Date Title
CN110619897A (en) Conference summary generation method and vehicle-mounted recording system
CN207149252U (en) Speech processing system
CN108447471A (en) Audio recognition method and speech recognition equipment
CN110517689B (en) Voice data processing method, device and storage medium
DE102018113034A1 (en) VOICE RECOGNITION SYSTEM AND VOICE RECOGNITION METHOD FOR ANALYZING A COMMAND WHICH HAS MULTIPLE INTENTIONS
US8150687B2 (en) Recognizing speech, and processing data
DE102017121059A1 (en) IDENTIFICATION AND PREPARATION OF PREFERRED EMOJI
US20130253932A1 (en) Conversation supporting device, conversation supporting method and conversation supporting program
CN107564531A (en) Minutes method, apparatus and computer equipment based on vocal print feature
JP7158217B2 (en) Speech recognition method, device and server
CN109710949B (en) Translation method and translator
CN111916088B (en) Voice corpus generation method and device and computer readable storage medium
CN111128212A (en) Mixed voice separation method and device
DE112018007847B4 (en) INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD AND PROGRAM
EP4044178A2 (en) Method and apparatus of performing voice wake-up in multiple speech zones, method and apparatus of performing speech recognition in multiple speech zones, device, and storage medium
US7844459B2 (en) Method for creating a speech database for a target vocabulary in order to train a speech recognition system
CN113270095B (en) Voice processing method, device, storage medium and electronic equipment
CN116631380B (en) Method and device for waking up audio and video multi-mode keywords
CN113643704A (en) Test method, upper computer, system and storage medium of vehicle-mounted machine voice system
CN110588524B (en) Information display method and vehicle-mounted auxiliary display system
CN110737422A (en) sound signal acquisition method and device
CN116364083A (en) Vehicle-mounted multimode voice test method, system, equipment and medium
CN115953996A (en) Method and device for generating natural language based on in-vehicle user information
CN113535308A (en) Language adjusting method, language adjusting device, electronic equipment and medium
CN115063155A (en) Data labeling method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20191227)