CN113434663A

CN113434663A - Conference summary generation method based on edge calculation and related equipment

Info

Publication number: CN113434663A
Application number: CN202110740076.8A
Authority: CN
Inventors: 李佳琳; 王健宗
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2021-06-30
Filing date: 2021-06-30
Publication date: 2021-09-24

Abstract

The invention relates to the field of artificial intelligence and discloses a conference summary generation method based on edge calculation and related equipment. The method comprises the following steps: acquiring audio data collected during a conference, and storing the audio data into an edge computing device cluster through a local area network, wherein a text translation model and a semantic recognition model are deployed in the edge computing device cluster; performing distributed translation on the audio data by adopting the text translation model to obtain text data corresponding to the audio data; performing distributed semantic recognition on the text data by adopting the semantic recognition model and a preset prior knowledge base to obtain a natural language text corresponding to the text data; and combining the text data and the corresponding natural language text to obtain a conference summary, and storing the conference summary into a storage container of the edge computing device cluster. The invention realizes automatic generation of the conference summary and improves the confidentiality and security of the generation process.

Description

Conference summary generation method based on edge calculation and related equipment

Technical Field

The invention relates to the field of artificial intelligence, in particular to a conference summary generation method based on edge calculation and related equipment.

Background

As working practices evolve, more and more work handoffs between employees are carried out by organizing conference calls. During a teleconference, the important information exchanged among the employees needs to be recorded as a conference summary. However, teleconferences often involve only a small number of people, and conferences are becoming more frequent and smaller in scale, which makes it increasingly impractical to arrange a dedicated person to preside over and record each meeting. As a result, some important meetings have no summary, and important information is easily forgotten in later work, so that employees hold inconsistent information when coordinating with one another, resources are wasted, and overall office efficiency and progress suffer.

At present, meeting records are mainly produced manually: a person writes down a summary based on what the participants say. However, manual conference summaries consume considerable resources and time, and are not a feasible recording scheme in working scenarios where conferences are frequent and small in scale. In addition, given the computing power that current artificial intelligence recognition models require, a cloud computing scheme is a necessary option for most systems, and cloud computing schemes place strict requirements on the various data streams. The enormous network pressure and maintenance costs associated with uploading real-time data to the relevant databases cannot be amortized. Moreover, in existing cloud conference systems on the market, most computation and storage are performed in the cloud, so the confidentiality and data security of conference summary information exposed in a network environment are difficult to guarantee. In summary, existing ways of automatically generating conference summaries are not flexible enough.

Disclosure of Invention

The main purpose of the invention is to solve the problem that existing ways of generating conference summaries are not flexible enough.

A first aspect of the invention provides a conference summary generation method based on edge calculation, which comprises the following steps: acquiring audio data collected during a conference, and storing the audio data into an edge computing device cluster through a local area network, wherein a text translation model and a semantic recognition model are deployed in the edge computing device cluster; performing distributed translation on the audio data by adopting the text translation model to obtain text data corresponding to the audio data; performing distributed semantic recognition on the text data by adopting the semantic recognition model and a preset prior knowledge base to obtain a natural language text corresponding to the text data; and combining the text data and the corresponding natural language text to obtain a conference summary, and storing the conference summary into a storage container of the edge computing device cluster.

Optionally, in a first implementation manner of the first aspect of the present invention, the storing the audio data in the edge computing device cluster through the local area network includes: storing the audio data into an edge management device in the edge computing device cluster through a local area network; and sharing the audio data stored in the edge management equipment to the edge node equipment in the edge computing equipment cluster through the local area network by adopting the edge management equipment.

Optionally, in a second implementation manner of the first aspect of the present invention, the performing, by using the text translation model, distributed translation on the audio data to obtain text data corresponding to the audio data includes: determining translation tasks deployed by the text translation model on each edge node device according to translation setting information stored in the edge management device; generating translation instructions corresponding to the translation tasks through the edge management equipment, and sending the translation instructions to corresponding edge node equipment; according to each translation instruction, executing the text translation model on the corresponding edge node device in a distributed mode, and translating the audio data into the corresponding text data through the text translation model.

Optionally, in a third implementation manner of the first aspect of the present invention, the performing distributed semantic recognition on the text data by using the semantic recognition model and a preset prior knowledge base to obtain a natural language text corresponding to the text data includes: determining the recognition tasks of the semantic recognition model deployed on each edge node device according to recognition setting information stored in the edge management device; generating recognition instructions corresponding to the recognition tasks through the edge management device, and sending the recognition instructions to the corresponding edge node devices; determining an application scenario of the text data through the edge management device, selecting a preset prior knowledge base corresponding to the application scenario and sharing the preset prior knowledge base to each edge node device; and according to each recognition instruction, executing the semantic recognition model on the corresponding edge node device in a distributed manner, and recognizing the natural language text corresponding to the text data through the semantic recognition model and the preset prior knowledge base.

Optionally, in a fourth implementation manner of the first aspect of the present invention, the translating, by the text translation model, the audio data into corresponding text data includes: performing audio division on the audio data through the text translation model to obtain a plurality of continuous sampling values, and performing arithmetic mean processing on each continuous sampling value to obtain a filtering value corresponding to the audio data; performing signal cutting processing on the clean audio through the text translation model to obtain at least one audio segment, and performing vectorization processing on the at least one audio segment to obtain at least one digital vector; and searching the characters mapped with the digital vectors from a preset translation table through the text translation model, and combining the searched characters to obtain text data corresponding to the audio data.
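The fourth implementation manner above can be illustrated with a minimal sketch. It is not the text translation model itself: the window size, segment length, amplitude quantization used as "vectorization", and the contents of the translation table are all hypothetical choices made for illustration, and the sketch assumes that the "clean audio" is the output of the arithmetic mean step.

```python
from statistics import fmean


def moving_average(samples, window=3):
    """Arithmetic mean over each run of `window` consecutive sampling values."""
    if window < 1:
        raise ValueError("window must be >= 1")
    return [fmean(samples[i:i + window]) for i in range(len(samples) - window + 1)]


def cut_segments(samples, segment_len):
    """Signal cutting: split the filtered audio into fixed-length segments."""
    return [samples[i:i + segment_len] for i in range(0, len(samples), segment_len)]


def vectorize(segment, levels=4, peak=1.0):
    """Toy vectorization (assumption): quantize mean amplitude to an integer code."""
    mean_amp = fmean([abs(s) for s in segment])
    return (min(levels - 1, int(mean_amp / peak * levels)),)


def transcribe(samples, table, window=3, segment_len=2):
    """Filter, cut, vectorize, then look each vector up in the translation table."""
    clean = moving_average(samples, window)
    vectors = [vectorize(seg) for seg in cut_segments(clean, segment_len)]
    # Vectors with no table entry are rendered as "?" rather than guessed.
    return "".join(table.get(v, "?") for v in vectors)


if __name__ == "__main__":
    toy_table = {(1,): "h", (3,): "i"}  # hypothetical preset translation table
    print(transcribe([0.0, 0.3, 0.6, 0.9, 0.9, 0.9], toy_table))
```

A real speech recognizer would use learned acoustic features instead of a single quantized amplitude; the sketch only shows how the three stages feed one another.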

Optionally, in a fifth implementation manner of the first aspect of the present invention, the recognizing, by the semantic recognition model and the preset prior knowledge base, the natural language text corresponding to the text data includes: dividing the text data into a character string sequence by adopting the semantic recognition model, and performing lexical processing on each element in the character string sequence to obtain a plurality of lexical elements; performing word frequency statistics and ranking on each lexical element by adopting the semantic recognition model and a preset professional corpus to obtain a word frequency ranking, and taking a preset number of the top-ranked words as keywords; and recognizing the semantic relation among the keywords by adopting the grammar recognition model, and combining the keywords based on the semantic relation and a preset grammar rule to obtain a natural language text.

Optionally, in a sixth implementation manner of the first aspect of the present invention, before the dividing the text data into a character string sequence by using the semantic recognition model, and performing lexical processing on each element in the character string sequence to obtain a plurality of lexical elements, the method further includes: detecting special characters in the text data to obtain a first detection result, and detecting stop words in the text data according to a preset stop word list to obtain a second detection result; judging whether the text data contains special characters according to the first detection result, and judging whether the text data contains stop words according to the second detection result; and if the text data contains special characters, removing the special characters, and if the text data contains stop words, removing the stop words.
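The preprocessing and keyword steps of the fifth and sixth implementation manners can be sketched as follows. The stop word list, the corpus terms, and the choice to use the professional corpus as a simple whitelist of candidate terms are assumptions made for illustration; the source does not specify how the corpus enters the word frequency statistics.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "of", "to", "and", "is", "we"}  # hypothetical stop word list


def clean_text(text, stop_words=STOP_WORDS):
    """Remove special characters and stop words, returning lowercase tokens."""
    # Treat anything that is neither a word character nor whitespace as special.
    no_special = re.sub(r"[^\w\s]", " ", text)
    return [t for t in no_special.lower().split() if t not in stop_words]


def top_keywords(tokens, corpus_terms, k=3):
    """Count tokens found in the professional corpus and keep the k most frequent."""
    counts = Counter(t for t in tokens if t in corpus_terms)
    return [word for word, _ in counts.most_common(k)]


if __name__ == "__main__":
    text = "We review the budget! The budget and the forecast: budget, forecast, risk."
    corpus = {"budget", "forecast", "risk"}  # hypothetical professional corpus
    print(top_keywords(clean_text(text), corpus, k=2))
```

Combining the keywords into a natural language text with grammar rules is left out, since the source gives no detail on those rules.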

The second aspect of the present invention provides a conference summary generation apparatus based on edge calculation, including: the acquisition module is used for acquiring audio data acquired by a conference and storing the audio data into an edge computing equipment cluster through a local area network, wherein a text translation model and a semantic recognition model are deployed in the edge computing equipment cluster; the translation module is used for performing distributed translation on the audio data by adopting the text translation model to obtain text data corresponding to the audio data; the recognition module is used for carrying out distributed semantic recognition on the text data by adopting the semantic recognition model and a preset prior knowledge base to obtain a natural language text corresponding to the text data; and the combination module is used for combining the text data and the corresponding natural language text to obtain a conference summary, and storing the conference summary into a storage container of the edge computing equipment cluster.

Optionally, in a first implementation manner of the second aspect of the present invention, the acquisition module includes: the storage unit is used for storing the audio data into the edge management equipment in the edge computing equipment cluster through a local area network; and the sharing unit is used for sharing the audio data stored in the edge management equipment to the edge node equipment in the edge computing equipment cluster through the local area network by adopting the edge management equipment.

Optionally, in a second implementation manner of the second aspect of the present invention, the translation module includes: the translation determining unit is used for determining translation tasks deployed by the text translation model on each edge node device according to translation setting information stored in the edge management device; a translation instruction generation unit, configured to generate, by the edge management device, translation instructions corresponding to the translation tasks, and send the translation instructions to corresponding edge node devices; and the translation execution unit is used for executing the text translation model on the corresponding edge node equipment in a distributed manner according to each translation instruction, and translating the audio data into corresponding text data through the text translation model.

Optionally, in a third implementation manner of the second aspect of the present invention, the recognition module includes: a recognition determining unit, configured to determine the recognition tasks of the semantic recognition model deployed on each edge node device according to the recognition setting information stored in the edge management device; a recognition instruction generating unit, configured to generate recognition instructions corresponding to the recognition tasks through the edge management device and send the recognition instructions to the corresponding edge node devices; a selecting unit, configured to determine an application scenario of the text data through the edge management device, select a preset prior knowledge base corresponding to the application scenario, and share the preset prior knowledge base to each edge node device; and a recognition execution unit, configured to execute the semantic recognition model on the corresponding edge node devices in a distributed manner according to the recognition instructions, and recognize the natural language text corresponding to the text data through the semantic recognition model and the preset prior knowledge base.

Optionally, in a fourth implementation manner of the second aspect of the present invention, the translation execution unit is further configured to: performing audio division on the audio data through the text translation model to obtain a plurality of continuous sampling values, and performing arithmetic mean processing on each continuous sampling value to obtain a filtering value corresponding to the audio data; performing signal cutting processing on the clean audio through the text translation model to obtain at least one audio segment, and performing vectorization processing on the at least one audio segment to obtain at least one digital vector; and searching the characters mapped with the digital vectors from a preset translation table through the text translation model, and combining the searched characters to obtain text data corresponding to the audio data.

Optionally, in a fifth implementation manner of the second aspect of the present invention, the recognition execution unit is further configured to: divide the text data into a character string sequence by adopting the semantic recognition model, and perform lexical processing on each element in the character string sequence to obtain a plurality of lexical elements; perform word frequency statistics and ranking on each lexical element by adopting the semantic recognition model and a preset professional corpus to obtain a word frequency ranking, and take a preset number of the top-ranked words as keywords; and recognize the semantic relation among the keywords by adopting the grammar recognition model, and combine the keywords based on the semantic relation and a preset grammar rule to obtain a natural language text.

Optionally, in a sixth implementation manner of the second aspect of the present invention, the recognition execution unit is further configured to: detect special characters in the text data to obtain a first detection result, and detect stop words in the text data according to a preset stop word list to obtain a second detection result; judge whether the text data contains special characters according to the first detection result, and judge whether the text data contains stop words according to the second detection result; and if the text data contains special characters, remove the special characters, and if the text data contains stop words, remove the stop words.

A third aspect of the present invention provides a conference summary generation device based on edge calculation, including: a memory and at least one processor, the memory having instructions stored therein; the at least one processor invokes the instructions in the memory to cause the edge-calculation-based conference summary generation device to perform the steps of the edge-calculation-based conference summary generation method described above.

A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to perform the steps of the above-described edge-computation-based conference summary generation method.

According to the technical scheme, audio data from the conference is collected by audio acquisition equipment in the conference room and is initially stored in an edge computing device cluster through a local area network. All data transmission in the conference summary generation process takes place over the local area network, so the data never enters the cloud and data security is improved. The edge computing device cluster may comprise the CPUs (central processing units) and GPUs (graphics processing units) of computers, computing devices, smart home devices, televisions and the like in the office conference environment, which reduces the cost of deployment in the office environment. The audio data is then translated into text data in a distributed manner by the text translation model in the edge computing device cluster, and the text data is converted into natural language text in a distributed manner by the semantic recognition model, so that a conference summary can be generated automatically. The existing low-memory edge computing devices in the conference room are fully utilized by forming a cluster to execute model computation that requires high computing capacity, which improves operating efficiency, reduces the computing pressure on each device, and further reduces operation and maintenance costs.

Drawings

FIG. 1 is a schematic diagram of a first embodiment of a method for generating a conference summary based on edge calculation according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a second embodiment of a conference summary generation method based on edge calculation according to the embodiment of the present invention;

FIG. 3 is a schematic diagram of a third embodiment of a conference summary generation method based on edge calculation according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of an embodiment of a conference summary generation apparatus based on edge calculation according to the embodiment of the present invention;

FIG. 5 is a schematic diagram of another embodiment of the device for generating a conference summary based on edge calculation in the embodiment of the present invention;

FIG. 6 is a schematic diagram of an embodiment of a conference summary generation device based on edge calculation in the embodiment of the present invention.

Detailed Description

The embodiment of the invention provides a conference summary generation method, a device, equipment and a storage medium based on edge computing, which are used for acquiring audio data acquired by a conference and storing the audio data into an edge computing equipment cluster through a local area network, wherein a text translation model and a semantic recognition model are deployed in the edge computing equipment cluster; performing distributed translation on the audio data by adopting a text translation model to obtain text data corresponding to the audio data; performing distributed semantic recognition on the text data by adopting a semantic recognition model and a preset prior knowledge base to obtain a natural language text corresponding to the text data; and combining the text data and the corresponding natural language text to obtain a conference summary, and storing the conference summary into a storage container of the edge computing device cluster. The invention realizes the automatic generation of the conference summary and improves the confidentiality and the safety of the generation process of the conference summary.

The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

For convenience of understanding, a specific flow of the embodiment of the present invention is described below, and referring to fig. 1, a first embodiment of a conference summary generation method based on edge calculation in the embodiment of the present invention includes:

101. Acquiring audio data collected during a conference, and storing the audio data into an edge computing device cluster through a local area network, wherein a text translation model and a semantic recognition model are deployed in the edge computing device cluster;

It is to be understood that the execution subject of the present invention may be a conference summary generation apparatus based on edge calculation, or may be a terminal or a server, which is not limited herein. The embodiment of the present invention is described by taking a server as the execution subject.

In this embodiment, each time a conference starts, the audio data may be collected either by recording equipment already present at the conference site or by pre-installed dedicated recording equipment. For example, during an offline conference, a video conference or a telephone conference, microphone permission on a mobile device may be obtained in order to collect the conference audio. The manner of collecting the audio data is not specifically limited herein.

In this embodiment, the available electronic computing devices across the conference site are connected to an edge computing pool through a Wi-Fi network or the Bluetooth protocol to form an edge computing device cluster, which jointly stores, computes and transmits the audio data so that computing and storage resources are shared. The available electronic computing devices that make up the edge computing device cluster may include the CPUs (central processing units) and GPUs (graphics processing units) of computers, computing devices, smart home devices, televisions and the like at the conference site. The text translation model and the semantic recognition model are deployed on these devices in preparation for processing the input audio data.

Specifically, the audio data is not uploaded to the cloud and is only sent to the edge computing device cluster through the local area network. The audio data is sent to each edge computing device in the cluster for storage, and each edge computing device can subsequently call the part of the data it needs according to its computing requirements.

102. Performing distributed translation on the audio data by adopting a text translation model to obtain text data corresponding to the audio data;

In this embodiment, current artificial intelligence recognition models demand high computing power, so a cloud computing scheme is normally a necessary option for most systems, and cloud computing schemes place strict requirements on the various data streams. If the only goal is to record conference summaries, committing large computing and storage resources is clearly uneconomical, and the enormous network pressure and maintenance costs generated by uploading real-time data to the relevant databases cannot be amortized.

Therefore, the storage, computation and processing of the audio data are all placed in the edge computing device cluster and are not handled in the cloud. In addition, there is no need to purchase additional professional acquisition, storage or computing equipment: the equipment already present at the conference site is used as is. Existing devices with ordinary computing capacity are connected as a cluster, and the computation of the text translation model and the semantic recognition model is distributed across the devices. This reduces the computing and storage pressure on each device, satisfies the computing requirements of the two models, and lowers equipment purchasing costs.

In this embodiment, the translation process of the text translation model is divided into a plurality of translation tasks, and the translation tasks are distributed to the edge computing devices for computation according to the computing power of each edge computing device in the cluster. The distributed translation may take either of two forms: each edge computing device performs a different translation task, or two or more edge computing devices jointly perform one translation task.

Specifically, the text translation model may adopt an ASR (Automatic Speech Recognition) deep learning model, which translates the audio data to obtain text data in RAW text format. For example, the ASR model includes four basic execution steps: input, encoding, decoding and output. The four execution steps may be distributed to 4 edge computing devices as 4 translation tasks, or the four execution steps may be further subdivided into more than 4 translation tasks that are distributed to the corresponding edge computing devices.
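Distributing translation tasks according to computing power can be sketched with a simple greedy scheduler. The task costs, device names and relative power figures below are hypothetical, and the greedy rule (give each task to the device that would finish it earliest) is one possible policy rather than the one the embodiment prescribes. The sketch also ignores the ordering dependency between the ASR steps.

```python
def allocate(tasks, devices):
    """Assign tasks to devices.

    tasks:   {task name: estimated cost}
    devices: {device name: relative computing power}
    Returns  {device name: [task names]}.
    """
    load = {name: 0.0 for name in devices}
    plan = {name: [] for name in devices}
    # Place the most expensive tasks first so they land on the strongest devices.
    for task, cost in sorted(tasks.items(), key=lambda kv: -kv[1]):
        best = min(devices, key=lambda d: (load[d] + cost) / devices[d])
        plan[best].append(task)
        load[best] += cost
    return plan


if __name__ == "__main__":
    asr_tasks = {"input": 1, "encode": 4, "decode": 4, "output": 1}  # assumed costs
    cluster = {"laptop": 4.0, "tv": 1.0}  # assumed relative computing power
    print(allocate(asr_tasks, cluster))
```

With these figures the heavier encoding and decoding tasks go to the stronger device and the lighter input and output tasks go to the weaker one.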

103. Performing distributed semantic recognition on the text data by adopting a semantic recognition model and a preset prior knowledge base to obtain a natural language text corresponding to the text data;

In this embodiment, distributed semantic recognition of the text data with the semantic recognition model and the prior knowledge base proceeds in the same way as translation with the text translation model: the recognition process of the semantic recognition model is divided into a plurality of recognition tasks, which are then distributed to the edge computing devices for execution according to the computing power of the different edge computing devices in the cluster, so as to optimize execution efficiency.

Specifically, the semantic recognition model may adopt an NLP (Natural Language Processing) model, which recognizes the text data to obtain a natural language text. For example, recognizing text data with an NLP model includes the following four processes: text preprocessing, word segmentation, feature extraction and combination. These may be treated as four recognition tasks distributed to 4 edge computing devices, or may be subdivided into more recognition tasks so that more edge computing devices take part in the computation.

In addition, the prior knowledge base is deployed on the edge computing devices that execute the recognition tasks related to feature extraction, and is called during feature extraction. Different prior knowledge bases are set for feature extraction according to the professional field of the conference. For example, a financial company adopts a prior knowledge base of financial conference terms, and an internet company adopts a prior knowledge base of internet professional terms.
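Selecting a prior knowledge base by conference type can be sketched as a lookup keyed by scenario. The knowledge base contents and the way the scenario is detected (counting how many tokens appear in each term set) are assumptions made for illustration; the source only states that a knowledge base matching the application scenario is chosen.

```python
# Hypothetical prior knowledge bases, each reduced to a set of domain terms.
KNOWLEDGE_BASES = {
    "finance": {"equity", "bond", "dividend", "liquidity"},
    "internet": {"latency", "deployment", "api", "traffic"},
}


def select_knowledge_base(tokens, bases=KNOWLEDGE_BASES):
    """Pick the scenario whose term set overlaps most with the text tokens."""
    scores = {name: sum(t in terms for t in tokens) for name, terms in bases.items()}
    scenario = max(scores, key=scores.get)
    return scenario, bases[scenario]


if __name__ == "__main__":
    scenario, terms = select_knowledge_base(["the", "api", "latency", "rose"])
    print(scenario)
```

In a deployment the scenario could equally be set by configuration on the edge management device, which would replace the overlap heuristic with a plain dictionary lookup.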

104. Combining the text data and the corresponding natural language text to obtain a conference summary, and storing the conference summary into a storage container of the edge computing device cluster.

In this embodiment, the text data and the natural language text both form part of the conference summary. The two can be checked against and cross-referenced with each other, which ensures that the conference summary records the important information of the conference completely and accurately. The edge computing device cluster is provided with a storage container dedicated to storing conference summaries. Each edge computing device in the cluster can access the container through a port, and the text data and natural language texts in the container can be read, viewed, called and archived at any time.
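Step 104 can be sketched as follows. The JSON layout, the field names, and the use of a local directory to stand in for the storage container are hypothetical; the source does not describe the container's format or access interface.

```python
import json
import tempfile
from pathlib import Path


def build_summary(text_data, natural_language_text):
    """Keep both parts side by side so each can be checked against the other."""
    return {"text_data": text_data, "natural_language_text": natural_language_text}


def store_summary(summary, container_dir, meeting_id):
    """Write the summary as JSON into the directory standing in for the container."""
    path = Path(container_dir) / f"{meeting_id}.json"
    path.write_text(json.dumps(summary, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


def load_summary(container_dir, meeting_id):
    """Read a stored summary back, as any device in the cluster might."""
    path = Path(container_dir) / f"{meeting_id}.json"
    return json.loads(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as container:
        store_summary(build_summary("raw transcript", "key points"), container, "meeting-001")
        print(load_summary(container, "meeting-001"))
```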

In the embodiment of the invention, audio data from the conference is collected by audio acquisition equipment in the conference room and is initially stored in the edge computing device cluster through the local area network. All data transmission during conference summary generation takes place over the local area network, so the data never enters the cloud and data security is improved. The edge computing device cluster may comprise the CPUs (central processing units) and GPUs (graphics processing units) of computers, computing devices, smart home devices, televisions and the like in the office conference environment, which reduces the cost of deployment in the office environment. The audio data is then translated into text data in a distributed manner by the text translation model in the edge computing device cluster, and the text data is converted into natural language text in a distributed manner by the semantic recognition model, so that a conference summary can be generated automatically. The existing low-memory edge computing devices in the conference room are fully utilized by forming a cluster to execute model computation that requires high computing capacity, which improves operating efficiency, reduces the computing pressure on each device, and further reduces operation and maintenance costs.

Referring to fig. 2, a second embodiment of the method for generating a conference summary based on edge calculation according to the embodiment of the present invention includes:

201. Acquiring audio data collected during a conference, and storing the audio data into an edge management device in an edge computing device cluster through a local area network;

202. Sharing, by the edge management device and through the local area network, the audio data stored in the edge management device to the edge node devices in the edge computing device cluster, wherein a text translation model and a semantic recognition model are deployed in the edge computing device cluster;

In this embodiment, the audio acquisition device at the conference site is connected to the edge management device in the edge computing device cluster, and the edge management device is connected to the other edge node devices, forming a LoRa network with VPN networking that serves as the local area network.

In this embodiment, an edge manager is deployed on the edge management device, and each edge node device in the edge computing device cluster may be managed via an edge manager port. The edge manager mainly takes the form of software deployed on edge computer hardware and is responsible for resource calling and node management. The edge node devices are responsible for receiving, transmitting, computing, and storing the audio data, text data, natural language text, and other related data. The devices of the whole edge computing device cluster can connect to each other and exchange information through a Wi-Fi network or the Bluetooth protocol.

The audio data is first cached in the edge management device and then shared with the other edge node devices by the deployed edge manager. Furthermore, each edge node device can be sent only the audio data that its assigned task requires, so that the full audio data need not be shared with every edge node device and the storage space occupied in the edge computing device cluster is reduced.
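As a non-limiting illustration, the task-based sharing described above can be sketched as follows. The names `EdgeNode`, `share_audio`, and the chunk and node identifiers are hypothetical and do not appear in the embodiment; the sketch only shows each node receiving the audio chunks assigned to its task rather than the full recording.

```python
from dataclasses import dataclass, field


@dataclass
class EdgeNode:
    """Hypothetical stand-in for one edge node device."""
    name: str
    received: list = field(default_factory=list)


def share_audio(chunks, assignments, nodes):
    """Send each node only the audio chunks its task needs.

    chunks:      dict mapping chunk id -> raw audio bytes cached on the manager
    assignments: dict mapping node name -> list of chunk ids for that node
    nodes:       dict mapping node name -> EdgeNode
    """
    for node_name, chunk_ids in assignments.items():
        for chunk_id in chunk_ids:
            nodes[node_name].received.append((chunk_id, chunks[chunk_id]))


# Assumed toy data: three cached chunks and two nodes.
chunks = {0: b"\x00\x01", 1: b"\x02\x03", 2: b"\x04\x05"}
nodes = {"node-a": EdgeNode("node-a"), "node-b": EdgeNode("node-b")}
share_audio(chunks, {"node-a": [0, 1], "node-b": [2]}, nodes)
```

In this sketch no node holds all three chunks, which mirrors the stated goal of reducing storage occupation in the cluster.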

203. Determining a translation task deployed by a text translation model on each edge node device according to translation setting information stored in the edge management device;

204. generating translation instructions corresponding to the translation tasks through the edge management equipment, and sending the translation instructions to the corresponding edge node equipment;

205. according to each translation instruction, executing a text translation model on the corresponding edge node device in a distributed mode, and translating the audio data into corresponding text data through the text translation model;

In this embodiment, a user may set in advance, on the edge manager of the edge management device, the translation task to be executed by each edge node device, based on the translation setting information of the deployed text translation model. The edge management device may also include an interactive interface through which, for each edge node device, none, one, or several of the translation tasks divided up during the development phase can be selected, thereby completing the translation setting information. Translation can then be performed directly according to the translation setting information.

Each translation task corresponds to one translation instruction. The edge manager generates the translation instruction for each translation task according to the translation setting information and sends it to the designated edge node device. Once a translation instruction reaches an edge node device, the corresponding translation task of the text translation model is triggered. The generation of the translation instructions and the execution of the translation tasks follow a time sequence, which may be sequential or simultaneous, as configured.
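As a non-limiting illustration, the generation and dispatch of translation instructions can be sketched as follows. The structure of `setting_info`, the class `TranslationInstruction`, and the placeholder `run_on_node` are assumptions made for this sketch; a real node would run the text translation model instead of returning a string.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationInstruction:
    task_id: str
    node: str
    chunk_ids: tuple


def build_instructions(setting_info):
    """Create exactly one instruction per configured translation task."""
    return [
        TranslationInstruction(task_id, cfg["node"], tuple(cfg["chunks"]))
        for task_id, cfg in setting_info["tasks"].items()
    ]


def run_on_node(instruction):
    """Placeholder for executing the text translation model on a node."""
    return f"{instruction.node}:{instruction.task_id}:{len(instruction.chunk_ids)}"


def dispatch(instructions, mode):
    """Trigger the tasks one after another or all at once, as configured."""
    if mode == "sequential":
        return [run_on_node(i) for i in instructions]
    if mode == "simultaneous":
        with ThreadPoolExecutor(max_workers=len(instructions) or 1) as pool:
            return list(pool.map(run_on_node, instructions))
    raise ValueError(f"unknown trigger mode: {mode}")


# Assumed translation setting information stored on the edge management device.
setting_info = {
    "mode": "simultaneous",
    "tasks": {
        "t1": {"node": "node-a", "chunks": [0, 1]},
        "t2": {"node": "node-b", "chunks": [2]},
    },
}
results = dispatch(build_instructions(setting_info), setting_info["mode"])
```

`ThreadPoolExecutor.map` returns results in the order of the submitted instructions, so both trigger modes yield the same result ordering in this sketch.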

In addition, the specific audio data translation process is as follows:

(1) audio data is subjected to audio division through a text translation model to obtain a plurality of continuous sampling values, and arithmetic mean processing is carried out on each continuous sampling value to obtain a filtering value corresponding to the audio data;

(2) performing signal cutting processing on the clean audio through a text translation model to obtain at least one audio segment, and performing vectorization processing on the at least one audio segment to obtain at least one digital vector;

(3) searching a preset translation table, through the text translation model, for the characters mapped to each digital vector, and combining the characters found to obtain the text data corresponding to the audio data.

In this embodiment, the audio data is divided into frames, which may be at the millisecond level, to obtain a plurality of consecutive sampling values. Specifically, a Fourier transform function is applied to the audio data and the resulting Fourier transform image (sound wave diagram) is plotted; arithmetic averaging of the consecutive sampling values smooths out the unneeded components of the transformed audio, and the converted sound wave diagram is plotted. Each of the divided short waveform segments is then converted into a multi-dimensional digital vector according to the characteristics of the human ear. The translation table is then used to identify the digital vectors as states (a state can be understood as an intermediate unit smaller than a phoneme), and the states are combined into phonemes (typically 3 states form 1 phoneme; phonemes are units such as "d", "a", "j", "ā", "h", "ǎ", "o"). Finally, the phonemes are combined into words (dàjiā hǎo) and concatenated into sentences. In this way, the conversion from speech to text is achieved.
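As a non-limiting illustration, steps (1) to (3) can be sketched with toy data as follows. The window size, the two-component feature vector, and the contents of `translation_table` are assumptions made for this sketch; it omits the Fourier transform and the ear-based features mentioned above and only shows the arithmetic-mean filtering, segment cutting, vectorization, and table lookup.

```python
def moving_average(samples, window=3):
    """Arithmetic mean over each run of `window` consecutive samples."""
    return [
        sum(samples[i:i + window]) / window
        for i in range(len(samples) - window + 1)
    ]


def cut_segments(samples, size):
    """Cut the filtered signal into fixed-size segments, dropping a short tail."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def vectorize(segment):
    """Toy digital vector: (rounded mean, rounded peak-to-peak range)."""
    mean = sum(segment) / len(segment)
    return (round(mean), round(max(segment) - min(segment)))


def translate(samples, table, window=3, size=4):
    """Filter, cut, vectorize, then look each vector up in the table."""
    filtered = moving_average(samples, window)
    vectors = [vectorize(seg) for seg in cut_segments(filtered, size)]
    return "".join(table.get(vec, "?") for vec in vectors)


# Assumed toy samples and a hypothetical preset translation table.
samples = [0, 3, 6, 3, 0, 3, 6, 9, 12, 9]
translation_table = {(3, 2): "hel", (7, 7): "lo"}
text = translate(samples, translation_table)
```

Vectors that are missing from the table are rendered as "?" here so that gaps in the lookup stay visible instead of being dropped silently.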

206. Performing distributed semantic recognition on the text data by adopting a semantic recognition model and a preset prior knowledge base to obtain a natural language text corresponding to the text data;

207. and combining the text data and the corresponding natural language text to obtain a conference summary, and storing the conference summary into a storage container of the edge computing device cluster.
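As a non-limiting illustration, combining the text data with the natural language text and storing the result can be sketched as follows. The JSON layout, the field names, and the use of a local directory as the "storage container" are assumptions made for this sketch; the embodiment does not specify a storage format.

```python
import json
import tempfile
from pathlib import Path


def build_summary(text_data, natural_language_text):
    """Combine the transcript and the recognized text into one record."""
    return {"text_data": text_data, "natural_language_text": natural_language_text}


def store_summary(summary, container_dir, name):
    """Write the summary into a directory standing in for the storage container."""
    path = Path(container_dir) / f"{name}.json"
    path.write_text(json.dumps(summary, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


# A temporary directory plays the role of the cluster's storage container.
with tempfile.TemporaryDirectory() as container:
    saved = store_summary(
        build_summary("we approve the budget today", "Budget approved."),
        container,
        "meeting-001",
    )
    restored = json.loads(saved.read_text(encoding="utf-8"))
```

Because the record is written to local storage only, nothing in this sketch leaves the machine, in line with the offline operation described for the cluster.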

In this embodiment of the invention, the edge computing device cluster relies on the computing power of the edge system to process the complete audio data offline and return the results. The audio data, text data, and natural language text produced during processing need not be exposed to any external network, and because the cluster relies entirely on internal storage, zero network load is achieved. In addition, the edge computing device cluster is deployed entirely on computing devices already present at the conference site; if the user can also provide a storage device, the system incurs no additional hardware cost in an environment that meets the implementation conditions, which improves the cost-effectiveness of computing across the whole office scenario.

Referring to fig. 3, a third embodiment of the method for generating a conference summary based on edge calculation according to the embodiment of the present invention includes:

301. acquiring audio data acquired by a conference, and storing the audio data into edge management equipment in an edge computing equipment cluster through a local area network;

302. sharing audio data stored in the edge management equipment to edge node equipment in an edge computing equipment cluster by adopting edge management equipment through a local area network, wherein a text translation model and a semantic recognition model are deployed in the edge computing equipment cluster;

303. performing distributed translation on the audio data by adopting a text translation model to obtain text data corresponding to the audio data;

304. determining the recognition tasks of the semantic recognition model deployed in each edge node device according to the recognition setting information stored in the edge management device;

305. generating identification instructions corresponding to the identification tasks through the edge management equipment, and sending the identification instructions to the corresponding edge node equipment;

In this embodiment, a user may set in advance, on the edge manager of the edge management device, the identification task to be executed by each edge node device, based on the identification setting information of the deployed semantic identification model. The edge management device may also include an interactive interface through which, for each edge node device, none, one, or several of the identification tasks divided up during the development stage can be selected, thereby completing the identification setting information. Identification can then be performed directly according to the identification setting information.

Each recognition task corresponds to one recognition instruction. The edge manager generates the recognition instruction for each recognition task according to the recognition setting information and sends it to the designated edge node device. Once an identification instruction reaches an edge node device, the corresponding identification task of the semantic identification model is triggered. The generation of the identification instructions and the execution of the identification tasks follow a time sequence, which may be sequential or simultaneous, as configured.

306. Determining an application scene of the text data through the edge management equipment, selecting a preset prior knowledge base corresponding to the application scene and sharing the preset prior knowledge base to each edge node equipment;

307. according to each identification instruction, executing a semantic identification model on corresponding edge node equipment in a distributed mode, and identifying a natural language text corresponding to text data through the semantic identification model and a preset priori knowledge base;

In this embodiment, the usage of common words and specialized terms differs between professional fields; that is, the same natural-language term carries different semantics. Take "conjugation" as an example: in mathematics there are the "conjugate complex number" and the "conjugate root", in chemistry there is the "conjugation of double bonds", and in computing there is the "conjugate gradient method". Therefore, for the application scenarios of different text data, that is, for conferences of different professional types, semantic recognition can be performed with the corresponding preset prior knowledge base in combination with the semantic recognition model, specifically as follows:

(1) detecting special characters of the text data to obtain a first detection result, and detecting stop words of the text data according to a preset stop word list to obtain a second detection result;

(2) judging whether the text data contains special characters or not according to the first detection result, and judging whether the text data contains stop words or not according to the second detection result;

(3) if the text data contains special characters, the special characters are removed, and if the text data contains stop words, the stop words are removed;

(4) dividing text data into character string sequences by adopting a semantic recognition model, and performing lexical processing on each element in the character string sequences to obtain a plurality of lexical elements;

(5) performing word frequency statistics and ranking on the lexical elements by adopting the semantic recognition model and a preset professional corpus to obtain a word frequency ranking, and taking a preset number of the top-ranked words as keywords;

(6) recognizing the semantic relations among the keywords by adopting a grammar recognition model, and combining the keywords based on the semantic relations and preset grammar rules to obtain the natural language text.
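As a non-limiting illustration, the scenario-dependent choice of prior knowledge base that precedes the steps above can be sketched as follows. The cue words in `SCENARIO_CUES` and the overlap-counting rule in `detect_scenario` are assumptions made for this sketch; the knowledge base entries reuse the "conjugation" examples given in this embodiment.

```python
# Hypothetical preset prior knowledge bases, one per application scenario.
PRIOR_KNOWLEDGE_BASES = {
    "mathematics": {"conjugate": "conjugate complex number / conjugate root"},
    "chemistry": {"conjugate": "conjugation of double bonds"},
    "computing": {"conjugate": "conjugate gradient method"},
}

# Hypothetical cue words used to guess the scenario of the text data.
SCENARIO_CUES = {
    "mathematics": {"complex", "root", "equation"},
    "chemistry": {"bond", "molecule", "reaction"},
    "computing": {"gradient", "solver", "iteration"},
}


def detect_scenario(text):
    """Pick the scenario whose cue words overlap most with the text."""
    words = set(text.lower().split())
    scores = {name: len(words & cues) for name, cues in SCENARIO_CUES.items()}
    return max(scores, key=scores.get)


def resolve_term(term, text):
    """Look the term up in the knowledge base selected for this text."""
    scenario = detect_scenario(text)
    return scenario, PRIOR_KNOWLEDGE_BASES[scenario].get(term, term)


scenario, meaning = resolve_term(
    "conjugate", "the double bond in this molecule is conjugate"
)
```

In the described method the edge management device would make this selection once and share the chosen knowledge base with each edge node device.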

In this embodiment, before the semantic features of the text data are extracted, the text data is preprocessed to remove special characters such as tags, punctuation marks, quotation marks, and URLs (Uniform Resource Locators). To keep the conference summary clear and concise, stop words are also deleted from the text data, so that both special characters and stop words are eliminated.
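As a non-limiting illustration, the detection and removal of special characters and stop words can be sketched as follows. The regular expressions and the small `STOP_WORDS` set are assumptions made for this sketch; a deployment would use its own preset stop word list.

```python
import re

# Hypothetical preset stop word list.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is"}

URL_PATTERN = re.compile(r"https?://\S+")
# Markup tags, or any character that is neither a word character nor whitespace.
SPECIAL_PATTERN = re.compile(r"<[^>]+>|[^\w\s]")


def preprocess(text, stop_words=STOP_WORDS):
    """Detect, then remove, special characters and stop words."""
    # First detection result: does the text contain special characters?
    has_special = bool(URL_PATTERN.search(text) or SPECIAL_PATTERN.search(text))
    if has_special:
        text = URL_PATTERN.sub(" ", text)
        text = SPECIAL_PATTERN.sub(" ", text)

    words = text.split()
    # Second detection result: does the text contain stop words?
    has_stop = any(word.lower() in stop_words for word in words)
    if has_stop:
        words = [word for word in words if word.lower() not in stop_words]
    return " ".join(words)


cleaned = preprocess('The budget, see <b>https://example.com/q3</b>, is "approved".')
```

URLs are removed before other special characters so that an address is dropped as a whole rather than leaving fragments of it behind as words.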

Then, feature extraction and semantic recognition are performed on the text data by the semantic recognition model, for example using NLP (natural language processing) together with the professional corpus corresponding to the audio data. Specifically, this involves two core tasks: natural language understanding and natural language generation. Natural language understanding comprises step (4) above and may specifically use a transform algorithm; natural language generation comprises steps (5) and (6) above, which may be further subdivided into content determination, text structuring, sentence aggregation, grammar generation, referring expression generation, and linguistic realization.

Further, word segmentation can be used to split the text into a sequence of strings according to specific requirements; the elements of the sequence are generally called tokens, and lexical processing is performed on each token. Subsequently, based on the preset professional corpus, TF-IDF (term frequency-inverse document frequency), a weighting technique commonly used in data mining and information retrieval, is applied to extract the keywords and the relations between them. Finally, the extracted keywords are combined into a natural language text according to grammar rules.
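As a non-limiting illustration, token-level TF-IDF ranking against a professional corpus can be sketched as follows. The whitespace tokenizer, the smoothed IDF formula, and the toy corpus are assumptions made for this sketch; the embodiment does not specify which TF-IDF variant is used.

```python
import math
from collections import Counter


def tokenize(text):
    """Toy word segmentation: lowercase and split on whitespace."""
    return text.lower().split()


def top_keywords(text, corpus, k=3):
    """Rank the tokens of `text` by TF-IDF against `corpus`; return the top k."""
    tokens = tokenize(text)
    term_counts = Counter(tokens)
    documents = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(documents)

    scores = {}
    for token, count in term_counts.items():
        doc_freq = sum(1 for doc in documents if token in doc)
        # Smoothed IDF (an assumption) so unseen tokens do not divide by zero.
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1
        scores[token] = (count / len(tokens)) * idf

    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


# Assumed preset professional corpus and preprocessed conference text.
corpus = ["review plan", "discuss schedule plan", "approve plan"]
keywords = top_keywords("approve budget budget owner plan", corpus, k=2)
```

Tokens that are frequent in the conference text but rare in the corpus rank highest, which is why "plan", present in every corpus document, falls to the bottom here.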

308. And combining the text data and the corresponding natural language text to obtain a conference summary, and storing the conference summary into a storage container of the edge computing device cluster.

In this embodiment of the invention, once the edge computing device cluster is deployed at the conference site, the whole workflow and workload of producing the conference summary are taken over by the cluster, which reduces the manpower and material resources required. Coverage is broad: recognition, summary generation, and archiving are carried out automatically regardless of the size of the conference. Moreover, the edge computing device cluster can run entirely offline without a network, and unless there is a file storage requirement, the user does not need to upload any data to a cloud server. Data exposure on the network is thereby minimized, and privacy and security are better protected than with cloud computing approaches.

The method for generating a conference summary based on edge calculation in the embodiment of the present invention has been described above; the device for generating a conference summary based on edge calculation is described below. Referring to fig. 4, an embodiment of the device for generating a conference summary based on edge calculation according to the embodiment of the present invention includes:

the acquisition module 401 is configured to acquire audio data acquired by a conference and store the audio data in an edge computing device cluster through a local area network, where a text translation model and a semantic recognition model are deployed in the edge computing device cluster;

a translation module 402, configured to perform distributed translation on the audio data by using the text translation model to obtain text data corresponding to the audio data;

the recognition module 403 is configured to perform distributed semantic recognition on the text data by using the semantic recognition model and a preset priori knowledge base to obtain a natural language text corresponding to the text data;

and the combining module 404 is configured to combine the text data and the corresponding natural language text to obtain a meeting summary, and store the meeting summary in a storage container of the edge computing device cluster.
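As a non-limiting illustration, the cooperation of modules 401 to 404 can be sketched as follows. The class `ConferenceSummaryDevice` and the lambda stand-ins are hypothetical; each callable only marks where the corresponding module would do its work.

```python
class ConferenceSummaryDevice:
    """Wires together stand-ins for modules 401 to 404."""

    def __init__(self, acquire, translate, recognize, combine):
        self.acquire = acquire      # acquisition module 401
        self.translate = translate  # translation module 402
        self.recognize = recognize  # recognition module 403
        self.combine = combine      # combining module 404

    def generate(self, source):
        """Run the four modules in the order given in the embodiment."""
        audio = self.acquire(source)
        text = self.translate(audio)
        natural_text = self.recognize(text)
        return self.combine(text, natural_text)


# Placeholder callables; real modules would run the deployed models.
device = ConferenceSummaryDevice(
    acquire=lambda source: f"audio({source})",
    translate=lambda audio: f"text({audio})",
    recognize=lambda text: f"summary({text})",
    combine=lambda text, natural: {"text": text, "natural_language_text": natural},
)
result = device.generate("room-1")
```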

In this embodiment of the invention, audio acquisition equipment in the conference room captures the audio data of the conference and stores it in the edge computing device cluster over the local area network. All data transmission during conference summary generation takes place over the local area network, so the data never enters the cloud and data security is improved. The edge computing device cluster may comprise devices already present in the office conference environment, such as computers, smart home devices, and the CPUs (central processing units) and GPUs (graphics processing units) of televisions, which reduces the cost of deployment in the office environment. The text translation model in the edge computing device cluster then translates the audio data into text data in a distributed manner, and the semantic recognition model converts the text data into natural language text in a distributed manner, so that a conference summary is generated automatically. The existing low-memory edge computing devices in the conference room are fully utilized by forming them into a cluster that performs model computation requiring high computing power, which improves operating efficiency, reduces the load on individual devices, and further reduces operation and maintenance costs.

Referring to fig. 5, another embodiment of the device for generating a conference summary based on edge calculation according to the embodiment of the present invention includes:

Wherein the acquisition module 401 comprises:

the storage unit 4011 is configured to store the audio data in an edge management device in the edge computing device cluster through a local area network;

a sharing unit 4012, configured to share, with the edge management device, the audio data stored in the edge management device to an edge node device in the edge computing device cluster through the local area network.

Wherein the translation module 402 comprises:

a translation determining unit 4021, configured to determine, according to translation setting information stored in the edge management device, a translation task to be deployed by the text translation model at each edge node device;

a translation instruction generating unit 4022, configured to generate, by the edge management device, a translation instruction corresponding to each translation task, and send each translation instruction to a corresponding edge node device;

the translation executing unit 4023 is configured to execute the text translation model in a distributed manner on the corresponding edge node device according to each translation instruction, and translate the audio data into corresponding text data through the text translation model.

Wherein the identification module 403 comprises:

an identification determining unit 4031, configured to determine, according to identification setting information stored in the edge management device, an identification task that the semantic identification model is deployed in each edge node device;

an identification instruction generating unit 4032, configured to generate, by the edge management device, an identification instruction corresponding to each identification task, and send each identification instruction to a corresponding edge node device;

a selecting unit 4033, configured to determine an application scenario of the text data through the edge management device, select a preset prior knowledge base corresponding to the application scenario, and share the preset prior knowledge base with each edge node device;

an identification execution unit 4034, configured to execute the semantic identification model in a distributed manner on the corresponding edge node device according to each identification instruction, and identify, through the semantic identification model and the preset prior knowledge base, a natural language text corresponding to the text data.

Wherein the translation execution unit 4023 is further configured to:

performing audio division on the audio data through the text translation model to obtain a plurality of continuous sampling values, and performing arithmetic mean processing on each continuous sampling value to obtain a filtering value corresponding to the audio data;

performing signal cutting processing on the clean audio through the text translation model to obtain at least one audio segment, and performing vectorization processing on the at least one audio segment to obtain at least one digital vector;

and searching the characters mapped with the digital vectors from a preset translation table through the text translation model, and combining the searched characters to obtain text data corresponding to the audio data.

Wherein the identification execution unit 4034 is further configured to:

dividing the text data into a character string sequence by adopting the semantic recognition model, and performing lexical processing on each element in the character string sequence to obtain a plurality of lexical elements;

performing word frequency statistics and ranking on the lexical elements by adopting the semantic recognition model and a preset professional corpus to obtain a word frequency ranking, and taking a preset number of the top-ranked words as keywords;

and recognizing the semantic relation among the keywords by adopting the grammar recognition model, and combining the keywords based on the semantic relation and a preset grammar rule to obtain a natural language text.

Wherein the identification execution unit 4034 is further configured to:

detecting special characters of the text data to obtain a first detection result, and detecting stop words of the text data according to a preset stop word list to obtain a second detection result;

judging whether the text data contains special characters or not according to the first detection result, and judging whether the text data contains stop words or not according to the second detection result;

and if the text data contains special characters, rejecting the special characters, and if the text data contains stop words, rejecting the stop words.

In this embodiment of the invention, the edge computing device cluster relies on the computing power of the edge system to process the complete audio data offline and return the results. The audio data, text data, and natural language text produced during processing need not be exposed to any external network, and because the cluster relies entirely on internal storage, zero network load is achieved. In addition, the edge computing device cluster is deployed entirely on computing devices already present at the conference site; if the user can also provide a storage device, the system incurs no additional hardware cost in an environment that meets the implementation conditions, which improves the cost-effectiveness of computing across the whole office scenario. Furthermore, once the edge computing device cluster is deployed at the conference site, the whole workflow and workload of producing the conference summary are taken over by the cluster, which reduces the manpower and material resources required; coverage is broad, and recognition, summary generation, and archiving are carried out automatically regardless of the size of the conference.

Figs. 4 and 5 describe the conference summary generation apparatus based on edge computation in the embodiment of the present invention in detail from the perspective of modular functional entities; the conference summary generation device based on edge computation in the embodiment of the present invention is described in detail below from the perspective of hardware processing.

Fig. 6 is a schematic structural diagram of an edge-computing-based conference summary generation device 600 according to an embodiment of the present invention. The device 600 may vary considerably depending on its configuration or performance, and may include one or more processors (central processing units, CPUs) 610, a memory 620, and one or more storage media 630 (e.g., one or more mass storage devices) storing an application program 633 or data 632. The memory 620 and the storage medium 630 may provide transient or persistent storage. The program stored in the storage medium 630 may include one or more modules (not shown), each of which may include a series of instruction operations for the edge-computing-based conference summary generation device 600. Further, the processor 610 may be configured to communicate with the storage medium 630 and to execute, on the edge-computing-based conference summary generation device 600, the series of instruction operations in the storage medium 630.

The edge-computing-based conference summary generation device 600 may also include one or more power supplies 640, one or more wired or wireless network interfaces 650, one or more input-output interfaces 660, and/or one or more operating systems 631, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. Those skilled in the art will appreciate that the device structure shown in fig. 6 does not limit the edge-computing-based conference summary generation device, which may include more or fewer components than shown, combine some components, or arrange the components differently.

The invention also provides a conference summary generation device based on edge calculation, wherein the device comprises a memory and a processor, the memory stores computer readable instructions, and the computer readable instructions, when executed by the processor, cause the processor to execute the steps of the conference summary generation method based on edge calculation in the above embodiments.

The present invention also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium, and which may also be a volatile computer-readable storage medium, having stored therein instructions, which, when run on a computer, cause the computer to perform the steps of the edge-computing-based conference summary generation method.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A conference summary generation method based on edge calculation is characterized in that the conference summary generation method based on edge calculation comprises the following steps:

acquiring audio data acquired by a conference, and storing the audio data into an edge computing device cluster through a local area network, wherein a text translation model and a semantic recognition model are deployed in the edge computing device cluster;

performing distributed translation on the audio data by adopting the text translation model to obtain text data corresponding to the audio data;

performing distributed semantic recognition on the text data by adopting the semantic recognition model and a preset prior knowledge base to obtain a natural language text corresponding to the text data;

and combining the text data and the corresponding natural language text to obtain a conference summary, and storing the conference summary into a storage container of the edge computing device cluster.

2. The method of claim 1, wherein storing the audio data to a cluster of edge computing devices over a local area network comprises:

storing the audio data into an edge management device in the edge computing device cluster through a local area network;

and sharing the audio data stored in the edge management equipment to the edge node equipment in the edge computing equipment cluster through the local area network by adopting the edge management equipment.

3. The method for generating a conference summary based on edge computing according to claim 2, wherein the performing distributed translation on the audio data by using the text translation model to obtain the text data corresponding to the audio data includes:

determining translation tasks deployed by the text translation model on each edge node device according to translation setting information stored in the edge management device;

generating translation instructions corresponding to the translation tasks through the edge management equipment, and sending the translation instructions to corresponding edge node equipment;

according to each translation instruction, executing the text translation model on the corresponding edge node device in a distributed mode, and translating the audio data into the corresponding text data through the text translation model.

4. The method for generating a conference summary based on edge computing according to claim 2, wherein the performing distributed semantic recognition on the text data by using the semantic recognition model and a preset prior knowledge base to obtain the natural language text corresponding to the text data includes:

determining an identification task of the semantic identification model deployed in each edge node device according to identification setting information stored in the edge management device;

generating identification instructions corresponding to the identification tasks through the edge management equipment, and sending the identification instructions to corresponding edge node equipment;

determining an application scene of the text data through the edge management device, selecting a preset prior knowledge base corresponding to the application scene and sharing the preset prior knowledge base to each edge node device;

and according to each identification instruction, executing the semantic identification model on the corresponding edge node device in a distributed manner, and identifying the natural language text corresponding to the text data through the semantic identification model and the preset prior knowledge base.

5. The method of claim 3, wherein said translating the audio data into corresponding text data via the text translation model comprises:

6. The method of generating a conference summary based on edge computing according to claim 4, wherein the identifying the natural language text corresponding to the text data through the semantic identification model and the preset prior knowledge base includes:

7. The method for generating a conference summary based on edge computing according to claim 6, wherein before said adopting the semantic recognition model to divide the text data into a character string sequence and lexically process each element in the character string sequence to obtain a plurality of lexical elements, the method further comprises:

8. An edge-computing based conference summary generation apparatus, characterized in that the edge-computing based conference summary generation apparatus comprises:

an acquisition module, configured to acquire audio data collected during a conference and store the audio data in an edge computing device cluster through a local area network, wherein a text translation model and a semantic recognition model are deployed in the edge computing device cluster;

a translation module, configured to perform distributed translation on the audio data by using the text translation model to obtain text data corresponding to the audio data;

a recognition module, configured to perform distributed semantic recognition on the text data by using the semantic recognition model and a preset prior knowledge base to obtain a natural language text corresponding to the text data;

and a combination module, configured to combine the text data and the corresponding natural language text to obtain a conference summary, and store the conference summary in a storage container of the edge computing device cluster.
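The four modules of claim 8 can be pictured as one pipeline. The following is only a sketch under stated assumptions: the class name `ConferenceSummaryGenerator`, the injected `translate` and `recognize` callables, the in-memory dictionary used as the storage container, and the `=>` line format of the summary are all hypothetical and are not taken from the claims.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ConferenceSummaryGenerator:
    translate: Callable[[bytes], str]   # stand-in for the text translation model
    recognize: Callable[[str], str]     # stand-in for the semantic recognition model
    storage: Dict[str, str] = field(default_factory=dict)  # stand-in storage container

    def acquire(self, segments: List[bytes]) -> List[bytes]:
        # Acquisition module: a real system would write the audio to the
        # edge cluster over the LAN; here the segments are simply copied.
        return list(segments)

    def combine(self, texts: List[str], recognized: List[str]) -> str:
        # Combination module: pair each text with its natural language text.
        return "\n".join(f"{raw} => {nl}" for raw, nl in zip(texts, recognized))

    def run(self, meeting_id: str, segments: List[bytes]) -> str:
        audio = self.acquire(segments)
        texts = [self.translate(a) for a in audio]        # translation module
        recognized = [self.recognize(t) for t in texts]   # recognition module
        summary = self.combine(texts, recognized)
        self.storage[meeting_id] = summary                # store the summary
        return summary


generator = ConferenceSummaryGenerator(
    translate=lambda audio: audio.decode("utf-8"),
    recognize=lambda text: text.capitalize() + ".",
)
summary = generator.run("meeting-001", [b"budget approved", b"launch in may"])
```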

9. An edge-computing based conference summary generation apparatus, characterized in that the edge-computing based conference summary generation apparatus comprises: a memory and at least one processor, the memory having instructions stored therein;

the at least one processor invoking the instructions in the memory to cause the edge-computing based conference summary generation apparatus to perform the steps of the edge-computing based conference summary generation method of any one of claims 1 to 7.

10. A computer-readable storage medium having stored thereon instructions which, when executed by a processor, carry out the steps of the edge-computing based conference summary generation method according to any one of claims 1 to 7.