CN116645965A - Voice information processing method and device, electronic equipment and storage medium

Info

Publication number
CN116645965A
Authority
CN
China
Prior art keywords
information
text information
text
target
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310629036.5A
Other languages
Chinese (zh)
Inventor
陈亚楠
李文利
杜青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202310629036.5A
Publication of CN116645965A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/14 Digital output to display device; Cooperation and interconnection of the display device with other functional units
    • G06F3/1454 Digital output to display device; Cooperation and interconnection of the display device with other functional units involving copying of the display data of a local workstation or window to a remote workstation or window so that an actual copy of the data is displayed simultaneously on two or more displays, e.g. teledisplay
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W12/00 Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W12/03 Protecting confidentiality, e.g. by encryption
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/80 Services using short range communication, e.g. near-field communication [NFC], radio-frequency identification [RFID] or low energy communication

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Bioethics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The application discloses a method and apparatus for processing voice information, an electronic device, and a storage medium, and relates to the field of financial technology and other related technical fields. The method comprises: collecting voice information of N objects through an audio collection device, and identifying the voice information of a target object from the voice information of the N objects; converting the voice information of the target object into first text information; identifying whether information to be encrypted exists in the first text information; where information to be encrypted exists in the first text information, performing a data processing operation on the first text information to obtain second text information, wherein the data processing operation encrypts the information to be encrypted in the first text information; and sending the second text information to a target screen projection device for display. The application addresses the technical problem in the prior art of low information interaction efficiency caused by exchanging information only through an audio device.

Description

Voice information processing method and device, electronic equipment and storage medium
Technical Field
The application relates to the field of financial technology and other related technical fields, and in particular to a method and apparatus for processing voice information, an electronic device, and a storage medium.
Background
In existing bank counter settings, the glass partition on the counter protects both bank staff and bank customers, but it also reduces the efficiency of information exchange between them. For example, in the prior art, staff and customers usually talk through a microphone and an audio playback device on the counter. With this arrangement, the customer often cannot clearly hear the staff member, or the staff member cannot clearly hear the customer, so information exchange between the two parties is inefficient.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The application provides a method and apparatus for processing voice information, an electronic device, and a storage medium, to at least solve the technical problem in the prior art of low information interaction efficiency caused by exchanging information only through an audio device.
According to one aspect of the present application, there is provided a method of processing voice information, including: collecting voice information of N objects through an audio collection device, and identifying the voice information of a target object from the voice information of the N objects, wherein N is a positive integer and the target object is the object corresponding to a target screen projection device; converting the voice information of the target object into first text information; identifying whether information to be encrypted exists in the first text information, wherein the information to be encrypted is information that is prohibited from being displayed through the target screen projection device; where information to be encrypted exists in the first text information, performing a data processing operation on the first text information to obtain second text information, wherein the data processing operation encrypts the information to be encrypted in the first text information; and sending the second text information to the target screen projection device for display.
Further, the method of processing voice information further includes: performing voice feature recognition on the voice information of the N objects through a target model to obtain the voice feature of each object, wherein the target model is a neural network model trained using voice information of M objects with known voice features as training samples, and the M objects include at least the N objects; and taking the voice information that corresponds to the voice feature of the target object as the voice information of the target object.
Further, the method of processing voice information further includes: identifying first voice information in the voice information of the target object, wherein the first voice information is voice information whose volume is lower than a preset volume; removing the first voice information from the voice information of the target object to obtain second voice information corresponding to the target object; and converting the second voice information into the first text information.
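The volume-based filtering described above can be illustrated with a minimal sketch. The `Segment` type, its `volume` field (a normalized loudness value), and the threshold of 0.2 are assumptions made for illustration; the application does not specify how volume is measured or what the preset volume is.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    label: str     # illustrative tag for the segment (hypothetical)
    volume: float  # hypothetical loudness measure, e.g. normalized to [0, 1]


def remove_quiet_segments(segments: List[Segment], preset_volume: float) -> List[Segment]:
    """Drop the "first voice information" (volume below the preset volume).

    What remains plays the role of the "second voice information".
    """
    return [s for s in segments if s.volume >= preset_volume]


segments = [Segment("teller speaking", 0.62), Segment("background chatter", 0.08)]
second_voice_information = remove_quiet_segments(segments, preset_volume=0.2)
print([s.label for s in second_voice_information])
```

Only the segments that survive the filter would then be passed on to speech-to-text conversion.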
Further, the method of processing voice information further includes: identifying whether a first-type keyword exists in the first text information, wherein the first-type keyword represents information that is prohibited from being displayed to a first object, and the first object is an object viewing the target screen projection device; where a first-type keyword exists in the first text information, determining the information corresponding to the first-type keyword in the first text information to be the information to be encrypted; and where no first-type keyword exists in the first text information, determining that no information to be encrypted exists in the first text information.
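One possible reading of the first-type keyword check is a plain substring search that returns the character spans to be encrypted. This is only a sketch: the function name `find_info_to_encrypt` and the sample keyword are hypothetical, and the application does not say how the "information corresponding to" a keyword is delimited, so here the keyword occurrence itself is treated as that information.

```python
import re
from typing import Iterable, List, Tuple


def find_info_to_encrypt(first_text: str, first_type_keywords: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (start, end) spans of every first-type keyword occurrence.

    An empty list means no information to be encrypted was found.
    """
    spans = []
    for keyword in first_type_keywords:
        # re.escape ensures the keyword is matched literally, not as a pattern.
        for match in re.finditer(re.escape(keyword), first_text):
            spans.append((match.start(), match.end()))
    return sorted(spans)


print(find_info_to_encrypt("your password is ready", ["password"]))  # [(5, 13)]
```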
Further, the method of processing voice information further includes: after converting the voice information of the target object into the first text information, detecting whether a second-type keyword exists in the first text information, wherein the second-type keyword is a keyword related to a business form, and the business form is a form that the first object needs to fill in or review; where a second-type keyword exists in the first text information, sending the business form to the target screen projection device for display; and where no second-type keyword exists in the first text information, prohibiting the business form from being sent to the target screen projection device for display.
Further, the method of processing voice information further includes: after converting the voice information of the target object into the first text information, detecting whether a third-type keyword exists in the first text information, wherein the third-type keyword is a keyword related to a target financial product, and the target financial product is a financial product that the target object recommends to the first object; where a third-type keyword exists in the first text information, sending product information of the target financial product to the target screen projection device for display; and where no third-type keyword exists in the first text information, prohibiting the product information of the target financial product from being sent to the target screen projection device for display.
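The second-type and third-type keyword checks follow the same pattern: extra content is sent to the screen only when a matching keyword appears in the first text information. The sketch below assumes simple substring matching; the function name, the keyword lists, and the content labels `"business_form"` and `"product_info"` are hypothetical placeholders.

```python
from typing import Iterable, List


def select_extra_content(
    first_text: str,
    second_type_keywords: Iterable[str],
    third_type_keywords: Iterable[str],
) -> List[str]:
    """Decide which additional items may be sent to the target screen projection device."""
    content = []
    if any(kw in first_text for kw in second_type_keywords):
        content.append("business_form")  # form the first object must fill in or review
    if any(kw in first_text for kw in third_type_keywords):
        content.append("product_info")   # information on the recommended financial product
    return content


print(select_extra_content(
    "please fill in the account opening form",
    second_type_keywords=["form"],
    third_type_keywords=["fund", "deposit"],
))
```

When neither keyword type is present the list is empty, which corresponds to sending nothing beyond the text itself.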
Further, the method of processing voice information further includes: after the second text information is sent to the target screen projection device for display, acquiring sound information of the first object; identifying whether a fourth-type keyword exists in the sound information of the first object, wherein the fourth-type keyword represents a question that the first object raises to the target object about the second text information; where a fourth-type keyword exists in the sound information of the first object, performing a text distinguishing operation on target text information in the second text information, wherein the target text information is the text information related to the fourth-type keyword, and the text distinguishing operation switches the text format of the target text information to a format different from that of the other text information, the other text information being the text information in the second text information other than the target text information; and where no fourth-type keyword exists in the sound information of the first object, prohibiting the text distinguishing operation from being performed on the target text information in the second text information.
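A rough sketch of the text distinguishing operation follows. It assumes the customer's sound information has already been transcribed, that "related" text means a literal occurrence of the keyword, and that the different text format is represented by `[[ ]]` markers that a display layer could render as bold or colored text. All of these are illustrative assumptions, as is the function name.

```python
from typing import Iterable


def distinguish_text(second_text: str, customer_text: str, fourth_type_keywords: Iterable[str]) -> str:
    """Mark the parts of the displayed text that the customer's question refers to."""
    hits = [kw for kw in fourth_type_keywords if kw in customer_text]
    if not hits:
        # No fourth-type keyword: the displayed text is left unchanged.
        return second_text
    marked = second_text
    for kw in hits:
        marked = marked.replace(kw, f"[[{kw}]]")
    return marked


print(distinguish_text(
    "The annual fee is 10 yuan.",
    customer_text="what is the annual fee",
    fourth_type_keywords=["annual fee"],
))
```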
According to another aspect of the present application, there is also provided an apparatus for processing voice information, including: an acquisition module, configured to collect voice information of N objects through an audio collection device and identify the voice information of a target object from the voice information of the N objects, wherein N is a positive integer and the target object is the object corresponding to a target screen projection device; a conversion module, configured to convert the voice information of the target object into first text information; an identification module, configured to identify whether information to be encrypted exists in the first text information, wherein the information to be encrypted is information that is prohibited from being displayed through the target screen projection device; a data processing module, configured to perform a data processing operation on the first text information to obtain second text information where information to be encrypted exists in the first text information, wherein the data processing operation encrypts the information to be encrypted in the first text information; and an information sending module, configured to send the second text information to the target screen projection device for display.
According to another aspect of the present application, there is also provided a computer-readable storage medium storing a computer program, wherein, when the computer program runs, it controls the device on which the computer-readable storage medium is located to execute the above method of processing voice information.
According to another aspect of the present application, there is also provided an electronic device, wherein the electronic device includes one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the above-described method for processing voice information.
The application converts the voice information of the target object into text information: voice information of N objects is collected through the audio collection device, and the voice information of the target object is identified from it; the voice information of the target object is then converted into first text information; the first text information is checked for information to be encrypted; where such information exists, a data processing operation is performed on the first text information to obtain second text information; and finally the second text information is sent to the target screen projection device for display. Here, N is a positive integer, and the target object is the object corresponding to the target screen projection device; the information to be encrypted is information that is prohibited from being displayed through the target screen projection device; and the data processing operation encrypts the information to be encrypted in the first text information.
As can be seen from the above, by converting the voice information of the target object into text information and displaying it on the target screen projection device corresponding to the target object, the text-based interaction provided by the application solves the prior-art problem of low information interaction efficiency caused by exchanging information only through an audio device. In addition, the application identifies the voice information of the target object from the voice information of the N objects, which prevents the voice information of other bank staff who do not correspond to the target screen projection device from being displayed on it. The application also identifies whether information to be encrypted exists in the first text information and, where it does, encrypts the first text information, avoiding the poor information security that would result from indiscriminately converting the voice information of the target object into text.
Therefore, the technical solution of the application exchanges information through multiple interaction modes, which improves information interaction efficiency and thereby solves the technical problem in the prior art of low information interaction efficiency caused by exchanging information only through an audio device.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a flow chart of an alternative method of processing voice information according to an embodiment of the application;
FIG. 2 is a flowchart of an alternative method for converting speech information of a target object to first text information according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an alternative first device according to an embodiment of the application;
FIG. 4 is a schematic diagram of an alternative second device according to an embodiment of the application;
FIG. 5 is a flow chart of another method of processing voice information according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an alternative speech information processing device provided in accordance with an embodiment of the present application;
FIG. 7 is a schematic diagram of an electronic device according to an embodiment of the application.
Detailed Description
To help those skilled in the art better understand the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of protection of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that, the related information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for presentation, analyzed data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party. For example, an interface is provided between the system and the relevant user or institution, before acquiring the relevant information, the system needs to send an acquisition request to the user or institution through the interface, and acquire the relevant information after receiving the consent information fed back by the user or institution.
The application is further illustrated below in conjunction with the examples.
Example 1
According to an embodiment of the present application, there is provided an embodiment of a method of processing voice information, it being noted that the steps shown in the flowcharts of the drawings may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from that herein.
Fig. 1 is a flowchart of an alternative method for processing voice information according to an embodiment of the present application. As shown in Fig. 1, the method includes the following steps:
Step S101, voice information of N objects is collected through an audio collection device, and voice information of a target object is identified from the voice information of the N objects.
In step S101, N is a positive integer, and the target object is an object corresponding to the target screen projection device.
Optionally, the method for processing voice information in the embodiment of the application can be applied to a scenario in which business is handled for bank customers at a bank outlet. Existing bank outlets generally have several counters, each fitted with glass, with seats for the bank staff member and the bank customer on either side, so that staff and customers communicate across the glass. This arrangement protects the personal safety and the funds of both staff and customers.
Further, in the prior art, to allow information exchange between the bank staff and the bank customers on the two sides of the glass, an audio device is also installed on the counter. It collects the staff member's voice and plays it to the customer, and collects the customer's voice and plays it to the staff member. On this basis, the application provides a voice information processing system for executing the method for processing voice information in the embodiment of the application, which may run in the audio device on the counter as a software system or an embedded system.
It should be noted that each service window on the counter has an audio device and a screen projection device, and the bank staff member using a service window corresponds to the audio device and the screen projection device of that window. For example, a counter at a bank outlet has 4 windows, and the staff members responsible for them are object 1, object 2, object 3, and object 4. Object 1 is responsible for window A and corresponds to audio device A-1 and screen projection device A-2 of window A; object 2 is responsible for window B and corresponds to audio device B-1 and screen projection device B-2 of window B; object 3 is responsible for window C and corresponds to audio device C-1 and screen projection device C-2 of window C; and object 4 is responsible for window D and corresponds to audio device D-1 and screen projection device D-2 of window D. Thus, in this example, the N objects include at least object 1, object 2, object 3, and object 4.
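The correspondence between staff members, windows, and devices in this example can be written down as a small lookup table. The `Window` type and the `devices_for` helper are hypothetical names used only for illustration; the device identifiers are the ones given in the example above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Window:
    name: str
    staff: str              # the object (bank staff member) responsible for the window
    audio_device: str
    projection_device: str


# The four-window example from the description, keyed by staff member.
WINDOWS = {
    w.staff: w
    for w in (
        Window("A", "object 1", "A-1", "A-2"),
        Window("B", "object 2", "B-1", "B-2"),
        Window("C", "object 3", "C-1", "C-2"),
        Window("D", "object 4", "D-1", "D-2"),
    )
}


def devices_for(staff: str) -> Tuple[str, str]:
    """Return the (audio device, screen projection device) pair for a staff member."""
    window = WINDOWS[staff]
    return window.audio_device, window.projection_device


print(devices_for("object 1"))  # ('A-1', 'A-2')
```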
Further, the audio collection device may be the audio collection component in the audio device of any window. For example, for window A, the audio collection device may be microphone A-1-1, the microphone in audio device A-1 that picks up the bank staff member's voice. Object 1 is the target object corresponding to microphone A-1-1, and object 1 also corresponds to screen projection device A-2 of window A.
It is easy to see that a bank customer handling business at window A only needs to communicate with object 1. Therefore, when microphone A-1-1 picks up voice information from several objects, everything other than the voice information of object 1 is redundant, and the customer at window A has no need for it. In addition, the voice information of other objects may involve private data. For example, the voice information of object 2 may include private data of the customer at window B, so forwarding it to the customer at window A could leak that customer's private data. On this basis, the application identifies the voice information of the target object from the voice information of the N objects, the target object being the object corresponding to the target screen projection device. This both spares the bank customer redundant voice information and improves the security of bank customers' information.
Step S102, voice information of the target object is converted into first text information.
Optionally, in step S102, after identifying the voice information of the target object, the voice information processing system may perform speech-to-text processing on it to obtain the corresponding text information (the first text information).
Step S103, identifying whether there is information to be encrypted in the first text information.
In step S103, the information to be encrypted is information that is prohibited from being publicly presented by the target screen-projecting device.
Optionally, in some scenarios the voice information of the target object may contain private data. For example, when the target object is object 1, object 1 may need help from other objects when handling certain business, such as asking the person in charge of the bank outlet to confirm an authorization. The conversation between object 1 and the other objects may then touch on internal bank information that must not be disclosed externally, and displaying that information directly on the target screen projection device could easily leak it. The information to be encrypted may also be private data of a bank customer. For example, when object 1 talks with the customer at window A, private data such as the customer's ID card number may be confirmed aloud. Because this information could be seen by other bank customers when shown on the screen, it can be treated as information to be encrypted and displayed in encrypted form.
Step S104, under the condition that the information to be encrypted exists in the first text information, carrying out data processing operation on the first text information to obtain second text information.
In step S104, the data processing operation is used for encrypting information to be encrypted in the first text information.
Optionally, after recognizing that information to be encrypted exists in the first text information, the voice information processing system encrypts it using a preset encryption algorithm, for example by uniformly converting the information to be encrypted into special characters.
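The example given above, uniformly converting the information to be encrypted into special characters, amounts to masking. A minimal sketch, assuming the sensitive spans have already been located as (start, end) character offsets and that `*` is the chosen special character (both assumptions, as is the function name `mask_spans`):

```python
from typing import Iterable, Tuple


def mask_spans(first_text: str, spans: Iterable[Tuple[int, int]], mask_char: str = "*") -> str:
    """Replace every character inside the given spans with the mask character."""
    chars = list(first_text)
    for start, end in spans:
        for i in range(start, end):
            chars[i] = mask_char
    return "".join(chars)


print(mask_spans("ID 110101 ok", [(3, 9)]))  # ID ****** ok
```

Masking keeps the length and position of the hidden text, so the rest of the sentence remains readable on the screen.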
Step S105, the second text information is sent to the target screen projection device for display.
Optionally, after the second text information is generated, the voice information processing system sends it to the target screen projection device for display, so that the bank customer can read it. It should be noted that the target screen projection device may be a display screen.
Based on steps S101 to S105, the application converts the voice information of the target object into text information: voice information of N objects is collected through the audio collection device, and the voice information of the target object is identified from it; the voice information of the target object is then converted into first text information; the first text information is checked for information to be encrypted; where such information exists, a data processing operation is performed on the first text information to obtain second text information; and finally the second text information is sent to the target screen projection device for display. Here, N is a positive integer, and the target object is the object corresponding to the target screen projection device; the information to be encrypted is information that is prohibited from being displayed through the target screen projection device; and the data processing operation encrypts the information to be encrypted in the first text information.
As can be seen from the above, by converting the voice information of the target object into text information and displaying it on the target screen projection device corresponding to the target object, the text-based interaction provided by the application solves the prior-art problem of low information interaction efficiency caused by exchanging information only through an audio device. In addition, the application identifies the voice information of the target object from the voice information of the N objects, which prevents the voice information of other bank staff who do not correspond to the target screen projection device from being displayed on it. The application also identifies whether information to be encrypted exists in the first text information and, where it does, encrypts the first text information, avoiding the poor information security that would result from indiscriminately converting the voice information of the target object into text.
Therefore, the technical solution of the application exchanges information through multiple interaction modes, which improves information interaction efficiency and thereby solves the technical problem in the prior art of low information interaction efficiency caused by exchanging information only through an audio device.
In an alternative embodiment, the voice information processing system may perform voice feature recognition on the voice information of the N objects through a target model to obtain the voice feature of each object, where the target model is a neural network model trained using the voice information of M objects with known voice features as training samples, and the M objects include at least the N objects. The voice information processing system then takes the voice information corresponding to the voice feature of the target object as the voice information of the target object.
Optionally, to obtain the target model, voice information of M objects (for example, all employees of the bank branch) may be collected in advance as training samples. The voice feature of each object is extracted from that object's voice information and used as the label data for that voice information. The voice information of each object and its label data are then input into a preset deep learning neural network, which is trained over multiple iterations to obtain the target model. Given several pieces of input voice information, the target model can output the voice feature corresponding to each piece.
Optionally, the target model is deployed in the voice information processing system. Knowing which target object corresponds to the current window, the voice information processing system uses the voice features of each object output by the target model and takes the voice information corresponding to the voice feature of the target object as the voice information of the target object.
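The selection step described above can be illustrated with a minimal sketch. It assumes the target model has already produced one voice feature vector per piece of voice information; the cosine-similarity comparison, the 0.8 threshold, and the names `cosine_similarity` and `select_target_speech` are illustrative assumptions and are not specified by the application.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_target_speech(segments, target_feature, threshold=0.8):
    """Keep only the voice information whose feature matches the target object.

    segments: list of (feature_vector, voice_info) pairs, where feature_vector
    stands in for the output of the target model (assumption).
    target_feature: the known voice feature of the target object.
    threshold: hypothetical similarity cut-off.
    """
    return [
        voice_info
        for feature, voice_info in segments
        if cosine_similarity(feature, target_feature) >= threshold
    ]
```

In this sketch, voice information from staff at neighbouring windows is dropped simply because its feature vector is not close enough to the enrolled feature of the teller at the current window.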
In an alternative embodiment, fig. 2 shows an alternative flowchart for converting voice information of a target object into first text information according to an embodiment of the present application, including the steps of:
step S201, identifying first voice information in voice information of a target object, wherein the first voice information is voice information with volume smaller than preset volume;
step S202, removing first voice information from voice information of a target object to obtain second voice information corresponding to the target object;
step S203, the second voice information is converted into the first text information.
Optionally, the target object may make some low-volume sounds, such as muttering or a light cough, that are of no significance to the bank customer. The voice information processing system therefore determines voice information whose volume is smaller than the preset volume as the first voice information, filters the first voice information out of the voice information of the target object, and takes the remaining voice information as the second voice information of the target object. Finally, the voice information processing system converts the second voice information into the first text information.
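Steps S201 and S202 can be sketched as a simple volume gate. The sketch assumes the voice information has been split into frames of audio samples and that volume is measured as root-mean-square (RMS) amplitude; the application does not state how volume is measured, so both the measure and the names `rms` and `remove_quiet_frames` are assumptions.

```python
import math


def rms(frame):
    """Root-mean-square amplitude of one frame of audio samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(sample * sample for sample in frame) / len(frame))


def remove_quiet_frames(frames, preset_volume):
    """Drop frames quieter than the preset volume (the first voice information).

    The frames that remain correspond to the second voice information, which
    would then be passed to speech-to-text conversion (step S203).
    """
    return [frame for frame in frames if rms(frame) >= preset_volume]
```

A frame-level gate like this is only one possible reading; a real system might smooth the decision over neighbouring frames so that quiet syllables inside a sentence are not cut out.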
In an alternative embodiment, the voice information processing system may further identify whether a first type of keyword exists in the first text information, where the first type of keyword characterizes information that is prohibited from being displayed to a first object, and the first object is the object viewing the target screen projection device. When a first type of keyword exists in the first text information, the voice information processing system determines the information corresponding to the first type of keyword in the first text information as the information to be encrypted; when no first type of keyword exists in the first text information, the voice information processing system determines that no information to be encrypted exists in the first text information.
Optionally, the first type of keyword may be a keyword related to the bank's internal private data, for example the bank's cash reserve amount, the place where the bank's cash is stored, the bank's cash transport locations, or the identity information of bank staff. The first type of keyword may also be a keyword related to the private data of a bank customer, such as the customer's identity information, deposit amount, or place of residence.
To keep the bank's internal private data and bank customers' private data from being publicly displayed on the target screen projection device, the voice information processing system identifies, by recognizing first-type keywords, any such private data that may exist in the first text information, and then encrypts it as the information to be encrypted before display.
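The keyword-based identification and the conversion into special characters can be sketched together as follows. This is a minimal illustration under stated assumptions: the keyword list, the rule that the value following a keyword (after "is" or a colon) is the information to be encrypted, and the function name `encrypt_first_text` are all hypothetical.

```python
import re

# Hypothetical first-type keywords; a deployment would maintain its own list.
FIRST_TYPE_KEYWORDS = ["deposit amount", "ID number"]


def encrypt_first_text(first_text, keywords=FIRST_TYPE_KEYWORDS, mask_char="*"):
    """Return (second_text, found).

    The value following a first-type keyword is replaced by mask characters.
    found is False when no keyword is present, in which case the text is
    returned unchanged.
    """
    if not keywords:
        return first_text, False
    alternatives = "|".join(re.escape(keyword) for keyword in keywords)
    pattern = re.compile(
        rf"({alternatives})(\s+is\s+|\s*:\s*)([\w,]+)", re.IGNORECASE
    )

    def mask(match):
        # Keep the keyword and separator, mask only the sensitive value.
        return match.group(1) + match.group(2) + mask_char * len(match.group(3))

    second_text, count = pattern.subn(mask, first_text)
    return second_text, count > 0
```

For example, `encrypt_first_text("Your deposit amount is 50000 yuan")` returns the text with the five digits replaced by asterisks, while a sentence without any listed keyword passes through untouched.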
In an alternative embodiment, after converting the voice information of the target object into the first text information, the voice information processing system may further detect whether a second type of keyword exists in the first text information, where the second type of keyword is a keyword related to a business form, and the business form is a form that needs to be filled in or reviewed by the first object. When a second type of keyword exists in the first text information, the voice information processing system sends the business form to the target screen projection device for display; when no second type of keyword exists in the first text information, the voice information processing system prohibits the business form from being sent to the target screen projection device for display.
Optionally, the second type of keyword may be a keyword related to a business form that needs to be filled in or reviewed by the first object, for example "fill in", "form", "credit card application form", or "bank card application form". To show the business form to the customer, when the voice information processing system recognizes that a second type of keyword exists in the first text information, it may automatically retrieve the form corresponding to that keyword from the database and display it on the target screen projection device. For example, if the voice information processing system recognizes the keyword "credit card application form" in the first text information, it automatically retrieves the credit card application form in electronic form and displays it on the target screen projection device.
In an alternative embodiment, after converting the voice information of the target object into the first text information, the voice information processing system detects whether a third type of keyword exists in the first text information, where the third type of keyword is a keyword related to a target financial product, and the target financial product is a financial product recommended to the first object by the target object. When a third type of keyword exists in the first text information, the voice information processing system sends the product information of the target financial product to the target screen projection device for display; when no third type of keyword exists in the first text information, the voice information processing system prohibits the product information of the target financial product from being sent to the target screen projection device for display.
Optionally, while handling business in some scenarios, the target object may recommend financial products to the bank customer. To display the product information of these financial products more conveniently, the voice information processing system of the present application may identify a third type of keyword related to the target financial product, for example the name of the target financial product. If the voice information processing system recognizes that the first text information contains the name of the target financial product, it automatically reads the product information of that product from the database and displays it on the target screen projection device.
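The two keyword-triggered displays described above (business forms and product information) follow the same lookup pattern, sketched here. The in-memory dictionaries stand in for the database, and the form identifiers, the product name "Steady Growth Fund", and the function name `content_to_display` are hypothetical.

```python
# Hypothetical stand-ins for the database tables mentioned in the text.
FORMS = {
    "credit card application form": "FORM_CREDIT_CARD",
    "bank card application form": "FORM_BANK_CARD",
}
PRODUCTS = {
    "Steady Growth Fund": "Steady Growth Fund: product information sheet",
}


def content_to_display(first_text, forms=FORMS, products=PRODUCTS):
    """Return the extra items to send to the target screen projection device.

    Each item is a (kind, payload) pair. An empty list means neither a
    second-type nor a third-type keyword was found, so nothing extra is sent.
    """
    text = first_text.lower()
    items = []
    for keyword, form_id in forms.items():
        if keyword.lower() in text:
            items.append(("form", form_id))
    for product_name, product_info in products.items():
        if product_name.lower() in text:
            items.append(("product", product_info))
    return items
```

Plain substring matching is used only to keep the sketch short; generic keywords such as "form" would need disambiguation before they could select a specific form.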
In an optional embodiment, after the second text information is sent to the target screen projection device for display, the voice information processing system may further acquire sound information of the first object and identify whether a fourth type of keyword exists in it, where the fourth type of keyword characterizes a question that the first object raises to the target object about the second text information. When a fourth type of keyword exists in the sound information of the first object, the voice information processing system performs a distinguishing text operation on target text information in the second text information, where the target text information is the text information related to the fourth type of keyword, the distinguishing text operation switches the text format of the target text information to a format different from that of the other text information, and the other text information is the text information in the second text information other than the target text information. When no fourth type of keyword exists in the sound information of the first object, the voice information processing system prohibits the distinguishing text operation on the target text information in the second text information.
Optionally, when viewing the second text information on the target screen projection device, the first object may have doubts about some of the content and may ask the target object about it. To help the target object understand what the first object is asking about, and to improve the target object's understanding efficiency, the voice information processing system may acquire the sound information of the first object and then identify whether a fourth type of keyword exists in it. The fourth type of keyword characterizes a question raised by the first object to the target object about the second text information; for example, it may be "what does paragraph xx mean" and the like.
When the voice information processing system recognizes that a fourth type of keyword exists in the sound information of the first object, it performs the distinguishing text operation on the target text information in the second text information, for example bolding the font, enlarging the font, or highlighting the target text information in color.
By performing the distinguishing text operation on the target text information, the target object can quickly learn which content the first object is asking about by looking at the target screen projection device, and the first object can re-read the target text information to understand it again, which improves business handling efficiency.
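The distinguishing text operation can be sketched as follows, assuming the customer's speech has already been transcribed and that the second text information is held as a list of paragraphs. The question pattern, the "bold"/"normal" style labels, and the function name `distinguish_text` are illustrative assumptions.

```python
import re

# Hypothetical fourth-type keyword pattern: a question about a numbered paragraph.
QUESTION_PATTERN = re.compile(r"what does paragraph (\d+) mean", re.IGNORECASE)


def distinguish_text(paragraphs, customer_utterance):
    """Return (paragraph, style) pairs for rendering on the display.

    The paragraph the customer asked about is given a different style from
    the rest. If no fourth-type keyword is found, every paragraph keeps the
    normal style, i.e. no distinguishing operation is performed.
    """
    match = QUESTION_PATTERN.search(customer_utterance)
    target_index = int(match.group(1)) - 1 if match else None
    return [
        (paragraph, "bold" if index == target_index else "normal")
        for index, paragraph in enumerate(paragraphs)
    ]
```

The rendering layer would map the style label to an actual format change such as bolding, enlarging, or colour highlighting.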
In an alternative embodiment, the voice information processing system may include a first device and a second device, where the first device is used by the bank customer, the second device is used by the bank staff, and both the first device and the second device can serve as an audio collection device and as a screen projection device.
Optionally, as shown in Fig. 3, the first device includes at least a Bluetooth module, a display screen, and a sound pickup device. The Bluetooth module ensures that the first device and the second device at the same window are connected one to one, the display screen displays the second text information converted from the voice information of the target object, and the sound pickup device collects the voice information of the bank customer.
Optionally, as shown in Fig. 4, the second device includes at least an external sound pickup device, an external device cable, a Bluetooth module, and a display screen. The external sound pickup device collects the voice information of the bank staff member (the target object), after which the processor runs a program to convert the voice information of the target object into second text information and display it on the display screen. The external device cable connects peripherals such as a mouse and a keyboard, so that erroneous content can be corrected through the peripheral when a speech-to-text error occurs. The Bluetooth module ensures that the second device is connected one to one with the first device.
In an alternative embodiment, fig. 5 shows a flowchart of another method for processing voice information according to an embodiment of the present application, as shown in fig. 5, including the steps of:
S501: Detect whether the teller device (second device) and the customer device (first device) are connected via Bluetooth, so that the text converted from speech can be displayed on both devices at the same time and communication remains fluent;
S502: When the external sound pickup device of the teller device is connected with the display screen of the teller device, display "the sound pickup device is connected to the teller device" on the display screen, so that the teller device can receive the teller's voice information;
S503: The teller (target object) starts speaking; the built-in processor of the teller device executes the voice information processing method, converts the teller's voice into second text information, and stores the second text information in the storage module;
S504: Synchronously display the text in the storage module on the display screens of the teller device and the customer device;
S505: The teller can check whether the second text information matches what the teller intended to say; if it contains errors, the teller can correct it through the touch screen or an external device to avoid misleading the customer;
S506: The customer relies primarily on listening to the teller's voice and secondarily on reading the second text information on the display screen; combining seeing and hearing avoids mistakes caused by the customer not hearing the teller clearly and improves communication efficiency;
S507: After a customer has been served, the teller device promptly archives and saves the customer's voice, the teller's voice, and the converted text, to guard against later transaction disputes;
S508: If the session includes a financial product recommendation, display the product information of the financial product on the display screens of the teller device and the customer device;
S509: The device refreshes the page and waits for the next customer.
Example 2
The present embodiment provides an optional voice information processing apparatus, in which each implementation unit/module corresponds to an implementation step in Embodiment 1.
Fig. 6 is a schematic diagram of an alternative voice information processing apparatus according to an embodiment of the present application. As shown in Fig. 6, the apparatus includes a collection module 601, a conversion module 602, an identification module 603, a data processing module 604, and an information sending module 605.
Specifically, the collection module 601 is configured to collect voice information of N objects through an audio collection device and identify voice information of a target object from the voice information of the N objects, where N is a positive integer and the target object is the object corresponding to a target screen projection device; the conversion module 602 is configured to convert the voice information of the target object into first text information; the identification module 603 is configured to identify whether information to be encrypted exists in the first text information, where the information to be encrypted is information that is prohibited from being publicly displayed through the target screen projection device; the data processing module 604 is configured to perform a data processing operation on the first text information to obtain second text information when the information to be encrypted exists in the first text information, where the data processing operation is used for encrypting the information to be encrypted in the first text information; and the information sending module 605 is configured to send the second text information to the target screen projection device for display.
Optionally, the identification module includes a voice feature recognition unit and a voice information determination unit. The voice feature recognition unit is configured to perform voice feature recognition on the voice information of the N objects through a target model to obtain the voice feature of each object, where the target model is a neural network model trained using the voice information of M objects with known voice features as training samples, and the M objects include at least the N objects; the voice information determination unit is configured to take the voice information corresponding to the voice feature of the target object as the voice information of the target object.
Optionally, the conversion module includes: the device comprises a first recognition unit, a first voice information processing unit and an information conversion unit. The first recognition unit is used for recognizing first voice information in voice information of the target object, wherein the first voice information is voice information with volume smaller than preset volume; the first voice information processing unit is used for removing the first voice information from the voice information of the target object to obtain second voice information corresponding to the target object; and an information conversion unit for converting the second voice information into the first text information.
Optionally, the identification module includes a second identification unit, a first determination unit, and a second determination unit. The second identification unit is configured to identify whether a first type of keyword exists in the first text information, where the first type of keyword characterizes information that is prohibited from being displayed to a first object, and the first object is the object viewing the target screen projection device; the first determination unit is configured to determine, when a first type of keyword exists in the first text information, that the information corresponding to the first type of keyword in the first text information is the information to be encrypted; and the second determination unit is configured to determine, when no first type of keyword exists in the first text information, that no information to be encrypted exists in the first text information.
Optionally, the voice information processing apparatus further includes a first detection module, a first processing module, and a second processing module. The first detection module is configured to detect whether a second type of keyword exists in the first text information, where the second type of keyword is a keyword related to a business form, and the business form is a form that needs to be filled in or reviewed by the first object; the first processing module is configured to send the business form to the target screen projection device for display when a second type of keyword exists in the first text information; and the second processing module is configured to prohibit the business form from being sent to the target screen projection device for display when no second type of keyword exists in the first text information.
Optionally, the voice information processing apparatus further includes a second detection module, a third processing module, and a fourth processing module. The second detection module is configured to detect whether a third type of keyword exists in the first text information, where the third type of keyword is a keyword related to a target financial product, and the target financial product is a financial product recommended to the first object by the target object; the third processing module is configured to send the product information of the target financial product to the target screen projection device for display when a third type of keyword exists in the first text information; and the fourth processing module is configured to prohibit the product information of the target financial product from being sent to the target screen projection device for display when no third type of keyword exists in the first text information.
Optionally, the voice information processing apparatus further includes an acquisition module, a first identification module, a fifth processing module, and a sixth processing module. The acquisition module is configured to acquire sound information of the first object; the first identification module is configured to identify whether a fourth type of keyword exists in the sound information of the first object, where the fourth type of keyword characterizes a question that the first object raises to the target object about the second text information; the fifth processing module is configured to perform a distinguishing text operation on target text information in the second text information when a fourth type of keyword exists in the sound information of the first object, where the target text information is the text information related to the fourth type of keyword, the distinguishing text operation switches the text format of the target text information to a format different from that of the other text information, and the other text information is the text information in the second text information other than the target text information; and the sixth processing module is configured to prohibit the distinguishing text operation on the target text information in the second text information when no fourth type of keyword exists in the sound information of the first object.
Example 3
According to another aspect of the embodiments of the present application, there is also provided an electronic device, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to perform the voice information processing method described in Embodiment 1 by executing the executable instructions.
Fig. 7 is a schematic diagram of an electronic device according to an embodiment of the present application. As shown in Fig. 7, the electronic device includes a processor, a memory, and a program stored in the memory and executable on the processor; when executing the program, the processor implements the voice information processing method of Embodiment 1.
Example 4
According to another aspect of the embodiment of the present application, there is also provided a computer readable storage medium, where the computer readable storage medium includes a stored computer program, where the computer program when executed controls a device in which the computer readable storage medium is located to execute the method for processing voice information in embodiment 1.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
In the foregoing embodiments of the present application, each embodiment is described with its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technology may be implemented in other ways. The apparatus embodiments described above are merely exemplary. For example, the division into units may be a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection shown or discussed may be realized through some interfaces, units or modules, and may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the various embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other media capable of storing program code.
The foregoing is merely a preferred embodiment of the present application. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present application, and such modifications and adaptations are also intended to fall within the scope of protection of the present application.

Claims (10)

1. A method for processing voice information, comprising:
collecting voice information of N objects through an audio collection device, and identifying the voice information of a target object from the voice information of the N objects, wherein N is a positive integer, and the target object is an object corresponding to a target screen projection device;
converting the voice information of the target object into first text information;
identifying whether information to be encrypted exists in the first text information, wherein the information to be encrypted is information which is prohibited from being displayed through the target screen projection device;
in the case that the information to be encrypted exists in the first text information, performing a data processing operation on the first text information to obtain second text information, wherein the data processing operation is used for encrypting the information to be encrypted in the first text information;
and sending the second text information to the target screen projection device for display.
2. The method of claim 1, wherein identifying the speech information of the target object from the speech information of the N objects comprises:
performing voice feature recognition processing on the voice information of the N objects through a target model to obtain the voice feature of each object, wherein the target model is a neural network model trained by using voice information of M objects with known voice features as training samples, and the M objects at least comprise the N objects;
and taking the voice information corresponding to the voice feature of the target object as the voice information of the target object.
3. The method of claim 1, wherein converting the voice information of the target object into the first text information comprises:
identifying first voice information in the voice information of the target object, wherein the first voice information is voice information whose volume is lower than a preset volume;
removing the first voice information from the voice information of the target object to obtain second voice information corresponding to the target object; and
converting the second voice information into the first text information.
4. The method of claim 1, wherein identifying whether information to be encrypted exists in the first text information comprises:
identifying whether a first type of keywords exists in the first text information, wherein the first type of keywords represents information that is prohibited from being displayed to a first object, and the first object is an object viewing the target screen projection device;
in a case where the first type of keywords exists in the first text information, determining that the information corresponding to the first type of keywords in the first text information is the information to be encrypted; and
in a case where the first type of keywords does not exist in the first text information, determining that the information to be encrypted does not exist in the first text information.
5. The method of claim 4, wherein after converting the voice information of the target object into the first text information, the method further comprises:
detecting whether a second type of keywords exists in the first text information, wherein the second type of keywords are keywords related to a service form, and the service form is a form that needs to be filled in or checked by the first object;
in a case where the second type of keywords exists in the first text information, sending the service form to the target screen projection device for display; and
in a case where the second type of keywords does not exist in the first text information, prohibiting the service form from being sent to the target screen projection device for display.
6. The method of claim 4, wherein after converting the voice information of the target object into the first text information, the method further comprises:
detecting whether a third type of keywords exists in the first text information, wherein the third type of keywords are keywords related to a target financial product, and the target financial product is a financial product recommended to the first object by the target object;
in a case where the third type of keywords exists in the first text information, sending product information of the target financial product to the target screen projection device for display; and
in a case where the third type of keywords does not exist in the first text information, prohibiting the product information of the target financial product from being sent to the target screen projection device for display.
7. The method of claim 4, wherein after sending the second text information to the target screen projection device for display, the method further comprises:
acquiring sound information of the first object;
identifying whether a fourth type of keywords exists in the sound information of the first object, wherein the fourth type of keywords represents a question that the first object raises to the target object regarding the second text information;
in a case where the fourth type of keywords exists in the sound information of the first object, performing a distinguishing text operation on target text information in the second text information, wherein the target text information is text information related to the fourth type of keywords, the distinguishing text operation is used for switching the text format of the target text information into a text format different from that of other text information, and the other text information is the text information in the second text information other than the target text information; and
in a case where the fourth type of keywords does not exist in the sound information of the first object, prohibiting the distinguishing text operation from being performed on the target text information in the second text information.
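The distinguishing text operation of claim 7 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the applicant's implementation: the keyword list `QUESTION_KEYWORDS`, the function names, the sentence-level granularity, and the use of an already transcribed viewer utterance in place of raw sound information are all hypothetical. The sketch returns each sentence of the second text information together with a flag saying whether a display should render it in a different text format.

```python
import re
from typing import List, Sequence, Tuple

# Hypothetical fourth-type keywords: topics a viewer might ask about.
QUESTION_KEYWORDS = ("interest rate", "fee")


def find_question_keywords(viewer_utterance: str,
                           keywords: Sequence[str] = QUESTION_KEYWORDS) -> List[str]:
    """Return the fourth-type keywords found in the viewer's (transcribed) speech."""
    lowered = viewer_utterance.lower()
    return [k for k in keywords if k in lowered]


def distinguish_text(second_text: str,
                     viewer_utterance: str,
                     keywords: Sequence[str] = QUESTION_KEYWORDS) -> List[Tuple[str, bool]]:
    """Split the second text into sentences and flag those related to a detected keyword.

    A flag of True means the sentence should be shown in a distinct text format.
    When no keyword is detected, no sentence is flagged.
    """
    hits = find_question_keywords(viewer_utterance, keywords)
    # Assumption: sentences end with '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", second_text) if s.strip()]
    return [(s, any(k in s.lower() for k in hits)) for s in sentences]
```

Returning flags rather than formatted strings leaves the choice of format (bold, color, font size) to whatever renders the text on the screen projection device.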
8. An apparatus for processing voice information, comprising:
an acquisition module, configured to collect voice information of N objects through an audio collecting device and to identify voice information of a target object from the voice information of the N objects, wherein N is a positive integer, and the target object is an object corresponding to a target screen projection device;
a conversion module, configured to convert the voice information of the target object into first text information;
an identification module, configured to identify whether information to be encrypted exists in the first text information, wherein the information to be encrypted is information that is prohibited from being displayed through the target screen projection device;
a data processing module, configured to perform a data processing operation on the first text information to obtain second text information in a case where the information to be encrypted exists in the first text information, wherein the data processing operation is used for encrypting the information to be encrypted in the first text information; and
an information sending module, configured to send the second text information to the target screen projection device for display.
9. A computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and wherein the computer program, when executed, controls a device in which the computer-readable storage medium is located to perform the method for processing voice information according to any one of claims 1 to 7.
10. An electronic device comprising one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for processing voice information according to any one of claims 1 to 7.
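To show how the claimed steps fit together, the following Python sketch chains the target-speaker selection of claim 1, the low-volume removal of claim 3, the keyword-based identification of claim 4, and the data processing operation of claim 1. It is a sketch under stated assumptions rather than the claimed implementation: the `Segment` type, the keyword list, and the volume threshold are hypothetical; speaker labels and transcripts are supplied directly instead of coming from a neural network model or a speech recognizer; and masking with asterisks is swapped in for the encryption operation, whose concrete form the claims do not specify. Passing the first text information through unchanged when nothing sensitive is found is also an assumption.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass
class Segment:
    speaker: str   # speaker label; stands in for the voice-feature model of claim 2
    text: str      # transcript; stands in for a real speech recognizer
    volume: float  # measured loudness in arbitrary units


# Hypothetical values, not taken from the application.
FIRST_TYPE_KEYWORDS = ("password", "card number")
PRESET_VOLUME = 0.2


def to_first_text(segments: Iterable[Segment], target: str,
                  preset_volume: float = PRESET_VOLUME) -> str:
    """Keep the target speaker's segments at or above the preset volume and join them."""
    kept = [s.text for s in segments
            if s.speaker == target and s.volume >= preset_volume]
    return " ".join(kept)


def find_first_type_keywords(text: str,
                             keywords: Sequence[str] = FIRST_TYPE_KEYWORDS) -> List[str]:
    """Return the first-type keywords present in the text (case-insensitive)."""
    lowered = text.lower()
    return [k for k in keywords if k in lowered]


def to_second_text(first_text: str,
                   keywords: Sequence[str] = FIRST_TYPE_KEYWORDS) -> str:
    """Mask every detected keyword with asterisks of the same length."""
    hits = find_first_type_keywords(first_text, keywords)
    if not hits:
        return first_text  # assumption: nothing to hide, text is passed through
    pattern = re.compile("|".join(re.escape(k) for k in hits), re.IGNORECASE)
    return pattern.sub(lambda m: "*" * len(m.group(0)), first_text)
```

For example, with segments from an advisor and a customer, `to_second_text(to_first_text(segments, "advisor"))` yields the text that would be sent to the screen projection device.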
CN202310629036.5A 2023-05-30 2023-05-30 Voice information processing method and device, electronic equipment and storage medium Pending CN116645965A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310629036.5A CN116645965A (en) 2023-05-30 2023-05-30 Voice information processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310629036.5A CN116645965A (en) 2023-05-30 2023-05-30 Voice information processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116645965A true CN116645965A (en) 2023-08-25

Family

ID=87643140

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310629036.5A Pending CN116645965A (en) 2023-05-30 2023-05-30 Voice information processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116645965A (en)

Similar Documents

Publication Publication Date Title
US11017406B2 (en) Multi factor authentication rule-based intelligent bank cards
US10028081B2 (en) User authentication
CN112651841B (en) Online business handling method, online business handling device, server and computer readable storage medium
AU2015271025B2 (en) Systems and methods for provisioning transaction data to mobile communications devices
US20080192901A1 (en) Digital Process and Arrangement for Authenticating a User of a Telecommunications or Data Network
CN107240022B (en) Insurance information processing method, device and system
CN107093066A (en) Service implementation method and device
CN103310339A (en) Identity recognition device and method as well as payment system and method
DK3176779T3 (en) SYSTEMS AND METHODS FOR SENSITIVE AUDIO ZONE RANGE
CN108763898A (en) A kind of information processing method and system
US20190075097A1 (en) Verification system
US11356469B2 (en) Method and apparatus for estimating monetary impact of cyber attacks
CN114666135A (en) Data encryption method and device, electronic equipment and storage medium
US11995723B1 (en) Systems and methods for administrating a certificate of deposit
CN106228365A (en) A kind of method of payment and device
CN116645965A (en) Voice information processing method and device, electronic equipment and storage medium
US20140067602A1 (en) Sanctions Screening
CN113782035A (en) Service processing method and device, electronic equipment and storage medium
JP2019149027A (en) Automatic bill-splitting settlement system by face authentication technique
US20130054345A1 (en) Data mining
US11551203B2 (en) Retrieving hidden digital identifier
US9646437B2 (en) Method of generating a temporarily limited and/or usage limited means and/or status, method of obtaining a temporarily limited and/or usage limited means and/or status, corresponding system and computer readable medium
CN107317679B (en) Method and system for preventing fraud after identity cards are lost
CN111369264A (en) Entity association method, device, equipment and computer readable storage medium
US20190073469A1 (en) Verification system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination