CN112420049A - Data processing method, device and storage medium - Google Patents
Data processing method, device and storage medium
- Publication number
- CN112420049A CN112420049A CN202011228484.7A CN202011228484A CN112420049A CN 112420049 A CN112420049 A CN 112420049A CN 202011228484 A CN202011228484 A CN 202011228484A CN 112420049 A CN112420049 A CN 112420049A
- Authority
- CN
- China
- Prior art keywords
- voice
- information
- user
- intention
- voice information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000003672 processing method Methods 0.000 title abstract description 15
- 238000000034 method Methods 0.000 claims abstract description 33
- 238000012545 processing Methods 0.000 claims abstract description 20
- 230000008451 emotion Effects 0.000 claims description 60
- 238000013507 mapping Methods 0.000 claims description 41
- 238000001914 filtration Methods 0.000 claims description 16
- 238000005070 sampling Methods 0.000 claims description 16
- 238000004590 computer program Methods 0.000 claims description 13
- 230000015654 memory Effects 0.000 claims description 13
- 210000004556 brain Anatomy 0.000 claims description 8
- 230000011218 segmentation Effects 0.000 claims description 7
- 238000012216 screening Methods 0.000 claims description 5
- 238000006243 chemical reaction Methods 0.000 claims description 4
- 238000000605 extraction Methods 0.000 claims description 4
- 230000003993 interaction Effects 0.000 abstract description 4
- 238000005516 engineering process Methods 0.000 abstract description 2
- 238000013441 quality evaluation Methods 0.000 description 20
- 238000011156 evaluation Methods 0.000 description 16
- 230000006870 function Effects 0.000 description 10
- 238000012360 testing method Methods 0.000 description 7
- 238000004891 communication Methods 0.000 description 6
- 230000008569 process Effects 0.000 description 6
- 238000010586 diagram Methods 0.000 description 5
- 230000008878 coupling Effects 0.000 description 3
- 238000010168 coupling process Methods 0.000 description 3
- 238000005859 coupling reaction Methods 0.000 description 3
- 238000000354 decomposition reaction Methods 0.000 description 3
- 230000004044 response Effects 0.000 description 3
- 230000009286 beneficial effect Effects 0.000 description 2
- 230000036651 mood Effects 0.000 description 2
- 230000003287 optical effect Effects 0.000 description 2
- 230000003190 augmentative effect Effects 0.000 description 1
- 230000000994 depressogenic effect Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000008909 emotion recognition Effects 0.000 description 1
- 230000007613 environmental effect Effects 0.000 description 1
- 230000002349 favourable effect Effects 0.000 description 1
- 238000004321 preservation Methods 0.000 description 1
- 239000004984 smart glass Substances 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
- 238000013519 translation Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3343—Query execution using phonetics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Acoustics & Sound (AREA)
- Human Computer Interaction (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The present application relates to data processing technologies, and in particular, to a data processing method, apparatus, and storage medium, where the method includes: acquiring voice information responded by a user aiming at the question information; converting the voice information into text information, and displaying the text information; determining the operation intention of the user expressed in the voice information according to the text information; and executing corresponding target operation according to the operation intention. By adopting the embodiment of the application, the user intention can be recognized through the voice of the user, and the operation corresponding to the intention is executed, so that the improvement of the human-computer interaction experience is facilitated.
Description
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data processing method, apparatus, and storage medium.
Background
Taking an intelligent voice project as an example, the project needs to test whether the strategy of the AI robot is correct. In particular, business acceptance testing must cover many test scenarios, and a single conversation conducted through an incoming mobile phone call usually takes more than 10 minutes. Business acceptance testers care about what they can see and hear; they do not focus on the code layer or the automated test layer. Seeing is believing, and when not enough manpower can be devoted to listening to every conversation, problems have to be found from what can be viewed. How to improve the communication between the AI robot and the outside world, that is, the experience of human-computer interaction, is therefore a problem that urgently needs to be solved.
Disclosure of Invention
The embodiment of the application provides a data processing method, a data processing device and a storage medium, which can improve human-computer interaction experience.
In a first aspect, an embodiment of the present application provides a data processing method, where the method includes:
acquiring voice information responded by a user aiming at the question information;
converting the voice information into text information, and displaying the text information;
determining the operation intention of the user expressed in the voice information according to the text information;
and executing corresponding target operation according to the operation intention.
In a second aspect, an embodiment of the present application provides a data processing apparatus, where the apparatus includes: an acquisition unit, a conversion unit, a determination unit and an execution unit, wherein,
the acquisition unit is used for acquiring voice information responded by a user aiming at the question information;
the conversion unit is used for converting the voice information into text information and displaying the text information;
the determining unit is used for determining the operation intention of the user expressed in the voice information according to the text information;
and the execution unit is used for executing corresponding target operation according to the operation intention.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the program includes instructions for executing the steps in the first aspect of the embodiment of the present application.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, where the computer program enables a computer to perform some or all of the steps described in the first aspect of the embodiment of the present application.
In a fifth aspect, embodiments of the present application provide a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, where the computer program is operable to cause a computer to perform some or all of the steps as described in the first aspect of the embodiments of the present application. The computer program product may be a software installation package.
The embodiment of the application has the following beneficial effects:
it can be seen that the data processing method, the data processing apparatus, and the storage medium described in the embodiments of the present application acquire the voice information with which the user responds to the question information, convert the voice information into text information, display the text information, determine the operation intention of the user expressed in the voice information according to the text information, and execute the corresponding target operation according to that operation intention. In this way, the user's intention can be recognized from the user's voice and the operation corresponding to the intention can be executed, which is beneficial to improving the human-computer interaction experience.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a data processing method provided in an embodiment of the present application;
FIG. 2 is a schematic flow chart diagram of another data processing method provided in the embodiments of the present application;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 4 is a block diagram of functional units of a data processing apparatus according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The electronic device according to the embodiment of the present application may include various handheld devices (such as a mobile phone, a tablet computer, a POS machine, etc.) having a wireless communication function, a desktop computer, an in-vehicle device, a wearable device (a smart watch, a smart bracelet, a wireless headset, an augmented reality/virtual reality device, smart glasses), an AI robot, a computing device, or other processing devices connected to a wireless modem, and various forms of User Equipment (UE), a Mobile Station (MS), a terminal device (terminal device), etc. For convenience of description, the above-mentioned devices are collectively referred to as electronic devices.
The following describes embodiments of the present application in detail.
Referring to fig. 1, fig. 1 is a schematic flow chart of a data processing method according to an embodiment of the present application, and as shown in the figure, the data processing method is applied to an electronic device, and includes:
101. and acquiring voice information responded by the user aiming at the question information.
The user may be a recorded user. For example, the question information may be: Is a deposit required? As another example: Is the deposit confirmed to be 500,000 dollars? And so on. The voice information may be a reply to preset information, where the preset information may be a question asked by the electronic device. Taking the electronic device being an AI robot as an example, the AI robot may include a camera, which may be used to implement gesture recognition; it may also include a sound sensor for speech recognition, a touch sensor or keyboard that can receive text input by the user, and a brain wave sensor that may be used to acquire brain wave signals.
Optionally, in the step 101, acquiring the voice information to which the user replies to the question information may include the following steps:
11. acquiring a voice set of a preset time period, wherein the voice set comprises a plurality of voice segments, and each voice segment corresponds to an identity tag used for representing a user;
12. acquiring an identity label of the user;
13. and screening at least one voice segment corresponding to the identity tag in the voice set according to the identity tag, and taking the screened voice segment as the voice information of the user.
The preset time period can be set by the user or be a system default. In the embodiment of the present application, a tag may be used to mark a category, and the tag may be at least one of the following: a number, a name, a category, a time, and the like, which is not limited herein.
In a specific implementation, the electronic device may obtain a voice set of a preset time period, where the voice set includes a plurality of voice segments and each voice segment corresponds to an identity tag representing a user. After obtaining the identity tag of the user, the electronic device screens the voice set according to that tag to obtain at least one voice segment corresponding to it, and takes the at least one voice segment as the voice information of the user, so that the voice information belonging to that user can be picked out.
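As an illustrative sketch only (the embodiment does not provide code), the following Python snippet shows one way the screening of steps 11-13 could be realized; the data structure and field names are assumptions introduced here, not definitions from the patent.

```python
# Minimal sketch of steps 11-13: keep only the voice segments whose
# identity tag matches the target user. All names are illustrative.
from dataclasses import dataclass
from typing import List

@dataclass
class VoiceSegment:
    identity_tag: str   # tag representing the user's identity (assumed field)
    audio: bytes        # raw audio payload of the segment

def screen_user_voice(voice_set: List[VoiceSegment], identity_tag: str) -> List[VoiceSegment]:
    """Return the segments of the voice set that belong to the given user."""
    return [seg for seg in voice_set if seg.identity_tag == identity_tag]

# Usage: collect a preset time period of segments, then filter by the user's tag.
voice_set = [VoiceSegment("user_42", b"..."), VoiceSegment("user_7", b"...")]
user_voice_info = screen_user_voice(voice_set, "user_42")
```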
102. And converting the voice information into text information, and displaying the text information.
In specific implementation, the electronic device can perform voice recognition on the voice information to obtain text information, and can also display the text information on the display screen. For example, a dialog box may be presented, in which attribute information of the user may be presented, and the attribute information may be at least one of the following: name, gender, relationship, contact, unit name, etc., without limitation. The number of users may be one or more.
Further, in a specific implementation, to complete a dialogue, the data corresponding to the dialogue state may be inserted into a table in the database. When inserting data into the table, some variables need to be customized, such as the user name and the work unit.
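As a hedged sketch of this step only, the snippet below shows one possible way of persisting the dialogue state; the table layout, column names, and values are assumptions for illustration, not part of the described embodiment.

```python
# Minimal sketch: insert one row of dialogue-state data per completed session.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dialogue_state ("
    "session_id TEXT, user_name TEXT, work_unit TEXT, state TEXT)"
)

def save_dialogue_state(session_id: str, user_name: str, work_unit: str, state: str) -> None:
    # user_name and work_unit are the customized variables mentioned above.
    conn.execute(
        "INSERT INTO dialogue_state VALUES (?, ?, ?, ?)",
        (session_id, user_name, work_unit, state),
    )
    conn.commit()

save_dialogue_state("s-001", "Zhang San", "Example Co.", "completed")
```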
In a specific implementation, taking an AI robot as an example, an AI robot interface can be called to start a conversation, the relevant information returned by the AI robot is displayed on a page, and the user can input and submit a response, forming a dialogue window.
Further, in a specific implementation, during service acceptance, text can be input according to the scenario to be tested. For example, when the test scenario requires the user to confirm that he or she is the client, an affirmative text can be input; when it requires the opposite, a negative text is input, and the response and strategy returned by the AI can then be observed. In this way, the strategy the AI robot adopts when the client gives a certain response is seen very intuitively: whichever strategy needs to be tested, the corresponding text is entered. A visual dialogue is thus achieved, allowing the user to see clearly what the machine's intention is and whether his or her voice has been recognized correctly.
Optionally, in the step 102, converting the voice information into text information may include the following steps:
21. intercepting first voice information containing the voice of the user from the voice information;
22. filtering the first voice information to obtain second voice information;
23. and inputting the second voice information into a preset semantic segmentation model to obtain the text information.
The preset semantic segmentation model can be trained in advance and is used for converting voice into text information. In a specific implementation, the electronic device intercepts the voice information mainly to keep the part in which the user is speaking, since the other parts do not contain information expressed by the user; the result of this interception is the first voice information, whose start time and end time both correspond to the user's speech. Filtering processing is then performed on the first voice information to obtain the second voice information, so that environmental sounds such as wind, rain, and other people's speech can be filtered out. Finally, the second voice information is input into the preset semantic segmentation model to obtain the text information, so that the text information can be obtained accurately.
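A minimal sketch of this pipeline (steps 21-23) is given below, assuming very simple energy-based interception and filtering and a pre-trained recognition model with an assumed `transcribe` interface; none of these helpers are defined by the patent.

```python
# Sketch of steps 21-23: intercept the user's speech, filter it, then recognize it.
import numpy as np

def intercept_user_speech(audio: np.ndarray, sample_rate: int) -> np.ndarray:
    """Step 21: keep only the span where speech energy is present (crude VAD)."""
    frame = sample_rate // 100  # 10 ms frames
    energy = np.array([np.abs(audio[i:i + frame]).mean()
                       for i in range(0, len(audio), frame)])
    active = np.where(energy > energy.mean() * 0.5)[0]
    if active.size == 0:
        return audio
    return audio[active[0] * frame:(active[-1] + 1) * frame]

def denoise(audio: np.ndarray) -> np.ndarray:
    """Step 22: placeholder filtering that suppresses low-energy background."""
    threshold = np.abs(audio).mean() * 0.1
    return np.where(np.abs(audio) > threshold, audio, 0.0)

def speech_to_text(audio: np.ndarray, model) -> str:
    """Step 23: feed the cleaned audio into a pre-trained model (assumed interface)."""
    return model.transcribe(audio)
```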
Further optionally, in the step 22, performing filtering processing on the first voice information to obtain the second voice information, may include the following steps:
221. acquiring voice characteristics of a user;
222. and filtering the first voice information to obtain the second voice information corresponding to the voice characteristics in the first voice information.
The voice features may include at least one of the following: timbre, pitch, frequency, and the like, which is not limited herein. In a specific implementation, the electronic device can acquire the voice features of the user and filter the first voice information according to those features to obtain the second voice information. The second voice information then mainly contains the sound made by the user, which is beneficial to accurately filtering out noise while keeping the user's own voice.
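One possible reading of steps 221-222, sketched below, is to summarize the user's voice feature as a frequency band and keep only spectral energy inside that band; this is an assumption for illustration, not the patent's exact filtering algorithm.

```python
# Sketch: keep only the spectral components inside the user's assumed pitch band.
import numpy as np

def filter_by_voice_feature(audio: np.ndarray, sample_rate: int,
                            low_hz: float, high_hz: float) -> np.ndarray:
    """Zero out frequencies outside [low_hz, high_hz] so mainly the user's voice remains."""
    spectrum = np.fft.rfft(audio)
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sample_rate)
    mask = (freqs >= low_hz) & (freqs <= high_hz)
    return np.fft.irfft(spectrum * mask, n=len(audio))
```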
103. And determining the operation intention of the user expressed in the voice information according to the text information.
In the embodiment of the present application, the operation intention may be understood as what the user wants to do, for example, starting a certain function, opening a certain webpage, making a purchase, taking a photograph, and the like.
Optionally, in step 103, determining the operation intention of the user expressed in the speech information according to the text information includes the following steps:
31. performing feature extraction on the text information to obtain a feature set;
32. inputting the feature set into a semantic recognition model to obtain a plurality of intention information, wherein each intention information has a probability value representing the intention;
33. acquiring the emotion type of the user;
34. determining an adjusting coefficient corresponding to each intention in the plurality of intentions according to the emotion type and the plurality of intentions to obtain a plurality of adjusting coefficients;
35. adjusting the probability values of the intentions according to the adjustment coefficients to obtain a plurality of probability values;
36. and selecting the maximum value of the probability values, and taking the corresponding intention as the operation intention.
In the embodiment of the present application, the semantic recognition model may be a pre-trained model, and the feature set may be, for example, a set of keywords. The electronic device can extract features from the text information to obtain a feature set and input the feature set into the semantic recognition model to obtain a plurality of intentions, each of which corresponds to a probability value representing that intention. After obtaining the emotion type of the user, the electronic device can determine an adjustment coefficient corresponding to each of the plurality of intentions according to the emotion type and the intentions, obtaining a plurality of adjustment coefficients; in a specific implementation, the adjustment of different intentions under different emotions can be preset, and the adjustment coefficient corresponding to each intention can be determined by table lookup. The probability values of the intentions are then adjusted according to the adjustment coefficients to obtain a plurality of adjusted probability values, the maximum of these probability values is selected, and the intention corresponding to that maximum is taken as the operation intention.
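The sketch below illustrates steps 31-36 under the assumption that the adjustment coefficients are looked up in a table keyed by (emotion type, intention); the table contents and coefficient values are illustrative only.

```python
# Sketch of steps 34-36: adjust each intention's probability by an
# emotion-dependent coefficient, then take the intention with the maximum value.
from typing import Dict, List, Tuple

# Hypothetical lookup table: adjustment coefficient per (emotion type, intention).
ADJUSTMENT_TABLE: Dict[Tuple[str, str], float] = {
    ("angry", "complain"): 0.10,
    ("angry", "purchase"): -0.05,
    ("happy", "purchase"): 0.08,
}

def select_operation_intention(intentions: List[Tuple[str, float]],
                               emotion_type: str) -> str:
    """intentions: list of (intention, probability) pairs from the recognition model."""
    adjusted = []
    for intention, prob in intentions:
        coeff = ADJUSTMENT_TABLE.get((emotion_type, intention), 0.0)
        adjusted.append((intention, prob * (1.0 + coeff)))
    return max(adjusted, key=lambda item: item[1])[0]

# Usage with hypothetical model outputs:
print(select_operation_intention([("complain", 0.48), ("purchase", 0.52)], "angry"))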
Further optionally, the step 33 of obtaining the emotion type of the user may include the following steps:
331. acquiring an oscillogram corresponding to brain wave signals of a user, wherein the horizontal axis of the oscillogram is time, and the vertical axis of the oscillogram is amplitude;
332. sampling the oscillogram to obtain a plurality of sampling points;
333. determining an average amplitude value and a first mean square error corresponding to the plurality of sampling points;
334. determining a first emotion value corresponding to the average amplitude according to a mapping relation between a preset amplitude and an emotion value;
335. determining a first adjusting coefficient corresponding to the first mean square error according to a mapping relation between a preset mean square error and an adjusting coefficient;
336. adjusting the first emotion value according to the first adjusting coefficient to obtain a second emotion value;
337. determining the emotion type corresponding to the second emotion value according to a mapping relation between a preset emotion value and the emotion type;
in the embodiment of the application, the preset emotion type may be set by a user or default by a system, and the preset emotion type may be at least one of the following types: happy, tense, depressed, oppressed, angry, etc., and is not limited herein. The electronic device may pre-store a mapping relationship between a preset amplitude and an emotion value, a mapping relationship between a preset mean square error and an adjustment coefficient, and a mapping relationship between a preset emotion value and an emotion type.
In a specific implementation, the electronic device may obtain an oscillogram corresponding to the brain wave signals of the user, where the horizontal axis of the oscillogram is time and the vertical axis is amplitude, and sample the oscillogram, for example uniformly, to obtain a plurality of sampling points. The average amplitude and the first mean square error corresponding to the plurality of sampling points are then determined; the first emotion value corresponding to the average amplitude is determined according to the mapping relation between the preset amplitude and the emotion value; and the first adjustment coefficient corresponding to the first mean square error is determined according to the mapping relation between the preset mean square error and the adjustment coefficient, where the value range of the first adjustment coefficient may be -0.1 to 0.1. The first emotion value is adjusted according to the first adjustment coefficient to obtain the second emotion value, namely:
second emotion value = (1 + first adjustment coefficient) × first emotion value
Then, the emotion type corresponding to the second emotion value can be determined according to the mapping relation between the preset emotion value and the emotion type, so that accurate emotion recognition can be achieved through brain waves.
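A minimal sketch of steps 331-337 is given below, representing the preset mapping relations as simple placeholder formulas and threshold bins; all constants and emotion labels are illustrative assumptions, not values specified by the embodiment.

```python
# Sketch: estimate an emotion type from a brain-wave amplitude waveform.
import numpy as np

def emotion_from_waveform(amplitudes: np.ndarray, num_samples: int = 256) -> str:
    # Steps 332-333: uniform sampling, then mean amplitude and mean square error.
    idx = np.linspace(0, len(amplitudes) - 1, num_samples).astype(int)
    samples = amplitudes[idx]
    mean_amp = samples.mean()
    mse = ((samples - mean_amp) ** 2).mean()

    # Step 334: preset amplitude -> emotion value mapping (placeholder scaling).
    first_emotion_value = mean_amp * 10.0

    # Step 335: preset MSE -> adjustment coefficient mapping, kept within -0.1..0.1.
    first_coeff = float(np.clip((mse - 1.0) * 0.05, -0.1, 0.1))

    # Step 336: second emotion value = (1 + first adjustment coefficient) * first value.
    second_emotion_value = (1.0 + first_coeff) * first_emotion_value

    # Step 337: preset emotion value -> emotion type mapping (placeholder bins).
    if second_emotion_value < 3.0:
        return "depressed"
    if second_emotion_value < 7.0:
        return "happy"
    return "tense"
```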
104. And executing corresponding target operation according to the operation intention.
In a specific implementation, different intentions represent different ideas of the user, the ideas can correspond to different operation instructions, and further, the operation corresponding to the intentions can be executed.
Optionally, the step 104, executing the corresponding target operation according to the operation intention, may include the following steps:
41. searching a target operation instruction having a mapping relation with the operation intention in a preset database, wherein the mapping relation between the operation intention and the operation instruction is stored in the database;
42. and executing corresponding target operation according to the target operation instruction.
The electronic device may store a mapping relationship between the operation intention and the operation instruction in the database, and further determine the operation instruction corresponding to the operation intention according to the mapping relationship, and execute a corresponding operation according to the operation instruction.
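As an illustrative sketch of steps 41-42 only, the mapping between intentions and operation instructions can be represented as a key-value table; the intention names and operations below are assumptions for demonstration.

```python
# Sketch: look up the operation instruction mapped to the intention and execute it.
from typing import Callable, Dict

OPERATION_MAP: Dict[str, Callable[[], None]] = {
    "open_webpage": lambda: print("opening webpage"),
    "take_photo": lambda: print("taking photo"),
}

def execute_for_intention(operation_intention: str) -> None:
    """Step 41: search the mapping; step 42: execute the target operation."""
    operation = OPERATION_MAP.get(operation_intention)
    if operation is None:
        print(f"no instruction mapped for intention: {operation_intention}")
        return
    operation()

execute_for_intention("take_photo")
```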
Optionally, before the step 101, the following steps may be included:
s1, acquiring a face image;
s2, determining an image quality evaluation value of the face image;
s3, when the image quality evaluation value is larger than a preset threshold value, matching the face image with a preset face template;
s4, when the face image is successfully matched with the preset face template, executing the step of acquiring the voice information replied by the user aiming at the question information;
s5, when the image quality evaluation value is smaller than or equal to the preset threshold value, determining an image enhancement parameter corresponding to the image quality evaluation value;
s6, carrying out image enhancement processing on the face image according to the image enhancement parameters to obtain a first face image;
s7, matching the first face image with the preset face template;
and S8, when the first face image is successfully matched with the preset face template, executing the step of acquiring the voice information responded by the user aiming at the question information.
The preset face template and the preset threshold value can be stored in the electronic device in advance. The preset threshold may be set by the user or by default. In specific implementation, the electronic device may acquire a face image, and may perform image quality evaluation on the face image by using at least one image quality evaluation index to obtain an image quality evaluation value, where the image quality evaluation index may include at least one of the following: signal-to-noise ratio, entropy, sharpness, edge preservation, mean square error, mean gradient, etc., and is not limited thereto. Further, the electronic device may match the face image with a preset face template when the image quality evaluation value is greater than a preset threshold, and execute step 101 when the face image is successfully matched with the preset face template.
Further, when the image quality evaluation value is less than or equal to the preset threshold, the electronic device may determine the image enhancement parameter corresponding to the image quality evaluation value. In the embodiment of the present application, the image enhancement parameter may be an image enhancement algorithm and a corresponding image enhancement control parameter, and the image enhancement algorithm may be at least one of the following: gray-scale stretching, wavelet transformation, histogram equalization, the Retinex algorithm, and the like, which is not limited herein. The image enhancement control parameter is a parameter for controlling the amplitude or effect of image enhancement, and different image enhancement algorithms may correspond to different image enhancement control parameters. The electronic device may also pre-store a mapping relation between preset image quality evaluation values and image enhancement parameters, and determine the image enhancement parameter corresponding to the image quality evaluation value according to that mapping relation. The electronic device can then perform image enhancement processing on the face image according to the image enhancement parameter to obtain the first face image, match the first face image with the preset face template, and execute step 101 when the first face image is successfully matched with the preset face template; otherwise, the user can be prompted to input a face image again. In this way, the face recognition efficiency can be improved, and the security of the electronic device is also improved.
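The flow of steps S1-S8 is sketched below; `evaluate_quality`, `enhance`, and `match_template` are crude stand-ins introduced here for illustration (the embodiment's own quality indices and enhancement algorithms are listed above), and the threshold is an assumed value.

```python
# Sketch of the face-verification gate that precedes step 101.
import numpy as np

QUALITY_THRESHOLD = 0.6  # illustrative preset threshold

def evaluate_quality(face: np.ndarray) -> float:
    # Crude stand-in metric; the embodiment may combine SNR, entropy, sharpness, etc.
    return float(np.clip(face.std() / 128.0, 0.0, 1.0))

def enhance(face: np.ndarray, strength: float) -> np.ndarray:
    # Crude stand-in for the selected image enhancement algorithm.
    return np.clip(face * (1.0 + strength), 0, 255)

def match_template(face: np.ndarray, template: np.ndarray) -> bool:
    # Assumes both images have the same shape.
    return bool(np.corrcoef(face.ravel(), template.ravel())[0, 1] > 0.8)

def face_gate(face: np.ndarray, template: np.ndarray) -> bool:
    """Return True when voice acquisition (step 101) may proceed."""
    quality = evaluate_quality(face)
    if quality <= QUALITY_THRESHOLD:
        # S5-S6: derive an enhancement parameter from the quality value, then enhance.
        face = enhance(face, strength=QUALITY_THRESHOLD - quality)
    # S3/S7: match the (possibly enhanced) image against the preset template.
    return match_template(face, template)
```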
Further, the step S2 of determining the image quality evaluation value of the face image may include the following steps:
s21, extracting low-frequency components and high-frequency components of the face image;
s22, dividing the low-frequency component into a plurality of areas;
s23, determining the signal-to-noise ratio corresponding to each of the plurality of regions to obtain a plurality of signal-to-noise ratios;
s24, determining an average signal-to-noise ratio and a mean square error according to the signal-to-noise ratios;
s25, determining an adjusting coefficient corresponding to the mean square error;
s26, adjusting the average signal-to-noise ratio according to the adjusting coefficient to obtain a first signal-to-noise ratio;
s27, determining a first evaluation value corresponding to the first signal-to-noise ratio according to a mapping relation between a preset signal-to-noise ratio and the evaluation value;
s28, determining the energy ratio corresponding to the energy value of the low-frequency component and the energy value of the face image;
s29, determining a low-frequency weight corresponding to the energy ratio according to a preset mapping relation between the low-frequency energy ratio and the low-frequency weight, and determining a high-frequency weight according to the low-frequency weight;
s30, determining the distribution density of the first characteristic points according to the high-frequency components;
s31, determining a second evaluation value corresponding to the first feature point distribution density according to a preset mapping relation between the feature point distribution density and the evaluation value;
and S32, performing weighting operation according to the first evaluation value, the second evaluation value, the low-frequency weight and the high-frequency weight to obtain the image quality evaluation value of the face image.
In specific implementation, the electronic device may perform multi-scale feature decomposition on the face image by using a multi-scale decomposition algorithm to obtain a low-frequency component and a high-frequency component, where the multi-scale decomposition algorithm may be at least one of the following algorithms: the pyramid transform algorithm, wavelet transform, contourlet transform, non-down-sampling contourlet transform, ridgelet transform, shear wave transform, etc., are not limited herein. Further, the electronic device may divide the low frequency component into a plurality of regions, each region having the same or different area size. The low frequency component reflects the main features of the image, and the high frequency component reflects the detail information of the image.
Furthermore, the electronic device may determine a signal-to-noise ratio corresponding to each of the plurality of regions to obtain a plurality of signal-to-noise ratios, and determine an average signal-to-noise ratio and a mean square error according to the plurality of signal-to-noise ratios, where the signal-to-noise ratio reflects the amount of the image information to a certain extent, and the mean square error may reflect the stability of the image information. The electronic device may pre-store a mapping relationship between a preset mean square error and an adjustment coefficient, and further determine an adjustment coefficient corresponding to the mean square error according to the mapping relationship, in this embodiment, a value range of the adjustment coefficient may be-0.15 to 0.15.
Further, the electronic device may adjust the average signal-to-noise ratio according to the adjustment coefficient to obtain the first signal-to-noise ratio, namely: first signal-to-noise ratio = (1 + adjustment coefficient) × average signal-to-noise ratio. The electronic device may pre-store a mapping relation between preset signal-to-noise ratios and evaluation values, and the first evaluation value corresponding to the first signal-to-noise ratio can then be determined according to that mapping relation.
In addition, the electronic device may pre-store a mapping relation between preset low-frequency energy ratios and low-frequency weights, where the low-frequency energy ratio is the ratio of the energy of the low-frequency component of the original image to the energy of the original image. The electronic device determines the energy ratio between the energy value of the low-frequency component and the energy value of the face image, determines the low-frequency weight corresponding to that energy ratio according to the preset mapping relation, and determines the high-frequency weight from the low-frequency weight, where low-frequency weight + high-frequency weight = 1.
Further, the electronic device may determine the first feature point distribution density from the high-frequency component, where first feature point distribution density = total number of feature points of the high-frequency component / area of the high-frequency component. The electronic device may also pre-store a mapping relation between preset feature point distribution densities and evaluation values, and determine the second evaluation value corresponding to the first feature point distribution density according to that mapping relation. Finally, a weighted operation is performed on the first evaluation value and the second evaluation value using the low-frequency weight and the high-frequency weight to obtain the image quality evaluation value of the face image, specifically:
image quality evaluation value = first evaluation value × low-frequency weight + second evaluation value × high-frequency weight
In this way, the image quality evaluation can be performed based on two dimensions of the low-frequency component and the high-frequency component of the face image, and the image quality evaluation value of the image, that is, the image quality evaluation value can be accurately obtained.
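A compact sketch of steps S21-S32 follows, using a single-level local-mean split as the multi-scale decomposition and simple placeholder formulas for the preset mapping relations; every constant here is an illustrative assumption, and the image is assumed to have even height and width.

```python
# Sketch: image quality evaluation value from low- and high-frequency components.
import numpy as np

def image_quality_value(img: np.ndarray, grid: int = 4) -> float:
    img = img.astype(float)
    # S21: low-frequency = 2x2 local average, high-frequency = detail residual.
    low = img.reshape(img.shape[0] // 2, 2, img.shape[1] // 2, 2).mean(axis=(1, 3))
    low_up = np.kron(low, np.ones((2, 2)))   # low-frequency image at full resolution
    high = img - low_up

    # S22-S24: split the low-frequency component into regions, get SNR statistics.
    h, w = low.shape
    snrs = []
    for i in range(grid):
        for j in range(grid):
            region = low[i * h // grid:(i + 1) * h // grid,
                         j * w // grid:(j + 1) * w // grid]
            snrs.append(region.mean() / (region.std() + 1e-6))
    snrs = np.array(snrs)
    avg_snr, mse = snrs.mean(), ((snrs - snrs.mean()) ** 2).mean()

    # S25-S27: adjust the average SNR (coefficient kept within -0.15..0.15),
    # then map it to the first evaluation value with a placeholder mapping.
    coeff = float(np.clip((1.0 - mse / 10.0) * 0.15, -0.15, 0.15))
    first_eval = float(np.tanh((1.0 + coeff) * avg_snr / 10.0))

    # S28-S29: low-frequency weight from the energy ratio; weights sum to 1.
    low_weight = float(np.clip((low_up ** 2).sum() / ((img ** 2).sum() + 1e-6), 0.0, 1.0))
    high_weight = 1.0 - low_weight

    # S30-S31: feature-point density of the high-frequency component (placeholder).
    density = (np.abs(high) > np.abs(high).mean() * 2).sum() / high.size
    second_eval = float(np.tanh(density * 10.0))

    # S32: weighted sum of the two evaluation values.
    return first_eval * low_weight + second_eval * high_weight
```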
It can be seen that, in the data processing method described in the embodiment of the present application, the voice information responded by the user to the question information is obtained, the voice information is converted into the text information, the text information is displayed, the operation intention of the user expressed in the voice information is determined according to the text information, and the corresponding target operation is executed according to the operation intention.
Referring to fig. 2, fig. 2 is a schematic flowchart of a data processing method provided in an embodiment of the present application, and is applied to an electronic device, where as shown in the figure, the data processing method includes:
201. the method comprises the steps of obtaining a voice set of a preset time period, wherein the voice set comprises a plurality of voice segments, and each voice segment corresponds to a label used for representing the identity of a user.
202. And acquiring the identity label of the user.
203. And screening at least one voice segment corresponding to the identity tag in the voice set according to the identity tag, and taking the screened voice segment as the voice information of the user.
204. And converting the voice information into text information, and displaying the text information.
205. And determining the operation intention of the user expressed in the voice information according to the text information.
206. And executing corresponding target operation according to the operation intention.
The detailed description of the steps 201 to 206 may refer to the corresponding steps described in the above fig. 1, and is not repeated herein.
It can be seen that, in the data processing method described in the embodiment of the present application, the voice information responded by the user to the question information is obtained, the voice information is converted into the text information, the text information is displayed, the operation intention of the user expressed in the voice information is determined according to the text information, and the corresponding target operation is executed according to the operation intention.
In accordance with the foregoing embodiments, please refer to fig. 3, fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application, and as shown in the drawing, the electronic device includes a processor, a memory, a communication interface, and one or more programs, the one or more programs are stored in the memory and configured to be executed by the processor, and in an embodiment of the present application, the programs include instructions for performing the following steps:
acquiring voice information responded by a user aiming at the question information;
converting the voice information into text information, and displaying the text information;
determining the operation intention of the user expressed in the voice information according to the text information;
and executing corresponding target operation according to the operation intention.
Therefore, the electronic device described in the embodiment of the application obtains the voice information responded by the user aiming at the question information, converts the voice information into the text information, displays the text information, determines the operation intention of the user expressed in the voice information according to the text information, and executes the corresponding target operation according to the operation intention.
Optionally, in terms of the converting the voice information into text information, the program includes instructions for performing the following steps:
intercepting first voice information containing the voice of the user from the voice information;
filtering the first voice information to obtain second voice information;
and inputting the second voice information into a preset semantic segmentation model to obtain the text information.
Optionally, in the aspect of filtering the first speech information to obtain the second speech information, the program includes instructions for performing the following steps:
acquiring voice characteristics of the user;
and filtering the first voice information to obtain the second voice information corresponding to the voice characteristics in the first voice information.
Optionally, in terms of the performing of the corresponding target operation according to the operation intention, the program includes instructions for performing the following steps:
searching a target operation instruction having a mapping relation with the operation intention in a preset database, wherein the mapping relation between the operation intention and the operation instruction is stored in the database;
and executing corresponding target operation according to the target operation instruction.
Optionally, in terms of obtaining the voice information to which the user replies to the question information, the program includes instructions for performing the following steps:
acquiring a voice set of a preset time period, wherein the voice set comprises a plurality of voice segments, and each voice segment corresponds to a label for representing the identity of a user;
acquiring an identity label of the user;
and screening at least one voice segment corresponding to the identity tag in the voice set according to the identity tag, and taking the screened voice segment as the voice information of the user.
Optionally, in the aspect of determining the operation intention of the user expressed in the voice information according to the text information, the program includes instructions for:
performing feature extraction on the text information to obtain a feature set;
inputting the feature set into a semantic recognition model to obtain a plurality of intention information, wherein each intention information has a probability value representing the intention;
acquiring the emotion type of the user;
determining an adjusting coefficient corresponding to each intention in the plurality of intentions according to the emotion type and the plurality of intentions to obtain a plurality of adjusting coefficients;
adjusting the probability values of the intentions according to the adjustment coefficients to obtain a plurality of probability values;
and selecting the maximum value of the probability values, and taking the corresponding intention as the operation intention of the user.
Optionally, in the obtaining of the emotion type of the user, the program includes instructions for:
acquiring an oscillogram corresponding to the brain wave signal of the user, wherein the horizontal axis of the oscillogram is time, and the vertical axis of the oscillogram is amplitude;
sampling the oscillogram to obtain a plurality of sampling points;
determining an average amplitude value and a first mean square error corresponding to the plurality of sampling points;
determining a first emotion value corresponding to the average amplitude according to a mapping relation between a preset amplitude and an emotion value;
determining a first adjusting coefficient corresponding to the first mean square error according to a mapping relation between a preset mean square error and an adjusting coefficient;
adjusting the first emotion value according to the first adjusting coefficient to obtain a second emotion value;
and determining the emotion type corresponding to the second emotion value according to a preset mapping relation between the emotion value and the emotion type.
The above description has introduced the solution of the embodiments of the present application mainly from the perspective of the method-side implementation process. It is understood that, in order to realize the above functions, the electronic device includes corresponding hardware structures and/or software modules for performing the respective functions. Those skilled in the art will readily appreciate that the units and algorithm steps of the examples described in connection with the embodiments provided herein can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as going beyond the scope of the present application.
In the embodiment of the present application, the electronic device may be divided into the functional units according to the method example, for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. It should be noted that the division of the unit in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation.
Fig. 4 is a block diagram showing functional units of a data processing apparatus 400 according to an embodiment of the present application. The data processing apparatus 400 includes: an acquisition unit 401, a conversion unit 402, a determination unit 403, and an execution unit 404, wherein,
the acquiring unit 401 is configured to acquire voice information in which a user replies to the question information;
the converting unit 402 is configured to convert the voice information into text information and display the text information;
the determining unit 403 is configured to determine, according to the text information, an operation intention of the user expressed in the voice information;
the execution unit 404 is configured to execute a corresponding target operation according to the operation intention.
It can be seen that, the data processing apparatus described in the embodiment of the present application obtains the voice information to which the user responds to the question information, converts the voice information into the text information, displays the text information, determines the operation intention of the user expressed in the voice information according to the text information, and executes the corresponding target operation according to the operation intention, so that the user intention can be recognized through the voice of the user, and the operation corresponding to the intention is executed, which is beneficial to improving the human-computer interaction experience.
Optionally, in terms of converting the voice information into text information, the converting unit 402 is specifically configured to:
intercepting first voice information containing the voice of the user from the voice information;
filtering the first voice information to obtain second voice information;
and inputting the second voice information into a preset semantic segmentation model to obtain the text information.
The preset semantic segmentation model can be trained in advance and is used for converting voice into text information.
Optionally, in terms of performing filtering processing on the first voice information to obtain second voice information, the converting unit 402 is specifically configured to:
acquiring voice characteristics of the user;
and filtering the first voice information to obtain the second voice information corresponding to the voice characteristics in the first voice information.
Optionally, in terms of performing the corresponding target operation according to the operation intention, the execution unit 404 is specifically configured to:
searching a target operation instruction having a mapping relation with the operation intention in a preset database, wherein the mapping relation between the operation intention and the operation instruction is stored in the database;
and executing corresponding target operation according to the target operation instruction.
Optionally, in respect of acquiring the voice information in which the user replies to the question information, the acquiring unit 401 is specifically configured to:
acquiring a voice set of a preset time period, wherein the voice set comprises a plurality of voice segments, and each voice segment corresponds to a label for representing the identity of a user;
acquiring an identity label of the user;
and screening at least one voice segment corresponding to the identity tag in the voice set according to the identity tag, and taking the screened voice segment as the voice information of the user.
Optionally, in terms of the determining, according to the text information, the operation intention of the user expressed in the voice information, the determining unit 403 is specifically configured to:
performing feature extraction on the text information to obtain a feature set;
inputting the feature set into a semantic recognition model to obtain a plurality of intention information, wherein each intention information has a probability value representing the intention;
acquiring the emotion type of the user;
determining an adjusting coefficient corresponding to each intention in the plurality of intentions according to the emotion type and the plurality of intentions to obtain a plurality of adjusting coefficients;
adjusting the probability values of the intentions according to the adjustment coefficients to obtain a plurality of probability values;
and selecting the maximum value of the probability values, and taking the corresponding intention as the operation intention of the user.
Optionally, in the aspect of obtaining the emotion type of the user, the determining unit 403 is specifically configured to:
acquiring an oscillogram corresponding to the brain wave signal of the user, wherein the horizontal axis of the oscillogram is time, and the vertical axis of the oscillogram is amplitude;
sampling the oscillogram to obtain a plurality of sampling points;
determining an average amplitude value and a first mean square error corresponding to the plurality of sampling points;
determining a first emotion value corresponding to the average amplitude according to a mapping relation between a preset amplitude and an emotion value;
determining a first adjusting coefficient corresponding to the first mean square error according to a mapping relation between a preset mean square error and an adjusting coefficient;
adjusting the first emotion value according to the first adjusting coefficient to obtain a second emotion value;
and determining the emotion type corresponding to the second emotion value according to a preset mapping relation between the emotion value and the emotion type.
It is to be understood that the functions of each program module of the data processing apparatus in this embodiment may be specifically implemented according to the method in the foregoing method embodiment, and the specific implementation process may refer to the relevant description of the foregoing method embodiment, which is not described herein again.
Embodiments of the present application also provide a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, the computer program enabling a computer to execute part or all of the steps of any one of the methods described in the above method embodiments, and the computer includes an electronic device.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any of the methods as described in the above method embodiments. The computer program product may be a software installation package, the computer comprising an electronic device.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the above-described division of the units is only one type of division of logical functions, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of some interfaces, devices or units, and may be an electric or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit may be stored in a computer readable memory if it is implemented in the form of a software functional unit and sold or used as a stand-alone product. Based on such understanding, the technical solution of the present application may be substantially implemented or a part of or all or part of the technical solution contributing to the prior art may be embodied in the form of a software product stored in a memory, and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the above-mentioned method of the embodiments of the present application. And the aforementioned memory comprises: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable memory, which may include: flash Memory disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.
Claims (10)
1. A method of data processing, the method comprising:
acquiring voice information with which a user responds to question information;
converting the voice information into text information, and displaying the text information;
determining the operation intention of the user expressed in the voice information according to the text information;
and executing a corresponding target operation according to the operation intention.
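For illustration only (not part of the claimed subject matter): a minimal Python sketch of the flow recited in claim 1, in which the speech recognizer, the intent model, and the operation table are invented stubs rather than components disclosed in this application.

```python
# Illustrative sketch of the claim 1 flow: voice reply -> text -> intention -> operation.
# All helpers below are stand-in stubs invented for this example.

def speech_to_text(audio: bytes) -> str:
    # Stub ASR: a real system would transcribe the voice information here.
    return "turn on the light"

def infer_intention(text: str) -> str:
    # Stub intent model: a real system would classify the text information here.
    return "light_on" if "light" in text else "unknown"

OPERATIONS = {"light_on": lambda: print("executing: switch light on")}

def handle_reply(audio: bytes) -> None:
    text = speech_to_text(audio)                      # convert voice information to text
    print(f"transcription: {text}")                   # display the text information
    intention = infer_intention(text)                 # determine the operation intention
    OPERATIONS.get(intention, lambda: print("no matching operation"))()  # execute target operation

handle_reply(b"\x00\x01")
```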
2. The method of claim 1, wherein converting the voice information into text information comprises:
intercepting, from the voice information, first voice information containing the voice of the user;
filtering the first voice information to obtain second voice information;
and inputting the second voice information into a preset semantic segmentation model to obtain the text information.
3. The method of claim 2, wherein the filtering the first voice information to obtain the second voice information comprises:
acquiring voice characteristics of the user;
and filtering the first voice information to obtain the second voice information corresponding to the voice characteristics in the first voice information.
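For illustration only: one possible reading of claims 2 and 3, sketched in Python, in which the interception step is a simple energy gate and the user's "voice characteristic" is assumed to be a stored pitch; both assumptions are invented for the example, and the semantic segmentation model of claim 2 is not reproduced here.

```python
import numpy as np

def intercept_speech(signal: np.ndarray, frame: int = 160, energy_thresh: float = 0.01) -> np.ndarray:
    """Keep frames whose energy exceeds a threshold (the 'first voice information')."""
    frames = [signal[i:i + frame] for i in range(0, len(signal) - frame + 1, frame)]
    voiced = [f for f in frames if float(np.mean(f ** 2)) > energy_thresh]
    return np.concatenate(voiced) if voiced else np.array([])

def filter_by_voice_feature(signal: np.ndarray, user_pitch_hz: float, sr: int = 16000,
                            tol_hz: float = 30.0, frame: int = 400) -> np.ndarray:
    """Keep frames whose dominant frequency lies near the user's stored pitch
    (the 'second voice information' matching the user's voice characteristic)."""
    kept = []
    for i in range(0, len(signal) - frame + 1, frame):
        f = signal[i:i + frame]
        spectrum = np.abs(np.fft.rfft(f))
        dominant_hz = float(np.argmax(spectrum)) * sr / frame
        if abs(dominant_hz - user_pitch_hz) <= tol_hz:
            kept.append(f)
    return np.concatenate(kept) if kept else np.array([])

# Toy usage: a 200 Hz tone stands in for the target speaker's voice.
t = np.arange(16000) / 16000
speech = intercept_speech(0.5 * np.sin(2 * np.pi * 200 * t))
print(filter_by_voice_feature(speech, user_pitch_hz=200.0).shape)
```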
4. The method according to claim 1 or 2, wherein the executing of the corresponding target operation according to the operation intention comprises:
searching, in a preset database, for a target operation instruction having a mapping relation with the operation intention, wherein the database stores mapping relations between operation intentions and operation instructions;
and executing the corresponding target operation according to the target operation instruction.
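For illustration only: a minimal sketch of the look-up described in claim 4, using an in-memory SQLite table as the "preset database"; the table name, intentions, and instruction strings are invented for the example.

```python
import sqlite3
from typing import Optional

# A tiny in-memory "preset database" holding the mapping relation between
# operation intentions and operation instructions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE intent_map (intention TEXT PRIMARY KEY, instruction TEXT)")
conn.executemany("INSERT INTO intent_map VALUES (?, ?)",
                 [("light_on", "CMD_LIGHT_ON"), ("volume_up", "CMD_VOLUME_UP")])

def lookup_instruction(intention: str) -> Optional[str]:
    """Search the target operation instruction that maps to the given intention."""
    row = conn.execute("SELECT instruction FROM intent_map WHERE intention = ?",
                       (intention,)).fetchone()
    return row[0] if row else None

print(lookup_instruction("light_on"))  # -> CMD_LIGHT_ON
```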
5. The method according to claim 1 or 2, wherein the acquiring the voice information with which the user responds to the question information comprises:
acquiring a voice set of a preset time period, wherein the voice set comprises a plurality of voice segments, and each voice segment corresponds to a tag representing the identity of a user;
acquiring an identity tag of the user;
and screening, according to the identity tag, the voice set for at least one voice segment corresponding to the identity tag, and taking the screened voice segment as the voice information of the user.
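For illustration only: a minimal sketch of the screening described in claim 5, assuming the voice set is held in memory as (identity tag, segment) pairs; the tags and segment payloads are invented.

```python
from typing import List, Tuple

# The voice set collected over a preset time period, modeled as
# (identity_tag, voice_segment) pairs.
VoiceSet = List[Tuple[str, bytes]]

def screen_segments(voice_set: VoiceSet, identity_tag: str) -> List[bytes]:
    """Keep only the voice segments whose tag matches the user's identity tag."""
    return [segment for tag, segment in voice_set if tag == identity_tag]

voice_set = [("user_42", b"seg-a"), ("user_07", b"seg-b"), ("user_42", b"seg-c")]
print(screen_segments(voice_set, "user_42"))  # -> [b'seg-a', b'seg-c']
```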
6. The method according to claim 1 or 2, wherein the determining the operation intention of the user expressed in the voice information according to the text information comprises:
performing feature extraction on the text information to obtain a feature set;
inputting the feature set into a semantic recognition model to obtain a plurality of pieces of intention information, wherein each piece of intention information has a probability value characterizing the corresponding intention;
acquiring the emotion type of the user;
determining, according to the emotion type and the plurality of intentions, an adjustment coefficient corresponding to each of the plurality of intentions, to obtain a plurality of adjustment coefficients;
adjusting the probability values of the plurality of intentions according to the plurality of adjustment coefficients to obtain a plurality of adjusted probability values;
and selecting the maximum value among the plurality of adjusted probability values, and taking the intention corresponding to the maximum value as the operation intention of the user.
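For illustration only: a minimal sketch of the probability adjustment described in claim 6; the intentions, probabilities, and emotion-dependent adjustment coefficients are invented example values, not data from this application.

```python
from typing import Dict

# Candidate intentions with model probabilities, and an emotion-dependent
# table of adjustment coefficients (both invented for this example).
intent_probs: Dict[str, float] = {"cancel_order": 0.40, "query_order": 0.35, "complain": 0.25}
adjust_coeffs: Dict[str, Dict[str, float]] = {
    "angry":   {"cancel_order": 1.2, "query_order": 0.9, "complain": 1.3},
    "neutral": {"cancel_order": 1.0, "query_order": 1.0, "complain": 1.0},
}

def pick_intention(probs: Dict[str, float], emotion_type: str) -> str:
    """Rescale each probability by its emotion-specific coefficient and take the maximum."""
    coeffs = adjust_coeffs.get(emotion_type, {})
    adjusted = {intent: p * coeffs.get(intent, 1.0) for intent, p in probs.items()}
    return max(adjusted, key=adjusted.get)

print(pick_intention(intent_probs, "angry"))  # -> cancel_order (0.48 beats 0.325 and 0.315)
```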
7. The method of claim 6, wherein the obtaining the emotion type of the user comprises:
acquiring an oscillogram corresponding to a brain wave signal of the user, wherein the horizontal axis of the oscillogram represents time and the vertical axis represents amplitude;
sampling the oscillogram to obtain a plurality of sampling points;
determining an average amplitude value and a first mean square error corresponding to the plurality of sampling points;
determining a first emotion value corresponding to the average amplitude according to a preset mapping relation between amplitude and emotion value;
determining a first adjustment coefficient corresponding to the first mean square error according to a preset mapping relation between mean square error and adjustment coefficient;
adjusting the first emotion value according to the first adjustment coefficient to obtain a second emotion value;
and determining the emotion type corresponding to the second emotion value according to a preset mapping relation between the emotion value and the emotion type.
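For illustration only: a minimal sketch of the emotion estimation described in claim 7; the sampling step, the amplitude-to-emotion mapping, and the mean-square-error threshold are all invented example values.

```python
import numpy as np

def emotion_type_from_waveform(waveform: np.ndarray, step: int = 10) -> str:
    """Sample the oscillogram, derive the average amplitude and its mean square error,
    map them to an emotion value and an adjustment coefficient, and map the adjusted
    value to an emotion type. All thresholds below are invented for this example."""
    samples = np.abs(waveform[::step])                        # sampling points (amplitudes)
    mean_amp = float(np.mean(samples))                        # average amplitude
    mse = float(np.mean((samples - mean_amp) ** 2))           # first mean square error

    first_emotion = 20.0 if mean_amp < 0.3 else 60.0 if mean_amp < 0.7 else 90.0
    coefficient = 1.1 if mse > 0.05 else 1.0                  # first adjustment coefficient
    second_emotion = first_emotion * coefficient              # second emotion value

    if second_emotion < 40.0:
        return "calm"
    if second_emotion < 75.0:
        return "neutral"
    return "excited"

t = np.linspace(0.0, 1.0, 1000)
print(emotion_type_from_waveform(0.8 * np.sin(2 * np.pi * 10.0 * t)))  # e.g. "neutral" for this toy waveform
```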
8. A data processing apparatus, characterized in that the apparatus comprises: an acquisition unit, a conversion unit, a determination unit and an execution unit, wherein,
the acquisition unit is configured to acquire voice information with which a user responds to question information;
the conversion unit is configured to convert the voice information into text information and to display the text information;
the determination unit is configured to determine, according to the text information, the operation intention of the user expressed in the voice information;
and the execution unit is configured to execute a corresponding target operation according to the operation intention.
9. An electronic device, comprising a processor and a memory, wherein the memory is configured to store one or more programs to be executed by the processor, the one or more programs comprising instructions for performing the steps of the method according to any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to carry out the method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011228484.7A CN112420049A (en) | 2020-11-06 | 2020-11-06 | Data processing method, device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011228484.7A CN112420049A (en) | 2020-11-06 | 2020-11-06 | Data processing method, device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112420049A (en) | 2021-02-26 |
Family
ID=74827945
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011228484.7A (CN112420049A, pending) | Data processing method, device and storage medium | 2020-11-06 | 2020-11-06 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112420049A (en) |
2020
- 2020-11-06: CN application CN202011228484.7A (published as CN112420049A); status: pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109961780A (en) * | 2017-12-22 | 2019-07-02 | 深圳市优必选科技有限公司 | Man-machine interaction method, device, server and storage medium |
CN110085262A (en) * | 2018-01-26 | 2019-08-02 | 上海智臻智能网络科技股份有限公司 | Voice mood exchange method, computer equipment and computer readable storage medium |
CN111724789A (en) * | 2019-03-19 | 2020-09-29 | 华为终端有限公司 | Voice interaction method and terminal equipment |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113157966A (en) * | 2021-03-15 | 2021-07-23 | 维沃移动通信有限公司 | Display method and device and electronic equipment |
CN113157966B (en) * | 2021-03-15 | 2023-10-31 | 维沃移动通信有限公司 | Display method and device and electronic equipment |
CN113297365A (en) * | 2021-06-22 | 2021-08-24 | 中国平安财产保险股份有限公司 | User intention determination method, device, equipment and storage medium |
CN113297365B (en) * | 2021-06-22 | 2023-09-26 | 中国平安财产保险股份有限公司 | User intention judging method, device, equipment and storage medium |
CN113569918A (en) * | 2021-07-05 | 2021-10-29 | 北京淇瑀信息科技有限公司 | Classification temperature adjusting method, classification temperature adjusting device, electronic equipment and medium |
Similar Documents
Publication | Title |
---|---|
CN112420049A (en) | Data processing method, device and storage medium |
CN110970018B (en) | Speech recognition method and device |
CN110600059B (en) | Acoustic event detection method and device, electronic equipment and storage medium |
CN109993150B (en) | Method and device for identifying age |
CN110769111A (en) | Noise reduction method, system, storage medium and terminal |
CN105489221A (en) | Voice recognition method and device |
CN103106061A (en) | Voice input method and device |
CN108962231A (en) | A kind of method of speech classification, device, server and storage medium |
CN113033245A (en) | Function adjusting method and device, storage medium and electronic equipment |
CN115798459B (en) | Audio processing method and device, storage medium and electronic equipment |
CN110136726A (en) | A kind of estimation method, device, system and the storage medium of voice gender |
CN108766416B (en) | Speech recognition method and related product |
CN117593473B (en) | Method, apparatus and storage medium for generating motion image and video |
CN113035176B (en) | Voice data processing method and device, computer equipment and storage medium |
CN114627889A (en) | Multi-sound-source sound signal processing method and device, storage medium and electronic equipment |
CN112347788A (en) | Corpus processing method, apparatus and storage medium |
CN113868472A (en) | Method for generating digital human video and related equipment |
CN113709291A (en) | Audio processing method and device, electronic equipment and readable storage medium |
CN110910898A (en) | Voice information processing method and device |
CN108962226A (en) | Method and apparatus for detecting the endpoint of voice |
CN112740219A (en) | Method and device for generating gesture recognition model, storage medium and electronic equipment |
CN110414295A (en) | Identify method, apparatus, cooking equipment and the computer storage medium of rice |
CN115019788A (en) | Voice interaction method, system, terminal equipment and storage medium |
CN114049875A (en) | TTS (text to speech) broadcasting method, device, equipment and storage medium |
CN116486789A (en) | Speech recognition model generation method, speech recognition method, device and equipment |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20210226 |