CN113255323B - Description data processing method, system, electronic device and medium - Google Patents
- Publication number
- CN113255323B (application number CN202110663888.7A)
- Authority
- CN
- China
- Prior art keywords
- description
- standard
- target
- keywords
- sentence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The invention provides a description data processing method, system, electronic device and medium. The method generates a description text from the obtained description data and splits it into a plurality of initial description sentences; obtains the subject person keywords in each initial description sentence and determines a target subject person keyword; performs word segmentation on the target description sentences to obtain a plurality of target description words, and determines a plurality of event keywords among them; if a preset standard lexicon contains a standard keyword whose similarity to an event keyword is higher than a preset similarity threshold, replaces that event keyword in the target description sentence with the standard keyword; generates a plurality of simplified description sentences from the target description sentences containing the standard keywords; and stores the simplified description sentences in association with their corresponding description data, completing the processing of the description data. The method can improve the efficiency of organizing voice data, save time, and improve the customer experience.
Description
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a description data processing method, system, electronic device, and medium.
Background
Because people differ in their understanding of the event to be described, in their expertise, and in the way they describe things, different people may describe the same event very differently; some also speak too slowly or enunciate unclearly, which makes the description harder for listeners to understand, slows the reception and processing of the description information, and therefore slows transaction processing.
In the related art, description speech is often simply converted into description text and displayed to the relevant staff in order to speed up their handling of the matter. However, if the description text is a long document, the staff still have to spend time reading and organizing it, so working efficiency remains low and the customer experience is poor.
Disclosure of Invention
In view of the above shortcomings of the prior art, the present invention provides a description data processing method, system, electronic device and medium, so as to solve the technical problem in the related art that, when description speech is simply converted into description text, the relevant staff still have to spend time reading and organizing the text, so that working efficiency is low and the user experience is poor.
The invention provides a description data processing method, which comprises the following steps:
generating a description text according to the obtained description data, and performing sentence splitting processing to obtain a plurality of initial description sentences;
obtaining the subject person keywords in each initial description sentence, and determining a target subject person keyword;
performing word segmentation on a target description sentence to obtain a plurality of target description words, and determining a plurality of event keywords from the target description words, wherein the target description sentence is an initial description sentence that includes the target subject person keyword;
if a standard keyword whose similarity to the event keyword is higher than a preset similarity threshold exists in a preset standard lexicon, replacing the event keyword in the target description sentence with the standard keyword;
and generating a plurality of simplified description sentences according to the target description sentences including the standard keywords, and storing the simplified description sentences and the description data corresponding to the simplified description sentences in a correlation manner to complete the processing of the description data.
Optionally, the generating a plurality of simplified description sentences according to the target description sentences including the standard keywords includes:
obtaining event descriptors in each target description sentence with the same standard keyword, wherein the event descriptors comprise the target descriptors except the event keyword;
carrying out similarity comparison on the event descriptors and standard descriptors in a preset standard lexicon;
if the similarity between the event descriptor and the standard descriptor is higher than a second preset similarity threshold, replacing the event descriptor with the standard descriptor;
and generating one simplified description sentence according to the standard keywords and the standard description words, wherein each simplified description sentence comprises one standard keyword.
Optionally, the target subject person keyword is determined in any one of the following ways:
comparing the subject person keywords with a preset person keyword, and if the similarity between a subject person keyword and the preset person keyword is higher than a first preset similarity threshold, taking that subject person keyword as the target subject person keyword;
performing synonym replacement on the clear keywords among the subject person keywords to obtain a plurality of standard subject person keywords, performing disambiguation on the fuzzy keywords among the subject person keywords and replacing them with the corresponding standard subject person keywords, determining the word proportion of each standard subject person keyword, and taking the standard subject person keyword with the highest proportion as the target subject person keyword.
Optionally, the generation manner of the description data includes:
acquiring audio data, wherein the audio data comprises speech information and user emotion identification information, and the user emotion identification information comprises at least one of intonation, pauses, volume and speech rate;
and determining the user emotion according to the user emotion identification information, and generating the description data according to the user emotion and the speech information.
Optionally, the generation manner of the description data includes:
acquiring video data, wherein the video data comprises speech information and user emotion identification information, and the user emotion identification information comprises at least one of intonation, pauses, volume, speech rate, facial expressions and speaking gestures;
and determining the user emotion according to the user emotion identification information, and generating the description data according to the user emotion and the speech information.
Optionally, the method further includes:
acquiring the word-sense attitude of each target description word in the simplified description sentence, wherein the word-sense attitude comprises positive, negative and neutral;
determining sentence emotion information of the simplified description sentence according to the proportion of target description words having each word-sense attitude in the simplified description sentence;
and storing the statement emotion information, the simplified description statement and description data corresponding to the simplified description statement in an associated manner.
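As a hedged sketch of this optional step, the following assumes a toy attitude lexicon and a simple majority rule; both are illustrative assumptions rather than the patented implementation:

```python
from collections import Counter

# Toy attitude lexicon; a real system would use a proper sentiment dictionary.
ATTITUDE_LEXICON = {
    "pain": "negative", "cough": "negative",
    "happy": "positive", "recovered": "positive",
    "stomach": "neutral", "week": "neutral",
}

def sentence_emotion(descriptors):
    """Label a simplified description sentence by the majority word-sense attitude."""
    counts = Counter(ATTITUDE_LEXICON.get(w, "neutral") for w in descriptors)
    total = sum(counts.values())
    proportions = {a: counts[a] / total for a in ("positive", "negative", "neutral")}
    # the attitude with the highest proportion wins
    label = max(proportions, key=proportions.get)
    return label, proportions

label, props = sentence_emotion(["stomach", "pain", "cough"])
```

Here two of the three description words carry a negative attitude, so the sentence emotion information would be "negative" with a proportion of 2/3.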
Optionally, the method further includes:
displaying each simplified descriptive statement;
and, upon selection of a simplified description sentence, displaying the sentence emotion information of that simplified description sentence, the corresponding description data, and the user emotion of that description data.
The present invention also provides a description data processing system, comprising:
the sentence dividing module is used for generating a description text according to the obtained description data and carrying out sentence dividing processing to obtain a plurality of initial description sentences;
the determining module is used for acquiring the subject personal title keywords in each initial description sentence and determining the target subject personal title keywords;
the word segmentation module is used for performing word segmentation on a target description sentence to obtain a plurality of target description words, and determining a plurality of event keywords from the target description words, wherein the target description sentence is an initial description sentence that includes the target subject person keyword;
the replacing module is used for replacing the event keyword in the target description sentence with a standard keyword if a standard keyword whose similarity to the event keyword is higher than a preset similarity threshold exists in a preset standard lexicon;
and the generating module is used for generating a plurality of simplified description sentences according to the target description sentences including the standard keywords, and storing the simplified description sentences and the description data corresponding to the simplified description sentences in an associated manner to complete the processing of the description data.
The invention also provides an electronic device, which comprises a processor, a memory and a communication bus;
the communication bus is used for connecting the processor and the memory;
the processor is configured to execute the computer program stored in the memory to implement the method according to any of the embodiments described above.
The invention also provides a computer-readable storage medium having stored thereon a computer program for causing a computer to perform the method according to any one of the embodiments described above.
The invention has the beneficial effects that: the method generates a description text from the obtained description data and splits it into a plurality of initial description sentences; obtains the subject person keywords in each initial description sentence and determines a target subject person keyword; performs word segmentation on the target description sentences to obtain a plurality of target description words, and determines a plurality of event keywords among them; if a preset standard lexicon contains a standard keyword whose similarity to an event keyword is higher than a preset similarity threshold, replaces that event keyword in the target description sentence with the standard keyword; generates a plurality of simplified description sentences from the target description sentences containing the standard keywords; and stores the simplified description sentences in association with their corresponding description data, completing the processing of the description data. The method can improve the efficiency of organizing voice data, save time, and improve the customer experience.
Drawings
Fig. 1 is a schematic flow chart illustrating a data processing method according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating a data processing system according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments only illustrate the basic idea of the present invention; they show only the components related to the invention rather than the number, shape and size of the components in an actual implementation, where the type, quantity and proportion of each component may vary freely and the component layout may be more complicated.
In the following description, numerous details are set forth to provide a more thorough explanation of embodiments of the present invention, however, it will be apparent to one skilled in the art that embodiments of the present invention may be practiced without these specific details, and in other embodiments, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring embodiments of the present invention.
Example one
As shown in fig. 1, the present embodiment provides a description data processing method, including:
s101: and generating a description text according to the acquired description data, and performing sentence splitting processing to obtain a plurality of initial description sentences.
The description data may be generated from at least one of a plurality of sources, such as audio data, video data, image data, text data, sign language, and the like. For example, character information on paper may be recognized by OCR (Optical Character Recognition) and combined with audio data to generate the description data.
The manner of generating the description text from the description data and splitting it into the initial description sentences can be implemented by existing related technical means, which are not limited herein.
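As an illustrative sketch of the sentence-splitting step, the following splits on end-of-sentence punctuation; the punctuation set and the use of a plain regular expression are assumptions, since the patent leaves the technique open:

```python
import re

def split_sentences(description_text):
    """Split description text into initial description sentences at
    end-of-sentence punctuation (Western and CJK)."""
    parts = re.split(r"(?<=[.!?。！？])\s*", description_text)
    return [p for p in parts if p.strip()]

sentences = split_sentences(
    "My mother has a cough. It started a week ago! She also has phlegm."
)
```

A production system would typically delegate this to an NLP toolkit that also handles abbreviations and quoted speech.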
In some embodiments, the generating of the description data comprises:
acquiring audio data, wherein the audio data comprises speech information and user emotion identification information, and the user emotion identification information comprises at least one of intonation, pauses, volume and speech rate;
and determining the user emotion according to the user emotion identification information, and generating the description data according to the user emotion and the speech information.
Speech information is the speaker's verbal description in answer to a certain question or of an event. While speaking, most people naturally reveal their current mood, or their real opinion of the matter, through at least one of their intonation, volume, pauses, tone and speech rate, so this emotional information is actually part of the speaker's description. If the speech information is simply converted into a text description, the speaker's actual emotion is likely to be lost, and the speaker's real meaning distorted. Therefore, the user emotion is determined from the user emotion identification information, the description data is generated together with the speech information, and the simplified description sentences are finally produced, so that readers know not only what the speaker said but also the attitude with which it was said. For example, one speaker may say loudly and forcefully, "I have been in pain all year," while another says the same words weakly and faintly. The finally generated simplified description sentence may be "in pain all year" for both, yet the physical conditions of the two speakers are plainly very different; if the method is used in a place such as a hospital, the priority with which the two speakers need to see a doctor obviously differs greatly.
In summary, by labeling the audio data with the user emotion, the speaker's emotional state can be known when a simplified description sentence is later generated, so that the "surface meaning" of the speaker is grasped quickly while the "deeper meaning" is understood more clearly, making the processing of the description data more accurate and its use more reliable.
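A minimal sketch of how the user emotion might be derived from the emotion identification information; the thresholds, units and emotion labels below are all assumptions for illustration:

```python
def classify_emotion(volume_db, words_per_minute):
    """Very coarse emotion heuristic from volume and speech rate.
    Thresholds are illustrative, not from the patent."""
    if volume_db >= 70 and words_per_minute >= 160:
        return "agitated"
    if volume_db <= 45 and words_per_minute <= 100:
        return "low-spirited"
    return "calm"

# The same words, spoken differently, receive different emotion labels:
loud = classify_emotion(volume_db=75, words_per_minute=180)  # "agitated"
weak = classify_emotion(volume_db=40, words_per_minute=90)   # "low-spirited"
```

This mirrors the "in pain all year" example: identical speech information, but divergent user emotion labels attached to the description data.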
In some embodiments, the method further comprises:
performing semantic recognition on speech information in the description data to obtain semantic content;
extracting the tone features in the description data, and determining the tone category corresponding to the tone features;
determining the emotion of the user according to the volume and the speed of speech in the emotion identification information of the user;
and judging whether the semantic content, user emotion and tone category in the same segment of description data are coordinated; if not, marking that segment of description data as anomalous, and marking the simplified description sentence corresponding to that segment as anomalous as well.
The tone category can be determined from at least one of intonation (rising tone, falling tone, interrogative intonation, etc.) and modal particles (question, exclamation and other tone-indicating words), or by a preset tone determination model.
An example of judging whether semantic content, user emotion, and mood category in the same piece of description data are coordinated is:
acquiring a plurality of preset normal user emotions and a plurality of preset normal tone categories corresponding to the semantic content; if none of the preset normal user emotions matches the user emotion, and/or none of the preset normal tone categories matches the tone category, then the semantic content, user emotion and tone category in that segment of description data are not coordinated and anomaly labeling is needed; otherwise, the semantic content, user emotion and tone category in the description data are coordinated.
Optionally, the user emotion may first be matched against the semantic content. If the semantic content is "I am very happy today" but the volume in the user emotion identification information is low and the speech rate slow, so that the user emotion is determined to be "dejected", then the user emotion does not match the semantic content and the description data is labeled as anomalous.
Optionally, if the user emotion matches the semantic content, it is then judged whether the tone category of the segment of description data matches the semantic content. For example, the semantic content is "I am very happy today", the volume in the user emotion identification information is high and the speech rate fast, so the user emotion is determined to be "happy"; but in pronunciation the tone of "very" falls and the tone of "happy" rises, forming a rhetorical-question tone. The tone category can then be determined to be a rhetorical question, meaning the current description data is essentially the opposite of its literal semantic content; the tone category is therefore judged not to match the semantic content, and the segment of description data is labeled as anomalous.
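The coordination check described above can be sketched as follows; the profile table, labels and matching rule are illustrative assumptions:

```python
# Preset "normal" emotions and tone categories per semantic content.
# Entries here are invented for illustration only.
NORMAL_PROFILES = {
    "I am very happy today": {
        "emotions": {"happy", "excited"},
        "tones": {"declarative", "exclamatory"},
    },
}

def is_coordinated(semantic_content, user_emotion, tone_category):
    profile = NORMAL_PROFILES.get(semantic_content)
    if profile is None:
        return True  # no reference profile, nothing to check against
    # coordinated only when both emotion and tone match a preset normal value
    return (user_emotion in profile["emotions"]
            and tone_category in profile["tones"])

def label_segment(semantic_content, user_emotion, tone_category):
    ok = is_coordinated(semantic_content, user_emotion, tone_category)
    return {"text": semantic_content, "anomalous": not ok}

seg = label_segment("I am very happy today", "dejected", "rhetorical-question")
```

The rhetorical-question example above would be flagged: neither the emotion nor the tone falls in the preset normal sets, so the segment gets an anomaly label.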
In some embodiments, the generating of the description data comprises:
acquiring video data, wherein the video data comprises speech information and user emotion identification information, and the user emotion identification information comprises at least one of tone, pause, volume, speed of speech, facial expression and presentation posture;
and determining the emotion of the user according to the emotion information of the user, and generating description data according to the emotion and speech information of the user.
The video data may include the speaker's facial expressions while speaking and body movements (speaking gestures); facial expressions and gestures may in fact carry information the speaker intends to express, and the user emotion can to some extent be obtained by analyzing them. For example, when a speaker keeps swinging an arm while talking, speaks quickly, and has a grave expression, the speaker's user emotion is agitated and urgent.
In some embodiments, the determining of the user emotion further comprises:
acquiring micro-expression information of a target object;
determining the authenticity of the currently spoken information according to the micro-expression information;
and taking the authenticity as the user emotion.
Here the user emotion comprises true emotion and false emotion. How the micro-expressions determine the authenticity of the currently spoken information may be achieved through related technical means, and is not limited herein.
By judging in advance whether the words of the target object (the speaker) are true or false, the personnel who use the simplified description sentences can establish the reliability of the description information at its root, without having to converse with the speaker in person.
S102: and acquiring the subject personal title key words in each initial description sentence, and determining the target subject personal title key words.
The subject person keyword may be a word in the initial description sentence that indicates the identity of the subject under discussion, for example: Robert, my friend, my teacher, my mom, my neighbor, I, he, that person, my dog, my pet, Pitt, Xiao Ming, and so on. As another example, if the initial description sentence is "I heard that Xiao Ming has fallen ill; he has had a headache for half a month", the subject person keyword in that sentence is "Xiao Ming".
The subject person keywords can be obtained by performing word segmentation on the initial description sentence and then part-of-speech tagging, taking words whose part of speech is a pronoun or similar as subject person keywords. Other manners known to those skilled in the art may also be used, and are not limited herein.
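A toy sketch of this extraction step; the person-term lexicon and simple word-boundary matching stand in for the word segmentation and part-of-speech tagging mentioned above:

```python
import re

# Illustrative person-term lexicon; multi-word terms listed first so that
# "my mother" is matched before the bare pronoun "my" could shadow it.
PERSON_TERMS = ["my mother", "my friend", "xiao ming", "she", "he", "i", "me"]

def subject_person_keywords(sentence):
    """Return person terms found in the sentence, longest terms first."""
    text = sentence.lower()
    found = []
    for term in PERSON_TERMS:
        pattern = r"\b" + re.escape(term) + r"\b"
        matches = re.findall(pattern, text)
        if matches:
            found.extend(matches)
            text = re.sub(pattern, " ", text)  # consume so "he" not re-found in "she"
    return found

kws = subject_person_keywords("My mother has coughed and she is tired")
```

A real implementation would use a tokenizer and POS tagger rather than a fixed lexicon, but the consume-longest-first idea carries over.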
In some embodiments, the target subject person keyword is determined in a manner that includes at least one of:
comparing the subject person keywords with a preset person keyword, and if the similarity between a subject person keyword and the preset person keyword is higher than a first preset similarity threshold, taking that subject person keyword as the target subject person keyword;
performing synonym replacement on the clear keywords among the subject person keywords to obtain a plurality of standard subject person keywords, performing disambiguation on the fuzzy keywords among the subject person keywords and replacing them with the corresponding standard subject person keywords, determining the word proportion of each standard subject person keyword, and taking the standard subject person keyword with the highest proportion as the target subject person keyword.
For example, suppose what currently needs to be collected are the circumstances and opinions of the narrator himself, but the previously obtained subject person keywords are "my neighbor, my coworker, my father, I, myself, me", which include references to other people. By setting the preset person keyword to "I" and raising the value of the first preset similarity threshold, "I, myself, me" can be screened out as the suitable objects and taken as the target subject person keywords.
Optionally, a clear keyword may be a word from which the identity of the object can be known directly, such as "I", "my mom", "my colleague", and the like. A fuzzy keyword may be a word from which the identity of the object cannot be known directly, such as "he", "that person", "she", and so on; a fuzzy keyword refers back to some clear keyword. For example, in the sentence "My mother's stomach is always painful, and she can be in pain for several hours", "my mother" is a clear keyword and "she" is a fuzzy keyword. Replacing a fuzzy keyword with a standard subject person keyword can be realized by coreference-resolution methods, or by other related technical means known to those skilled in the art.
When telling a story, people may use different names for themselves or for others, e.g., "I, myself, me" for themselves, or "mother, mom, my mother" for their mother. In such cases, to improve the speed and accuracy of subsequent data processing, synonym merging may be performed on the clear keywords; for example, "mother, mom, my mother" are merged and replaced with the corresponding standard subject person keyword "mother".
Taking use of the method in a hospital outpatient scenario as an example, a speaker usually describes a patient's condition. If patients describe their own condition, the subject person keywords they use are usually expressions similar to "I"; a few other people's circumstances may be included, but their proportion is relatively small, so the standard subject person keyword with the highest word proportion can be taken directly as the target subject person keyword by counting the word proportions of the standard subject person keywords. If a relative or friend describes the condition on the patient's behalf, the subject person keywords used are usually expressions similar to "my XX"; a few non-patient circumstances may again be included, but their proportion is relatively small, so the standard subject person keyword with the highest proportion can likewise be taken as the target. For example, consider the description "My mother has been coughing all the time recently; she has coughed for about a week, with phlegm that never clears. I coughed before too, but not for this long, so today I brought my mother in to be seen." The subject person keywords of these sentences are "my mother, she, I, my mother". The fuzzy keyword "she" is first disambiguated so that it essentially means "my mother"; standardization then replaces these with the standard subject person keyword "mother". Counting the word proportion of each standard subject person keyword gives "mother" a proportion of 3/4 = 75% and "I" a proportion of 1/4 = 25%, so "mother" can be taken as the target subject person keyword.
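The word-proportion computation in this example can be sketched as follows (the input assumes the fuzzy keywords have already been resolved and standardized in the earlier steps):

```python
from collections import Counter

def target_subject_keyword(standard_keywords):
    """Pick the standard subject person keyword with the highest word proportion."""
    counts = Counter(standard_keywords)
    total = sum(counts.values())
    proportions = {kw: n / total for kw, n in counts.items()}
    target = max(proportions, key=proportions.get)
    return target, proportions

# "she" was already resolved to "mother" by the disambiguation step above
target, props = target_subject_keyword(["mother", "mother", "mother", "I"])
```

This reproduces the 75% / 25% split from the worked example, selecting "mother" as the target.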
S103: performing word segmentation processing on the target description sentence to obtain a plurality of target descriptors, and determining a plurality of event keywords from the target descriptors.
The target description sentences are the initial description sentences that include the target subject person keyword. That is, sentences irrelevant to the target subject person keyword are filtered out at this point, which avoids the waste of resources caused by processing invalid data.
The word segmentation process can be implemented by means of related technologies known to those skilled in the art.
In some embodiments, after performing the word segmentation process on the target description sentence, before obtaining a plurality of target descriptors, the method further includes:
and performing data cleaning on each word of the segmented target description sentence, wherein the data cleaning includes, but is not limited to, stop-word removal and similar steps.
This reduces meaningless words, such as filler particles ("um", "ah" and the like), among the target description words.
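A minimal sketch of the data-cleaning step, with an assumed stop-word list:

```python
# Illustrative stop-word list; a real deployment would use a curated one
# for the document language.
STOP_WORDS = {"the", "a", "an", "is", "are", "have", "has", "and", "for", "of"}

def clean_tokens(tokens):
    """Remove stop words and empty tokens after word segmentation."""
    return [t for t in tokens if t.strip() and t.lower() not in STOP_WORDS]

cleaned = clean_tokens(["mother", "has", "a", "cough", "for", "a", "week"])
```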
Optionally, the event keyword may be a relevant term for the event the description data concerns. For example, when the method is applied to hospital outpatient service, words relating to body parts, organs or physiological conditions in the patient's description of his own condition may be used as event keywords, such as: abdomen, stomach, teeth, heart, insomnia, frequent urination, etc. The event keywords may also be words preset as needed by a person skilled in the art.
S104: and if the standard keywords with the similarity higher than a preset similarity threshold with the event keywords exist in the preset standard word bank, replacing the event keywords in the target description sentence with the standard keywords.
The preset standard lexicon stores a plurality of standard keywords, which may be standard terms specified by the organization, the field, and so on. Replacing the event keywords with standard keywords facilitates subsequent use and analysis of the data, and allows the description data to be stored in a standardized manner.
The similarity determination can be implemented using related techniques in the field and is not limited herein.
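Since the patent leaves the similarity computation open, step S104 can be sketched with `difflib.SequenceMatcher` standing in for the similarity measure; the lexicon contents and the 0.8 threshold are assumptions for illustration.

```python
from difflib import SequenceMatcher

# Assumed standard keywords; in practice these come from the preset lexicon.
STANDARD_LEXICON = ["abdomen", "stomach", "insomnia"]

def standardize(event_keyword: str, threshold: float = 0.8) -> str:
    """Replace an event keyword with the most similar standard keyword
    when the similarity exceeds the preset threshold; otherwise keep it."""
    best, best_sim = event_keyword, 0.0
    for std in STANDARD_LEXICON:
        sim = SequenceMatcher(None, event_keyword, std).ratio()
        if sim > best_sim:
            best, best_sim = std, sim
    return best if best_sim > threshold else event_keyword

print(standardize("stomache"))  # close misspelling maps to "stomach"
print(standardize("knee"))      # no close match: kept unchanged
```

Any embedding-based or edit-distance similarity could be substituted without changing the replace-above-threshold structure.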
S105: generating a plurality of simplified description sentences according to the target description sentences that include the standard keywords, and storing the simplified description sentences in association with their corresponding description data, thereby completing the processing of the description data.
Generating simplified description sentences from the target description sentences that include the standard keywords further filters the description text and removes irrelevant sentences, so the retained information is more concise.
Optionally, the simplified description sentences may be generated from the data-cleaned target description sentences in the order of the initial description sentences.
In some embodiments, generating a number of simplified descriptive sentences from the target descriptive sentence that includes the standard keyword includes:
acquiring the event descriptors in each target description sentence having the same standard keyword, where the event descriptors comprise the target descriptors other than the event keywords;
comparing the similarity between the event descriptors and the standard descriptors in the preset standard lexicon;
if the similarity between an event descriptor and a standard descriptor is higher than a second preset similarity threshold, replacing the event descriptor with the standard descriptor;
and generating one simplified description sentence from the standard keyword and the standard descriptors, where each simplified description sentence includes one standard keyword.
Because descriptions of the same type of event keyword may be scattered across several initial description sentences, which may or may not be consecutive, generating simplified description sentences simply in the order of the initial description sentences would likely scatter the same standard keyword across multiple adjacent sentences. Therefore, the event descriptors of all target description sentences that share a standard keyword can be extracted and collected together to generate a single simplified description sentence, which condenses the description data more effectively.
The event descriptors can be obtained by labeling the part of speech and word sense of the words segmented from the target description sentence, yielding the words that actually describe the event. For example, when the method is applied to a hospital clinic and the patient states "my head has hurt for a long time, it hurts particularly badly, lasting approximately three days", the event keyword in this sentence is "head" and the event descriptors are "pain, particularly painful, three days".
The comparison between the event descriptors and the standard descriptors can be implemented by using the existing related technical means, which is not limited herein.
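The grouping-and-generation loop described above can be sketched as follows. The input format, the identity descriptor mapper, and the "keyword: descriptors" join format are assumptions for the demo; in practice the mapper would be the similarity-based standardization of the previous step.

```python
from collections import defaultdict

def build_simplified_sentences(
    targets: list[tuple[str, list[str]]],  # (standard_keyword, event descriptors)
    standardize_descriptor,                # e.g. a similarity-based mapper
) -> dict[str, str]:
    """Collect the descriptors of all target sentences sharing a standard
    keyword, then emit one simplified sentence per standard keyword."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for keyword, descriptors in targets:
        for d in descriptors:
            std = standardize_descriptor(d)
            if std not in grouped[keyword]:  # collect without duplicates
                grouped[keyword].append(std)
    return {kw: f"{kw}: {', '.join(ds)}" for kw, ds in grouped.items()}

# Two non-consecutive sentences about the same keyword collapse into one.
sentences = build_simplified_sentences(
    [("head", ["pain", "three days"]), ("head", ["particularly painful"])],
    standardize_descriptor=lambda d: d,    # identity mapper for the demo
)
print(sentences["head"])  # head: pain, three days, particularly painful
```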
By integrating the event descriptors related to each event keyword into one simplified description sentence, and by standardizing the event keywords and event descriptors through replacement with standard keywords and standard descriptors, the description data can be condensed and stored in a standardized manner. The central meaning of the description data can then be grasped later, saving time, effort, and resources.
Optionally, the first preset similarity threshold and the second preset similarity threshold may be set by those skilled in the art as needed.
In some embodiments, the method further comprises:
acquiring the word sense attitude of each target descriptor in the simplified description sentence, where the word sense attitude is positive, negative, or neutral;
determining sentence emotion information of the simplified description sentence according to the proportion of target descriptors of each word sense attitude in the simplified description sentence;
and storing the statement emotion information, the simplified description statement and description data corresponding to the simplified description statement in an associated manner.
The word sense attitude can be obtained by analyzing the part of speech and word sense of the target descriptor. For example, "open, insist, try" carry a positive word sense attitude, "anxiety, abandon" a negative one, and "three days" a neutral one.
What is essentially distinguished here is the word sense attitude of the event descriptors, because the word sense attitude of an event keyword is "neutral" and does not express the describer's attitude. The word sense attitude of the target descriptors in the simplified sentence can be obtained directly by labeling the word sense attitudes of the standard descriptors in advance.
One specific way of determining the sentence emotion information from the proportion of target descriptors of each word sense attitude is as follows:
determine the proportions of target descriptors with positive, negative, and neutral word sense attitudes, and derive the sentence emotion information from those proportions. For example, the attitude with the highest proportion may be taken as the sentence emotion: with 50% positive, 30% negative, and 20% neutral, the sentence emotion of the simplified description sentence is positive. As another example, if the negative proportion exceeds a preset threshold, such as 28%, the sentence emotion of the simplified description sentence is negative.
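The proportion-based rule just described can be sketched as follows. The attitude lexicon entries and the 28% negative threshold are taken from the worked examples above; treating unknown words as neutral is an added assumption.

```python
# Attitude labels from the examples in the text (assumed lexicon for the demo).
ATTITUDE = {"open": "positive", "insist": "positive", "try": "positive",
            "anxiety": "negative", "abandon": "negative", "three days": "neutral"}

def sentence_emotion(descriptors: list[str], neg_threshold: float = 0.28) -> str:
    """Derive sentence emotion from the attitude proportions of descriptors."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for d in descriptors:
        counts[ATTITUDE.get(d, "neutral")] += 1  # unlabeled words -> neutral
    total = len(descriptors) or 1
    if counts["negative"] / total > neg_threshold:
        return "negative"                  # negative share overrides
    return max(counts, key=counts.get)     # otherwise: majority attitude

print(sentence_emotion(["open", "insist", "three days"]))   # positive
print(sentence_emotion(["anxiety", "open", "three days"]))  # negative (1/3 > 28%)
```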
In some embodiments, the method further comprises:
displaying each simplified description sentence;
selecting a simplified description sentence, and displaying its sentence emotion information, the description data corresponding to it, and the user emotion of that description data.
In this way, the sentence emotion and user emotion corresponding to a simplified description sentence can be displayed visually to relevant personnel as needed, and those personnel can also view or listen to the original description data. The method thus quickly helps relevant personnel locate emotion-related information beyond the spoken words, and better helps users of the simplified description sentences exploit the data.
In some embodiments, an initial description sentence carries identification information X; the description data and description text corresponding to that initial description sentence carry the same identification information X, and the target descriptors and target description sentences obtained subsequently also carry it. The identification information of a subsequent simplified description sentence is determined from the identification information of the target descriptors it includes: for example, if a simplified description sentence includes target descriptors whose identification information is Y, Z, and M, the identification information of the simplified description sentence is YZM. The identification information of a standard keyword or standard descriptor is determined from the event keyword or event descriptor it replaced.
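The identifier propagation just described amounts to concatenating the identifiers of the constituent descriptors; a one-line sketch, with the ID values made up for the demo:

```python
def simplified_sentence_id(descriptor_ids: list[str]) -> str:
    """A simplified sentence inherits the concatenated IDs of its
    target descriptors, e.g. ['Y', 'Z', 'M'] -> 'YZM'."""
    return "".join(descriptor_ids)

print(simplified_sentence_id(["Y", "Z", "M"]))  # YZM
```

This keeps every simplified sentence traceable back to the original description data through its descriptor identifiers.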
In some embodiments, the description data is generated from image data, the method further comprising:
and correspondingly storing the image and the simplified description sentence generated according to the description data converted from the image in a correlated manner.
An embodiment of the invention provides a description data processing method: a description text is generated from the acquired description data and split into a plurality of initial description sentences; the subject person keywords in each initial description sentence are acquired and the target subject person keyword is determined; the target description sentences are segmented into a plurality of target descriptors, from which a plurality of event keywords are determined; if the preset standard lexicon contains a standard keyword whose similarity to an event keyword is higher than the preset similarity threshold, the event keyword in the target description sentence is replaced with that standard keyword; and a plurality of simplified description sentences are generated from the target description sentences that include the standard keywords and stored in association with their corresponding description data, completing the processing of the description data. This can improve the efficiency of organizing voice data, save time, and improve the customer experience.
Example two
Referring to FIG. 2, an embodiment of the present invention further provides a description data processing system 200, including:
a sentence dividing module 201, configured to generate a description text according to the obtained description data, and perform sentence dividing processing to obtain a plurality of initial description sentences;
a determining module 202, configured to obtain the subject person keyword in each initial description sentence and determine a target subject person keyword;
a word segmentation module 203, configured to perform word segmentation processing on the target description sentences to obtain a plurality of target descriptors and determine a plurality of event keywords from the target descriptors, where a target description sentence is an initial description sentence including the target subject person keyword;
a replacing module 204, configured to replace an event keyword in a target description sentence with a standard keyword if the preset standard lexicon contains a standard keyword whose similarity to the event keyword is higher than a preset similarity threshold;
the generating module 205 is configured to generate a plurality of simplified description statements according to the target description statements including the standard keywords, and associate and store the simplified description statements and description data corresponding to the simplified description statements to complete processing of the description data.
In this embodiment, the system executes the method described in any of the above embodiments, and specific functions and technical effects are described with reference to the above embodiments, which are not described herein again.
Referring to fig. 3, an embodiment of the present application further provides an electronic device 1600, where the electronic device 1600 includes a processor 1601, a memory 1602 and a communication bus 1603;
the communication bus 1603 is used to connect the processor 1601 and the memory 1602;
the processor 1601 is configured to execute a computer program stored in the memory 1602 to implement the method according to any of the above embodiments.
Embodiments of the present application also provide a non-transitory readable storage medium storing one or more modules (programs); when the one or more modules are applied to a device, the device can execute the instructions included in the embodiments of the present application.
The embodiment of the application also provides a computer readable storage medium, on which a computer program is stored, the computer program is used for causing the computer to execute the method according to the embodiment.
The foregoing embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the invention. Accordingly, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the present invention shall still be covered by the claims of the present invention.
In the figures accompanying the above embodiments, connecting lines may represent connection relationships between components; thicker lines may indicate more constituent signal paths, and/or one or more ends of some lines may have arrows indicating the main direction of information flow. The connecting lines serve as identification rather than as a limitation on the scheme itself; used in conjunction with one or more example embodiments, they make circuits or logic units easier to connect. Any represented signal, as determined by design requirements or preferences, may actually comprise one or more signals that may be transmitted in either direction and may be implemented with any suitable type of signal scheme.
In the above embodiments, unless otherwise specified, the description of common objects by using "first", "second", etc. ordinal numbers only indicate that they refer to different instances of the same object, rather than indicating that the objects being described must be in a given sequence, whether temporally, spatially, in ranking, or in any other manner.
In the above-described embodiments, reference in the specification to "the embodiment," "an embodiment," "another embodiment," or "other embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least some embodiments, but not necessarily all embodiments. The various appearances of the phrase "the present embodiment," "one embodiment," or "another embodiment" are not necessarily all referring to the same embodiment. If the specification states a component, feature, structure, or characteristic "may", "might", or "could" be included, that particular component, feature, structure, or characteristic is not necessarily included. If the specification or claim refers to "a" or "an" element, that does not mean there is only one of the element. If the specification or claim refers to "a further" element, that does not preclude there being more than one of the further element.
In the embodiments described above, although the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, other memory structures (e.g., dynamic ram (dram)) may use the discussed embodiments. The embodiments of the invention are intended to embrace all such alternatives, modifications and variances that fall within the broad scope of the appended claims.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The invention is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Claims (9)
1. A description data processing method, comprising:
generating a description text according to the obtained description data, and performing sentence splitting processing to obtain a plurality of initial description sentences;
obtaining the subject person keywords in each initial description sentence, and determining a target subject person keyword;
performing word segmentation processing on a target description sentence to obtain a plurality of target descriptors, and determining a plurality of event keywords from the target descriptors, wherein the target description sentence is the initial description sentence comprising the target subject person keyword;
if a standard keyword whose similarity to the event keyword is higher than a preset similarity threshold exists in a preset standard lexicon, replacing the event keyword in the target description sentence with the standard keyword;
generating a plurality of simplified description sentences according to the target description sentences including the standard keywords, and storing the simplified description sentences and description data corresponding to the simplified description sentences in an associated manner to complete the processing of the description data;
wherein the generating of the plurality of simplified description sentences from the target description sentence including the standard keyword includes,
obtaining event descriptors in each target description sentence with the same standard keyword, wherein the event descriptors comprise the target descriptors except the event keyword;
carrying out similarity comparison on the event descriptors and standard descriptors in a preset standard lexicon;
if the similarity between the event descriptor and the standard descriptor is higher than a second preset similarity threshold, replacing the event descriptor with the standard descriptor;
and generating one simplified description sentence according to the standard keywords and the standard description words, wherein each simplified description sentence comprises one standard keyword.
2. The description data processing method according to claim 1, wherein the target subject person keyword is determined in a manner including any one of:
comparing the similarity of the subject person keywords with preset person keywords, and if the similarity between a subject person keyword and a preset person keyword is higher than a first preset similarity threshold, taking the subject person keyword as the target subject person keyword;
performing synonymy replacement processing on clear keywords in the subject person keywords to obtain a plurality of standard subject person keywords, performing disambiguation processing on fuzzy keywords in the subject person keywords and replacing the fuzzy keywords with standard subject person keywords, determining the word proportion of each standard subject person keyword, and taking the standard subject person keyword with the highest word proportion as the target subject person keyword.
3. The description data processing method according to claim 1, wherein the generation manner of the description data includes:
acquiring audio data, wherein the audio data comprises speech information and user emotion identification information, and the user emotion identification information comprises at least one of tone, pause, volume and speed;
and determining the user emotion according to the user emotion identification information, and generating the description data according to the user emotion and the speech information.
4. The description data processing method according to claim 1, wherein the generation manner of the description data includes:
acquiring video data, wherein the video data comprises speech information and user emotion identification information, and the user emotion identification information comprises at least one of tone, pause, volume, speed of speech, facial expression and a presentation gesture;
and determining the user emotion according to the user emotion identification information, and generating the description data according to the user emotion and the speech information.
5. The description data processing method according to claim 3 or 4, wherein the method further comprises:
acquiring word sense attitude of each target descriptor in the simplified description sentence, wherein the word sense attitude comprises positive, negative and neutral;
determining sentence emotion information of the simplified description sentence according to the proportion of the target descriptors of each word sense attitude in the simplified description sentence;
and storing the statement emotion information, the simplified description statement and description data corresponding to the simplified description statement in an associated manner.
6. The description data processing method of claim 5, wherein the method further comprises:
displaying each simplified descriptive statement;
and selecting the simplified description statement, and displaying statement emotion information of the simplified description statement, the corresponding description data and the user emotion of the description data.
7. A description data processing system, comprising:
the sentence dividing module is used for generating a description text according to the obtained description data and carrying out sentence dividing processing to obtain a plurality of initial description sentences;
the determining module is used for acquiring the subject person keywords in each initial description sentence and determining a target subject person keyword;
the word segmentation module is used for performing word segmentation processing on a target description sentence to obtain a plurality of target descriptors, and determining a plurality of event keywords from the target descriptors, wherein the target description sentence is the initial description sentence comprising the target subject person keyword;
the replacing module is used for replacing the event keyword in the target description sentence with the standard keyword if a standard keyword whose similarity to the event keyword is higher than a preset similarity threshold exists in a preset standard lexicon;
the generating module is used for generating a plurality of simplified description sentences according to the target description sentences including the standard keywords, and storing the simplified description sentences and the description data corresponding to the simplified description sentences in an associated manner to complete the processing of the description data;
wherein the generating of the plurality of simplified description sentences from the target description sentence including the standard keyword includes,
obtaining event descriptors in each target description sentence with the same standard keyword, wherein the event descriptors comprise the target descriptors except the event keyword;
carrying out similarity comparison on the event descriptors and standard descriptors in a preset standard lexicon;
if the similarity between the event descriptor and the standard descriptor is higher than a second preset similarity threshold, replacing the event descriptor with the standard descriptor;
and generating one simplified description sentence according to the standard keywords and the standard description words, wherein each simplified description sentence comprises one standard keyword.
8. An electronic device comprising a processor, a memory, and a communication bus;
the communication bus is used for connecting the processor and the memory;
the processor is configured to execute a computer program stored in the memory to implement the method of any one of claims 1-6.
9. A computer-readable storage medium, having stored thereon a computer program for causing a computer to perform the method of any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110663888.7A CN113255323B (en) | 2021-06-16 | 2021-06-16 | Description data processing method, system, electronic device and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113255323A CN113255323A (en) | 2021-08-13 |
CN113255323B true CN113255323B (en) | 2021-11-19 |
Family
ID=77188090
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110663888.7A Active CN113255323B (en) | 2021-06-16 | 2021-06-16 | Description data processing method, system, electronic device and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113255323B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114385890B (en) * | 2022-03-22 | 2022-05-20 | 深圳市世纪联想广告有限公司 | Internet public opinion monitoring system |
CN117541275B (en) * | 2024-01-09 | 2024-06-07 | 深圳市微购科技有限公司 | Intelligent terminal commodity sales management system based on cloud technology |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080215577A1 (en) * | 2007-03-01 | 2008-09-04 | Sony Corporation | Information processing apparatus and method, program, and storage medium |
CN106776570A (en) * | 2016-12-27 | 2017-05-31 | 竹间智能科技(上海)有限公司 | A kind of people claims mask method |
CN107392436A (en) * | 2017-06-27 | 2017-11-24 | 北京神州泰岳软件股份有限公司 | A kind of method and apparatus for extracting enterprise's incidence relation information |
CN109543985A (en) * | 2018-11-15 | 2019-03-29 | 李志东 | Business risk appraisal procedure, system and medium |
- 2021-06-16: application CN202110663888.7A filed (granted as patent CN113255323B, status Active)
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102073979B1 (en) | Server and method for providing feeling analysis based emotional diary service using artificial intelligence based on speech signal | |
WO2019246239A1 (en) | Systems and methods for mental health assessment | |
CN113255323B (en) | Description data processing method, system, electronic device and medium | |
JP2017016566A (en) | Information processing device, information processing method and program | |
CN109871440B (en) | Intelligent prompting method, device and equipment based on semantic analysis | |
CN113591489B (en) | Voice interaction method and device and related equipment | |
CN111145903B (en) | Method and device for acquiring vertigo inquiry text, electronic equipment and inquiry system | |
US11756540B2 (en) | Brain-inspired spoken language understanding system, a device for implementing the system, and method of operation thereof | |
CN109299227B (en) | Information query method and device based on voice recognition | |
KR101584685B1 (en) | A memory aid method using audio-visual data | |
Antunes et al. | A framework to support development of sign language human-computer interaction: Building tools for effective information access and inclusion of the deaf | |
JP2017016296A (en) | Image display device | |
US20230260533A1 (en) | Automated segmentation of digital presentation data | |
CN116796857A (en) | LLM model training method, device, equipment and storage medium thereof | |
Theakston et al. | Handling agents and patients: Representational cospeech gestures help children comprehend complex syntactic constructions. | |
Wagner et al. | Applying cooperative machine learning to speed up the annotation of social signals in large multi-modal corpora | |
Yoon et al. | Fear emotion classification in speech by acoustic and behavioral cues | |
JP7049010B1 (en) | Presentation evaluation system | |
CN113254814A (en) | Network course video labeling method and device, electronic equipment and medium | |
JP2022056592A (en) | Conversation support device, conversation support system, conversation support method, and program | |
McTear et al. | Affective conversational interfaces | |
JP2020154427A (en) | Information processing apparatus, information processing method, and program | |
Ouyang | Effects of non-verbal paralanguage capturing on meaning transfer in consecutive interpreting | |
CN115171673A (en) | Role portrait based communication auxiliary method and device and storage medium | |
CN113761899A (en) | Medical text generation method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right |
Effective date of registration: 20220708 Address after: 201615 room 1904, G60 Kechuang building, No. 650, Xinzhuan Road, Songjiang District, Shanghai Patentee after: Shanghai Mingping Medical Data Technology Co.,Ltd. Address before: 102400 no.86-n3557, Wanxing Road, Changyang, Fangshan District, Beijing Patentee before: Mingpinyun (Beijing) data Technology Co.,Ltd. |