CN112037782A - RPA and AI combined early media identification method, device, equipment and storage medium - Google Patents

Info

Publication number
CN112037782A
Authority
CN
China
Prior art keywords
early media
data
sample
trained
media identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010794278.6A
Other languages
Chinese (zh)
Inventor
胡一川
汪冠春
褚瑞
李玮
唐祥光
谷宇维
胡景超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Benying Network Technology Co Ltd
Beijing Laiye Network Technology Co Ltd
Original Assignee
Beijing Benying Network Technology Co Ltd
Beijing Laiye Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Benying Network Technology Co Ltd and Beijing Laiye Network Technology Co Ltd
Publication of CN112037782A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)

Abstract

The application provides an early media identification method, device, equipment and storage medium combining RPA and AI. The method comprises: obtaining early media sample data and segmenting it to obtain a plurality of sample slice data; performing speech recognition on each sample slice data and determining the labeling information of each sample slice data according to the speech recognition result; and training a constructed early media identification model with the plurality of sample slice data and the corresponding labeling information to obtain a trained early media identification model, wherein the trained model is used to identify input early media data to be identified. By automatically labeling early media training samples and training an early media identification model, the method and device achieve efficient and accurate early media identification, significantly reduce the occupation of computing resources, and improve the concurrency of early media identification.

Description

RPA and AI combined early media identification method, device, equipment and storage medium
Technical Field
The embodiments of the application relate to the technical field of text understanding, and in particular to an early media identification method, device, equipment and storage medium combining RPA (Robotic Process Automation) and AI (Artificial Intelligence).
Background
Robotic Process Automation (RPA) uses specific robot software to simulate human operations on a computer and automatically executes process tasks according to rules.
Artificial Intelligence (AI) is a technical science that studies and develops theories, methods, techniques and application systems for simulating, extending and expanding human intelligence. Early media is the portion of media information that appears at the beginning of a piece of media information, such as a call-waiting alert tone on a telephone call. In an intelligent interactive system, identifying early media information and choosing the corresponding interactive processing mode according to the identification result is a necessary part of the interaction.
In the prior art, early media recognition is implemented with Automatic Speech Recognition (ASR): the voice information of the early media is converted into text, and the semantics of the text are then analyzed to determine how the intelligent interactive system should proceed. However, because ASR consumes a large amount of machine resources, using it for early media recognition consumes substantial computing resources and increases service cost.
Disclosure of Invention
The embodiment of the application provides an early media identification method, device and equipment combining RPA and AI and a storage medium, which solve the problems of high occupation and low concurrency of computing resources of an automatic voice identification technology, thereby improving the concurrency and accuracy of early media identification.
In a first aspect, an embodiment of the present application provides an early media identification method combining an RPA and an AI, including:
S1, obtaining early media sample data, and performing data segmentation on the early media sample data to obtain a plurality of sample slice data, wherein any two adjacent sample slice data partially overlap;
S2, performing speech recognition on each sample slice data, and determining the labeling information of each sample slice data according to the speech recognition result;
S3, training the constructed early media identification model to be trained by using the plurality of sample slice data and the corresponding labeling information to obtain the trained early media identification model, wherein the trained early media identification model is used for identifying input early media data to be identified.
In an optional embodiment, the above S3 specifically includes:
S31, performing a short-time Fourier transform on each sample slice data to obtain the slice data to be trained for each sample slice data;
S32, inputting each slice data to be trained and the corresponding labeling information into the early media identification model to be trained, so as to train the model and obtain the trained early media identification model.
In an optional embodiment, the above S2 specifically includes:
S21, performing speech recognition on each sample slice data by using ASR to obtain the slice text information corresponding to each sample slice data;
S22, comparing each slice text information with preset target text information to obtain the labeling information corresponding to each sample slice data.
In an alternative embodiment, S1 is preceded by:
S101, performing data enhancement processing on the obtained sample data to obtain the early media sample data.
In an alternative embodiment, the data enhancement process comprises at least one of the following processes: adding noise processing, time scaling processing, and pitch scaling processing.
In an alternative embodiment, S3 further includes:
S33, determining the accuracy of the labeling information corresponding to each sample slice data;
S34, determining, according to the accuracy, target sample slice data whose labeling information is inaccurate, and re-labeling the target sample slice data to obtain new labeling information;
S35, after the step of training the constructed early media identification model to be trained by using the plurality of sample slice data and the corresponding labeling information has been executed, further training the trained early media identification model by using the target sample slice data and the corresponding new labeling information to obtain the trained early media identification model.
In a second aspect, an embodiment of the present application provides another early media identification method combining an RPA and an AI, including:
S4, acquiring early media data to be identified;
S5, processing the early media data to be identified by using the trained early media identification model obtained by the method of any one of the first aspect, and obtaining an identification result.
In a third aspect, an embodiment of the present application provides an early media identification apparatus combining an RPA and an AI, including:
the data processing module is used for obtaining early media sample data, carrying out data segmentation on the early media sample data and obtaining a plurality of sample slice data; wherein, data intersection exists in partial data between any two adjacent sample slice data;
the first identification module is used for respectively carrying out voice identification on each sample slice data and determining the marking information of each sample slice data according to the voice identification result;
the training module is used for training the constructed early media identification model to be trained by utilizing the plurality of sample slice data and the corresponding marking information to obtain the trained early media identification model; and the trained early media identification model is used for identifying the input early media data to be identified.
In a fourth aspect, an embodiment of the present application provides another early media identification device combining an RPA and an AI, including:
the acquisition module is used for acquiring early media data to be identified;
the identification module is configured to process the early media data to be identified by using the trained early media identification model obtained by the method of any one of the first aspect, and obtain an identification result.
In a fifth aspect, an embodiment of the present application provides an electronic device, including:
a memory, a processor, and a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the early media identification method combining RPA and AI of any one of the first and second aspects.
In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the early media identification method combining RPA and AI according to any one of the first aspect and the second aspect.
The embodiment of the application provides an early media identification method, device, equipment and storage medium combining RPA and AI, which are used for obtaining early media sample data and carrying out data segmentation on the early media sample data to obtain a plurality of sample slice data; respectively carrying out voice recognition on each sample slice data, and determining the labeling information of each sample slice data according to the voice recognition result; training the constructed early media identification model to be trained by using the plurality of sample slice data and the corresponding marking information to obtain the trained early media identification model; and acquiring early media data to be recognized, and processing the early media data to be recognized by using the trained early media recognition model to obtain a recognition result. According to the scheme, the early media model training data is obtained by segmenting the early media sample data and performing voice recognition, the early media model training data is used for training the early media recognition model to be trained to obtain the early media recognition model, and the early media model after training is used for processing the early media data to obtain the recognition result, so that the training efficiency of the early media recognition model is improved, and the occupation of computing resources is reduced.
It should be understood that what is described in the summary section above is not intended to limit key or critical features of the embodiments of the application, nor is it intended to limit the scope of the application. Other features of the present application will become apparent from the following description.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is a schematic diagram of a network architecture upon which the present disclosure is based;
Fig. 2 is a flowchart of an early media identification method combining RPA and AI according to an embodiment of the disclosure;
Fig. 3 is a flowchart of another early media identification method combining RPA and AI according to an embodiment of the disclosure;
Fig. 4 is a flowchart of another early media identification method combining RPA and AI according to an embodiment of the disclosure;
Fig. 5 is a flowchart of another early media identification method combining RPA and AI according to an embodiment of the disclosure;
Fig. 6 is a flowchart of another early media identification method combining RPA and AI according to an embodiment of the disclosure;
Fig. 7 is a schematic structural diagram of an early media identification device combining RPA and AI according to an embodiment of the present disclosure;
Fig. 8 is a schematic structural diagram of another early media identification device combining RPA and AI according to an embodiment of the present disclosure;
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present application. It should be understood that the drawings and embodiments of the present application are for illustration purposes only and are not intended to limit the scope of the present application.
In the field of intelligent interaction, identifying early media information is an important step in the interaction process. Early media exists in the initial part of a piece of media information; for example, in a piece of voice data, the prompt tone preceding the speech is early media. Identifying early media information can improve the processing efficiency of the media information.
When a robot contacts a potential customer by telephone, identifying the audio returned by the telephone is an important application scenario for early media identification. For example, the robot dials a customer's number and the returned audio contains "power off"; the robot needs to recognize this and hang up instead of continuing to wait. In the prior art, this audio is recognized with Automatic Speech Recognition (ASR). ASR mainly comprises four major parts: feature extraction, model recognition, a language model and dictionary, and decoding. In use it suffers from high occupation of computing resources and low concurrency, which reduces the efficiency of audio recognition.
To solve these problems, the inventors found through research that the audio returned by the telephone to the robot mainly falls into the following categories: polyphonic ringtones, advertisements, telephone prompt tones, customer voices and the like. A more efficient early media identification model can therefore be trained to identify this audio. First, a large amount of audio sample data is acquired and sliced, and speech recognition is performed on the sliced data with ASR (Automatic Speech Recognition) to determine the labeling information, which realizes automatic labeling; the labeling information is then checked manually and errors are corrected. Next, a short-time Fourier transform is applied to the slice data, and the transformed slice data and the corresponding labeling information are input into the model to be trained, yielding a trained early media identification model. Finally, the audio to be identified is identified by the trained early media identification model. This scheme improves model training efficiency while maintaining recognition accuracy, and compared with the prior art the trained early media identification model occupies fewer computing resources and supports higher concurrency.
Fig. 1 is a schematic diagram of a network architecture based on the present disclosure, and as shown in fig. 1, the system provided in this embodiment includes a terminal 11 and a server 12. The terminal 11 may be a desktop computer, a notebook computer, a tablet computer, a smart phone, or other hardware devices. The present embodiment does not set any particular limitation to the implementation of the terminal 11 as long as it can normally communicate with the server.
When the model needs to be trained, a training instruction is input on the terminal 11; the server 12 then automatically labels the early media sample data, inputs the labeled sample data into the model for training, and stores the trained model on the server 12. When early media identification is needed, an identification instruction is input on the terminal 11, and the early media data to be identified is transmitted to the model for identification to obtain an identification result.
Specifically, the server 12 may obtain the early media sample data from the terminal 11, or may store the early media sample data for model training. The embodiment is not particularly limited to the specific implementation.
The technical solution of the present application will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 2 is a flowchart of an early media identification method combining an RPA and an AI according to an embodiment of the disclosure, and as shown in fig. 2, the method of this embodiment may include:
S1, obtaining early media sample data, and performing data segmentation on the early media sample data to obtain a plurality of sample slice data, wherein any two adjacent sample slice data partially overlap.
In this embodiment, the early media sample data may be segmented with a user-defined slice duration and start time; that is, the duration and start time of each piece of sample slice data can be adjusted as needed.
For example, the sample time length of a segment of early media sample is 2s, and in a possible case, the same slice time length is used for each sample slice, and the sample slice data after slicing includes: the data of the first sample slice is 0 s-1 s, the data of the second sample slice is 0.5 s-1.5 s, and the data of the third sample slice is 1 s-2 s; in a second possible case, each sample slice uses different slice durations, and the data of the sample slice after the slicing is completed includes: the first sample slice data is 0 s-0.5 s, the second sample slice data is 0.25 s-1.5 s, and the third sample slice data is 1 s-2 s. Only two possible sample slice data cases are listed here, and other identical or similar cases are not described in detail.
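The equal-duration case above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `slice_spans` and `slice_samples` and the parameters `window_s` and `hop_s` are hypothetical, and a fixed hop shorter than the window is assumed so that adjacent slices overlap.

```python
from typing import List, Sequence, Tuple


def slice_spans(total_s: float, window_s: float, hop_s: float) -> List[Tuple[float, float]]:
    """Return (start, end) spans in seconds for fixed-length overlapping slices.

    Assumption: hop_s < window_s, so any two adjacent spans share some data.
    """
    if not 0 < hop_s < window_s:
        raise ValueError("hop_s must be positive and smaller than window_s")
    spans = []
    start = 0.0
    # Small tolerance guards against floating-point drift in the last span.
    while start + window_s <= total_s + 1e-9:
        spans.append((start, start + window_s))
        start += hop_s
    return spans


def slice_samples(samples: Sequence[float], sample_rate: int,
                  window_s: float, hop_s: float) -> List[Sequence[float]]:
    """Cut a sequence of audio samples into overlapping slices."""
    total_s = len(samples) / sample_rate
    return [
        samples[int(round(a * sample_rate)):int(round(b * sample_rate))]
        for a, b in slice_spans(total_s, window_s, hop_s)
    ]


# A 2 s sample with 1 s slices and a 0.5 s hop gives the three spans
# described in the text: 0-1 s, 0.5-1.5 s and 1-2 s.
print(slice_spans(2.0, 1.0, 0.5))
```

The variable-duration case would need an explicit list of start times and durations instead of a fixed window and hop.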
S2, performing speech recognition on each sample slice data, and determining the labeling information of each sample slice data according to the speech recognition result.
In this embodiment, each segmented slice of early media sample data is recognized by automatic speech recognition, and is classified and labeled according to the recognition result.
For example, suppose the labeling information for early media sample data includes "turned off", "busy" and "limited". If "your calling user is in call" is recognized from the first piece of early media sample data, semantic analysis shows that this phrase corresponds to "busy", so that piece of early media sample data is labeled "busy".
S3, training the constructed early media identification model to be trained by using the plurality of sample slice data and the corresponding label information to obtain the trained early media identification model; and the trained early media identification model is used for identifying the input early media data to be identified.
By adopting the method, the embodiment of the application realizes the automatic marking of the early media sample data and the training of the early media identification model, obviously reduces the occupancy rate of the computing resources compared with the prior art, and improves the early media identification efficiency.
On the basis of the embodiment described in fig. 2, fig. 3 is a flowchart of another early media identification method combining RPA and AI according to an embodiment of the present disclosure, as shown in fig. 3, on the basis of fig. 2, before S1, further including:
S101, performing data enhancement processing on the obtained sample data to obtain the early media sample data.
The data enhancement processing includes at least one of the following: noise addition, time scaling, and pitch scaling. Time scaling shortens or lengthens the duration of the audio data, and pitch scaling raises or lowers the vibration frequency of the audio data.
In this embodiment, the data enhancement steps are applied in the following priority order: first noise addition, then time scaling, and finally pitch scaling.
The optional embodiment can obtain richer early media sample data types by performing enhancement processing on the obtained early media sample data, so that the early media sample data to be marked are closer to real data, and the robustness of the trained early media identification model is further improved.
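A rough sketch of the enhancement step, assuming NumPy is available. The helper names and parameters (`add_noise`, `resample_speed`, `snr_db`, `rate`) are hypothetical. Note the simplification: plain resampling changes duration and pitch together, so it only stands in for the separate time scaling and pitch scaling steps; scaling them independently would need a technique such as a phase vocoder, which is not shown here.

```python
import numpy as np


def add_noise(signal: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Add white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def resample_speed(signal: np.ndarray, rate: float) -> np.ndarray:
    """Resample by linear interpolation.

    rate > 1 shortens the audio and raises its pitch; rate < 1 lengthens it
    and lowers its pitch. Duration and pitch are coupled in this stand-in.
    """
    n_out = max(1, int(round(len(signal) / rate)))
    old_idx = np.arange(len(signal))
    new_idx = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(new_idx, old_idx, signal)


def augment(signal: np.ndarray, snr_db: float = 20.0,
            rate: float = 1.1, seed: int = 0) -> np.ndarray:
    """Apply noise first, then the resampling stand-in, following the stated order."""
    rng = np.random.default_rng(seed)
    noisy = add_noise(signal, snr_db, rng)
    return resample_speed(noisy, rate)
```

Running `augment` several times with different `snr_db`, `rate` and `seed` values would yield several variants of one recording.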
Fig. 4 is a flowchart of another early media identification method combining RPA and AI according to an embodiment of the disclosure. As shown in fig. 4, based on fig. 2, S2 specifically includes:
S21, performing speech recognition on each sample slice data by using ASR to obtain the slice text information corresponding to each sample slice data.
S22, comparing each slice text information with preset target text information to obtain the labeling information corresponding to each sample slice data.
For example, suppose the preset target text information includes labels such as "turned off" and "busy". If ASR (Automatic Speech Recognition) recognizes the text of the first piece of slice data as "in call", comparison shows that "in call" and "busy" have the same meaning, so the labeling information corresponding to the first piece of slice data is "busy".
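The comparison in S22 could look like the sketch below. It assumes the ASR step has already produced the slice text, and it uses simple substring matching; the phrase table `TARGET_PHRASES` and the function `label_from_text` are hypothetical, and the patent does not state how the comparison is actually performed.

```python
from typing import Dict, List, Optional

# Hypothetical table: each label maps to phrases treated as equivalent to it.
TARGET_PHRASES: Dict[str, List[str]] = {
    "busy": ["busy", "in call"],
    "turned off": ["turned off", "powered off"],
}


def label_from_text(slice_text: str,
                    targets: Dict[str, List[str]] = TARGET_PHRASES) -> Optional[str]:
    """Return the first label whose phrases occur in the ASR text, else None."""
    text = slice_text.lower()
    for label, phrases in targets.items():
        if any(phrase in text for phrase in phrases):
            return label
    return None


# The example from the text: "in call" is treated as meaning "busy".
print(label_from_text("your calling user is in call"))
```

Slices for which no phrase matches return `None` and could be routed to manual labeling.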
Correspondingly, S3 specifically includes:
S31, performing a short-time Fourier transform on each sample slice data to obtain the slice data to be trained for each sample slice data.
In this embodiment, only the short-time Fourier transform is used to process each sample slice data and obtain the slice data to be trained.
S32, inputting each slice data to be trained and the corresponding labeling information into the early media identification model to be trained, so as to train the model and obtain the trained early media identification model.
Compared with the foregoing embodiment, this embodiment unifies the standard for labeling early media sample data by comparing each piece of slice text information with the preset target text information, and reduces the occupation of computing resources, without affecting the accuracy of the ASR technology, by performing a short-time Fourier transform on each piece of sample slice data.
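The short-time Fourier transform in S31 can be sketched with NumPy as below. The frame length, hop size and Hann window are assumptions chosen for illustration, and `stft_magnitude` is a hypothetical helper; the patent does not specify these parameters.

```python
import numpy as np


def stft_magnitude(signal: np.ndarray, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    """Return the magnitude spectrogram, shape (n_frames, frame_len // 2 + 1)."""
    if len(signal) < frame_len:
        # Zero-pad very short slices so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([
        signal[i * hop:i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft keeps only the non-negative frequency bins of each real-valued frame.
    return np.abs(np.fft.rfft(frames, axis=1))


# One second of a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
spectrogram = stft_magnitude(np.sin(2 * np.pi * 1000 * t))
print(spectrogram.shape)
```

Each row of the result is the spectrum of one short frame, so the array can be passed to a classifier as a time-frequency feature map.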
Fig. 5 is a flowchart of another early media identification method combining RPA and AI according to an embodiment of the disclosure. This embodiment may be performed on the basis of the embodiment shown in fig. 2, fig. 3, or fig. 4, where the description is performed on the basis of the embodiment shown in fig. 2, and as shown in fig. 5, S3 further includes:
S33, determining the accuracy of the labeling information corresponding to each sample slice data.
In this embodiment, the labeling information corresponding to each sample slice data is manually checked to determine whether the labeling information is accurate.
S34, determining, according to the accuracy, target sample slice data whose labeling information is inaccurate, and re-labeling the target sample slice data to obtain new labeling information.
For example, if a piece of sample slice data is labeled "turned off" but manual inspection finds that it actually corresponds to a different label, such as "limited", the labeling information of that sample slice data is determined to be incorrect and is changed to the correct label.
S35, after the step of training the constructed early media identification model to be trained by using the plurality of sample slice data and the corresponding labeling information has been executed, further training the trained early media identification model by using the target sample slice data and the corresponding new labeling information to obtain the trained early media identification model.
In this embodiment, training is performed on the basis of the previously trained early media recognition model, and the latest early media recognition model is obtained.
Compared with the foregoing embodiments, in the present embodiment, the annotation information corresponding to inaccurate early media slice data is corrected manually, so that the accuracy of the annotation information is improved, and further, the identification accuracy of the early media identification model is improved.
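The two-stage training in S3 and S35 can be illustrated with a toy classifier. The patent does not specify the model architecture, so the softmax regression below is only a stand-in, and `train`, `predict` and all hyperparameters are hypothetical. The point of the sketch is that the second stage starts from the weights produced by the first stage rather than from scratch.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train(weights: np.ndarray, features: np.ndarray, labels: np.ndarray,
          n_classes: int, lr: float = 0.5, epochs: int = 200) -> np.ndarray:
    """Gradient descent on softmax cross-entropy, starting from the given weights."""
    onehot = np.eye(n_classes)[labels]
    w = weights.copy()
    for _ in range(epochs):
        probs = softmax(features @ w)
        w -= lr * features.T @ (probs - onehot) / len(features)
    return w


def predict(weights: np.ndarray, features: np.ndarray) -> np.ndarray:
    return np.argmax(features @ weights, axis=1)


# Stage 1: train on all slices with their automatically generated labels.
features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
auto_labels = np.array([0, 0, 1, 1])
stage1 = train(np.zeros((2, 2)), features, auto_labels, n_classes=2)

# Stage 2: continue training from stage1 using only the re-labeled slices.
corrected_features = features[:1]
corrected_labels = np.array([0])
stage2 = train(stage1, corrected_features, corrected_labels, n_classes=2, epochs=10)
```

In practice the features would be the spectrograms from S31 flattened or fed to a larger model, and the corrected subset would come from the manual check in S33 and S34.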
Fig. 6 is a flowchart of another early media identification method combining RPA and AI according to an embodiment of the disclosure, and as shown in fig. 6, the method of this embodiment may include:
S4, acquiring early media data to be identified;
S5, processing the early media data to be identified by using the trained early media identification model obtained by the method of any one of the previous embodiments, and obtaining an identification result.
S4 and S5 are performed on the basis of the above-described embodiment.
The embodiment of the application provides an early media identification method combining RPA and AI, which comprises the steps of obtaining early media sample data, carrying out data segmentation on the early media sample data, and obtaining a plurality of sample slice data; respectively carrying out voice recognition on each sample slice data, and determining the labeling information of each sample slice data according to the voice recognition result; training the constructed early media identification model to be trained by using the plurality of sample slice data and the corresponding marking information to obtain the trained early media identification model; and acquiring early media data to be recognized, and processing the early media data to be recognized by using the trained early media recognition model to obtain a recognition result. According to the scheme, the early media model training data is obtained by segmenting the early media sample data and performing voice recognition, the early media model training data is used for training the early media recognition model to be trained to obtain the early media recognition model, and the early media model after training is used for processing the early media data to obtain the recognition result, so that the training efficiency of the early media recognition model is improved, and the occupation of computing resources is reduced.
Fig. 7 is a schematic structural diagram of an early media identification device combining an RPA and an AI according to an embodiment of the present disclosure, as shown in fig. 7, the device of this embodiment may include:
the data processing module 71 is configured to obtain early media sample data, perform data segmentation on the early media sample data, and obtain multiple sample slice data; wherein, data intersection exists in partial data between any two adjacent sample slice data;
the first recognition module 72 is configured to perform speech recognition on each sample slice data, and determine labeling information of each sample slice data according to a speech recognition result;
the training module 73 is configured to train the constructed early media identification model to be trained by using the plurality of sample slice data and the corresponding label information, so as to obtain a trained early media identification model; and the trained early media identification model is used for identifying the input early media data to be identified.
Optionally, the training module 73 is specifically configured to perform short-time fourier transform on each sample slice data, so as to obtain to-be-trained slice data of each sample slice data; and inputting the data of each slice to be trained and the corresponding marking information into the early media identification model to be trained so as to train the constructed early media identification model to be trained and obtain the trained early media identification model.
Optionally, the first recognition module 72 is specifically configured to perform speech recognition on the sample slice data by using an ASR technique, and obtain slice text information corresponding to the sample slice data respectively; and comparing the text information of each sample slice with preset target text information to obtain the labeling information corresponding to the data of each sample slice.
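The comparison between slice text and preset target text can be sketched as follows. The ASR call itself is not shown: the function takes a transcript that is assumed to have already been produced by some ASR system. The target phrases, the substring matching rule, the minimum-length guard, and the 0/1 label encoding are hypothetical; the source only says that the slice text is compared with preset target text information.

```python
from typing import Iterable

# Hypothetical target phrases standing in for the "preset target text information".
TARGET_PHRASES = (
    "the number you dialed is busy",
    "the subscriber you dialed is powered off",
)


def normalize(text: str) -> str:
    """Lowercase the text, drop punctuation, and collapse whitespace."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def label_slice(slice_text: str, targets: Iterable[str] = TARGET_PHRASES, min_chars: int = 4) -> int:
    """Return 1 if the transcript contains, or is a fragment of, a target phrase; else 0."""
    text = normalize(slice_text)
    if len(text) < min_chars:  # too short to compare meaningfully (assumed guard)
        return 0
    for target in targets:
        phrase = normalize(target)
        if text in phrase or phrase in text:
            return 1
    return 0


labels = [label_slice(t) for t in ("The number you dialed is BUSY, please", "dialed is busy", "hello, who is this?")]
# labels == [1, 1, 0]
```

Matching in both directions is a design choice for this sketch: because slices are short, a transcript may cover only part of a prompt, so a fragment of a target phrase is also treated as a match.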
Optionally, the data processing module 71 is further configured to perform data enhancement processing on the obtained sample data to obtain early media sample data.
Optionally, the data enhancement processing includes at least one of the following processing: adding noise processing, time scaling processing, and pitch scaling processing.
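Two of the data enhancement operations can be sketched with the standard library alone. This is a simplified stand-in, not the processing used in the source: `add_noise` mixes in white Gaussian noise at an assumed signal-to-noise ratio, and `resample_linear` is a naive resampler that changes duration and pitch together. Time scaling that preserves pitch, or pitch scaling that preserves duration, generally needs a technique such as a phase vocoder, which is not shown here.

```python
import math
import random
from typing import List, Sequence


def add_noise(samples: Sequence[float], snr_db: float, rng: random.Random) -> List[float]:
    """Add white Gaussian noise scaled to the requested signal-to-noise ratio (in dB)."""
    power = sum(s * s for s in samples) / len(samples)
    sigma = math.sqrt(power / (10 ** (snr_db / 10)))
    return [s + rng.gauss(0.0, sigma) for s in samples]


def resample_linear(samples: Sequence[float], rate: float) -> List[float]:
    """Read the signal `rate` times faster using linear interpolation.

    Caveat: at a fixed playback sample rate this alters duration AND pitch together.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    out_len = int((len(samples) - 1) / rate) + 1
    out = []
    for i in range(out_len):
        pos = i * rate
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


rng = random.Random(0)  # fixed seed so the augmentation is repeatable
tone = [math.sin(2 * math.pi * n / 20) for n in range(200)]
noisy = add_noise(tone, snr_db=20.0, rng=rng)
faster = resample_linear(tone, rate=2.0)   # 100 samples
stretched = resample_linear([0.0, 1.0, 2.0, 3.0], rate=0.5)
# stretched == [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
```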
Optionally, the training module 73 is further configured to: determine the accuracy of the labeling information corresponding to each sample slice data; determine, according to the accuracy of each sample slice data, the target sample slice data whose labeling information is inaccurate, and re-label that target sample slice data to obtain new labeling information of the target sample slice data; and, after the step of training the constructed early media identification model to be trained by using the plurality of sample slice data and the corresponding labeling information has been executed, further train the trained early media identification model by using the target sample slice data and the corresponding new labeling information to obtain the final trained early media identification model.
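The selection and re-labeling of slices can be pictured with the sketch below. Everything concrete in it is hypothetical: the source does not say how label accuracy is measured, what threshold separates accurate from inaccurate labels, or how new labels are produced. Here accuracy is assumed to be a score in [0, 1], and `relabel` is a caller-supplied function, for example a lookup of manual corrections.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class LabeledSlice:
    slice_id: int
    label: int
    label_accuracy: float  # hypothetical score in [0, 1]; not defined by the source


def build_finetune_set(
    slices: List[LabeledSlice],
    relabel: Callable[[LabeledSlice], int],
    min_accuracy: float = 0.8,  # assumed threshold
) -> List[LabeledSlice]:
    """Select slices whose label accuracy is below the threshold and attach new labels."""
    targets = [s for s in slices if s.label_accuracy < min_accuracy]
    # After re-labeling, the new label is treated as trusted (an assumption of this sketch).
    return [replace(s, label=relabel(s), label_accuracy=1.0) for s in targets]


data = [LabeledSlice(0, 1, 0.95), LabeledSlice(1, 0, 0.40), LabeledSlice(2, 1, 0.55)]
corrections = {1: 1, 2: 0}  # hypothetical manual corrections keyed by slice_id
finetune = build_finetune_set(data, lambda s: corrections[s.slice_id])
# finetune contains slices 1 and 2 with labels 1 and 0; it would feed the second training pass
```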
The apparatus of this embodiment may be configured to implement the technical solutions of the above method embodiments, and the implementation principles thereof are similar and will not be described herein again.
The embodiment of the application provides an early media identification device combining RPA and AI. The device obtains early media sample data and performs data segmentation on it to obtain a plurality of sample slice data; performs voice recognition on each sample slice data and determines the labeling information of each sample slice data according to the voice recognition result; trains the constructed early media identification model to be trained by using the plurality of sample slice data and the corresponding labeling information to obtain the trained early media identification model; and acquires early media data to be recognized and processes it by using the trained early media identification model to obtain a recognition result. Because the training data for the model is obtained by segmenting the early media sample data and labeling the slices through voice recognition, the labeling efficiency of the early media data is improved and the occupancy rate of computing resources is reduced.
Fig. 8 is a schematic structural diagram of another early media identification device combining RPA and AI according to an embodiment of the present disclosure. As shown in Fig. 8, the device of this embodiment may include:
an obtaining module 81, configured to obtain early media data to be identified;
the identification module 82 is configured to process the early media data to be identified by using the trained early media identification model obtained by the method according to any one of the foregoing embodiments, so as to obtain an identification result.
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in Fig. 9, an electronic device 60 of this embodiment may include: a memory 61, a processor 62, and a computer program.
A memory 61 for storing a computer program (such as an application program, a functional module, and the like implementing the above-described early media identification method combining RPA and AI), computer instructions, and the like;
the computer programs, computer instructions, etc. described above may be stored in one or more memories 61 in partitions. And the computer programs, computer instructions, data, etc. described above may be invoked by the processor 62.
A processor 62 for executing the computer program stored in the memory 61 to implement the steps of the method according to the above embodiments.
Reference may be made in particular to the description relating to the preceding method embodiment.
The memory 61 and the processor 62 may be separate structures or may be an integrated structure integrated together. When the memory 61 and the processor 62 are separate structures, the memory 61 and the processor 62 may be coupled by a bus 64.
An electronic device of this embodiment may execute the technical solutions in the methods shown in fig. 2, fig. 3, fig. 4, fig. 5, and fig. 6, and the specific implementation process and technical principle of the electronic device refer to the related descriptions in the methods shown in fig. 2, fig. 3, fig. 4, fig. 5, and fig. 6, which are not described again here.
In addition, embodiments of the present application further provide a computer-readable storage medium, in which computer-executable instructions are stored, and when at least one processor of the user equipment executes the computer-executable instructions, the user equipment performs the above-mentioned various possible methods.
Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. Additionally, the ASIC may reside in user equipment. Of course, the processor and the storage medium may also reside as discrete components in a communication device.
Those of ordinary skill in the art will understand that all or some of the steps of the above-described method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the method embodiments described above. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (11)

1. An early media identification method combining RPA and AI, comprising:
S1, obtaining early media sample data, and performing data segmentation on the early media sample data to obtain a plurality of sample slice data; wherein any two adjacent sample slice data partially overlap;
S2, respectively carrying out voice recognition on each sample slice data, and determining the labeling information of each sample slice data according to the voice recognition result;
S3, training the constructed early media identification model to be trained by using the plurality of sample slice data and the corresponding labeling information to obtain the trained early media identification model; and the trained early media identification model is used for identifying the input early media data to be identified.
2. The early media identification method of claim 1, wherein S3 comprises:
S31, respectively carrying out short-time Fourier transform on each sample slice data to obtain slice data to be trained of each sample slice data;
and S32, inputting the data of each slice to be trained and the corresponding label information to the early media identification model to be trained so as to train the constructed early media identification model to be trained and obtain the trained early media identification model.
3. The early media identification method according to claim 1, wherein the S2 comprises:
S21, performing voice recognition on the sample slice data by using an ASR technology to respectively obtain slice text information corresponding to the sample slice data;
and S22, comparing the slice text information with preset target text information to obtain labeling information corresponding to each sample slice data.
4. The early media identification method of claim 1, further comprising:
and S101, performing data enhancement processing on the obtained sample data to obtain early media sample data.
5. The early media identification method according to claim 4, wherein the data enhancement process comprises at least one of the following processes: adding noise processing, time scaling processing, and pitch scaling processing.
6. The early media identification method according to any of claims 1-5, wherein S3 further comprises:
S33, determining the accuracy of the labeling information corresponding to each sample slice data;
S34, determining target sample slice data with inaccurate labeling information according to the accuracy of each sample slice data, and re-labeling the target sample slice data with inaccurate labeling information to obtain new labeling information;
and S35, after the step of training the constructed early media identification model to be trained by using the plurality of sample slice data and the corresponding marking information is executed, training the trained early media identification model by using the target sample slice data and the corresponding new marking information to obtain the trained early media identification model.
7. An early media identification method combining RPA and AI, comprising:
S4, acquiring early media data to be identified;
S5, processing the early media data to be identified by using the trained early media identification model obtained by the method of any one of claims 1 to 6 to obtain an identification result.
8. An early media identification device combining RPA and AI, comprising:
the data processing module is used for obtaining early media sample data and carrying out data segmentation on the early media sample data to obtain a plurality of sample slice data; wherein any two adjacent sample slice data partially overlap;
the first identification module is used for respectively carrying out voice identification on each sample slice data and determining the marking information of each sample slice data according to the voice identification result;
the training module is used for training the constructed early media identification model to be trained by utilizing the plurality of sample slice data and the corresponding marking information to obtain the trained early media identification model; and the trained early media identification model is used for identifying the input early media data to be identified.
9. An early media identification device combining RPA and AI, comprising:
the acquisition module is used for acquiring early media data to be identified;
an identification module, configured to process the early media data to be identified by using the trained early media identification model obtained by the method according to any one of claims 1 to 6, so as to obtain an identification result.
10. An electronic device, comprising:
a memory, a processor, and a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any one of claims 1-7.
11. A computer-readable storage medium, having stored thereon a computer program for execution by a processor to perform the method of any one of claims 1-7.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010616925 2020-06-30
CN2020106169254 2020-06-30

Publications (1)

Publication Number Publication Date
CN112037782A (en) 2020-12-04

Family

ID=73576765

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010794278.6A Pending CN112037782A (en) 2020-06-30 2020-08-10 RPA and AI combined early media identification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112037782A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090063145A1 (en) * 2004-03-02 2009-03-05 At&T Corp. Combining active and semi-supervised learning for spoken language understanding
CN101777347A (en) * 2009-12-07 2010-07-14 中国科学院自动化研究所 Model complementary Chinese accent identification method and system
EP2541544A1 (en) * 2011-06-30 2013-01-02 France Telecom Voice sample tagging
CN110598210A (en) * 2019-08-29 2019-12-20 深圳市优必选科技股份有限公司 Entity recognition model training method, entity recognition device, entity recognition equipment and medium
CN110610698A (en) * 2019-09-12 2019-12-24 上海依图信息技术有限公司 Voice labeling method and device
CN111046656A (en) * 2019-11-15 2020-04-21 北京三快在线科技有限公司 Text processing method and device, electronic equipment and readable storage medium

Similar Documents

Publication Publication Date Title
CN107015964B (en) Intelligent robot development-oriented custom intention implementation method and device
CN110310619A (en) Polyphone prediction technique, device, equipment and computer readable storage medium
US11398228B2 (en) Voice recognition method, device and server
CN109828906B (en) UI (user interface) automatic testing method and device, electronic equipment and storage medium
CN113836925B (en) Training method and device for pre-training language model, electronic equipment and storage medium
CN112084752A (en) Statement marking method, device, equipment and storage medium based on natural language
CN111312230A (en) Voice interaction monitoring method and device for voice conversation platform
CN112951233A (en) Voice question and answer method and device, electronic equipment and readable storage medium
CN110377708B (en) Multi-scene conversation switching method and device
CN114817478A (en) Text-based question and answer method and device, computer equipment and storage medium
CN111354354B (en) Training method, training device and terminal equipment based on semantic recognition
CN111368504A (en) Voice data labeling method and device, electronic equipment and medium
CN112201253B (en) Text marking method, text marking device, electronic equipment and computer readable storage medium
CN114090792A (en) Document relation extraction method based on comparison learning and related equipment thereof
CN112669850A (en) Voice quality detection method and device, computer equipment and storage medium
CN110675865B (en) Method and apparatus for training hybrid language recognition models
CN113158690A (en) Testing method and device for conversation robot
CN109766089B (en) Code generation method and device based on dynamic diagram, electronic equipment and storage medium
CN111241336A (en) Audio scene recognition method and device, electronic equipment and medium
CN111125379A (en) Knowledge base expansion method and device, electronic equipment and storage medium
CN110442858A (en) A kind of question sentence entity recognition method, device, computer equipment and storage medium
CN112037782A (en) RPA and AI combined early media identification method, device, equipment and storage medium
CN115759048A (en) Script text processing method and device
CN112560463B (en) Text multi-labeling method, device, equipment and storage medium
CN114297380A (en) Data processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination