WO2020119508A1 - Video cutting method, device, computer device, and storage medium - Google Patents

Video cutting method, device, computer device, and storage medium

Info

Publication number
WO2020119508A1
WO2020119508A1 (PCT/CN2019/122472)
Authority
WO
WIPO (PCT)
Prior art keywords
data
dot
video
behavior
result
Prior art date
Application number
PCT/CN2019/122472
Other languages
English (en)
French (fr)
Inventor
王振华
Original Assignee
深圳壹账通智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司
Priority to SG11202103326QA (en)
Priority to EP19896863.8A (en)
Priority to KR1020217017667A (ko)
Priority to JP2021532494A (ja)
Publication of WO2020119508A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233Processing of audio elementary streams
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/49Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/57Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for processing of video signals
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/19Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
    • G11B27/28Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/266Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
    • H04N21/26603Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel for automatically generating descriptors from content, e.g. when it is not made available by its provider, using content analysis techniques
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/27Server based end-user applications
    • H04N21/274Storing end-user multimedia data in response to end-user request, e.g. network recorder
    • H04N21/2743Video hosting of uploaded data from client
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/4223Cameras
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/84Generation or processing of descriptive data, e.g. content descriptors
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8455Structuring of content, e.g. decomposing content into time segments involving pointers to the content, e.g. pointers to the I-frames of the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8549Creating video summaries, e.g. movie trailer
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems

Definitions

  • the present application relates to a video cutting method, device, computer equipment and storage medium.
  • a video cutting method, device, computer device, and storage medium are provided.
  • a video cutting method includes:
  • extracting video data to be recognized from video stream data, and extracting image data and audio data from the video data to be recognized;
  • inputting the image data into a preset dot behavior recognition model to obtain a dot behavior recognition result, and inputting the audio data into a preset dot voice recognition model to obtain a dot voice recognition result;
  • obtaining a dot recognition result according to the dot behavior recognition result, the dot voice recognition result, and a preset dot trigger rule;
  • adding a cut point identifier to the video data to be recognized when the type of the dot recognition result is operation dot; and
  • cutting the video stream data according to the cut point identifier to obtain video segment data.
  • a video cutting device includes:
  • the identification data extraction module is used to extract the video data to be identified from the video stream data, and extract the image data and audio data from the video data to be identified;
  • the dot recognition processing module is used to input the image data into the preset dot behavior recognition model to obtain the dot behavior recognition result, and to input the audio data into the preset dot voice recognition model to obtain the dot voice recognition result;
  • the dot result acquisition module is used to obtain the dot recognition result according to the dot behavior recognition result, the dot voice recognition result, and the preset dot trigger rule;
  • the cutting mark adding module is used to add a cutting point mark to the video data to be recognized when the type of the dot recognition result is operation dotting;
  • the video cutting module is used to cut the video stream data according to the cutting point identification to obtain video segment data.
  • a computer device includes a memory and one or more processors.
  • the memory stores computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the following steps:
  • extracting video data to be recognized from video stream data, and extracting image data and audio data from the video data to be recognized;
  • inputting the image data into a preset dot behavior recognition model to obtain a dot behavior recognition result, and inputting the audio data into a preset dot voice recognition model to obtain a dot voice recognition result;
  • obtaining a dot recognition result according to the dot behavior recognition result, the dot voice recognition result, and a preset dot trigger rule;
  • adding a cut point identifier to the video data to be recognized when the type of the dot recognition result is operation dot; and
  • cutting the video stream data according to the cut point identifier to obtain video segment data.
  • One or more non-volatile computer-readable storage media storing computer-readable instructions.
  • when the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:
  • extracting video data to be recognized from video stream data, and extracting image data and audio data from the video data to be recognized;
  • inputting the image data into a preset dot behavior recognition model to obtain a dot behavior recognition result, and inputting the audio data into a preset dot voice recognition model to obtain a dot voice recognition result;
  • obtaining a dot recognition result according to the dot behavior recognition result, the dot voice recognition result, and a preset dot trigger rule;
  • adding a cut point identifier to the video data to be recognized when the type of the dot recognition result is operation dot; and
  • cutting the video stream data according to the cut point identifier to obtain video segment data.
  • FIG. 1 is an application scene diagram of a video cutting method according to one or more embodiments.
  • FIG. 2 is a schematic flowchart of a video cutting method according to one or more embodiments.
  • FIG. 3 is a schematic flow chart of responding to a dot cutting command according to one or more embodiments.
  • FIG. 4 is a schematic flowchart of a video cutting method in another embodiment.
  • FIG. 5 is a structural block diagram of a video cutting device according to one or more embodiments.
  • FIG. 6 is an internal structure diagram of a computer device according to one or more embodiments.
  • the video cutting method provided by this application can be applied in the application environment shown in FIG. 1.
  • the recording device 102 communicates with the server 104 through a network.
  • the recording device 102 performs video recording and sends the recorded video stream data to the server 104.
  • the server 104 extracts image data and audio data from the video data to be recognized obtained from the video stream data, and inputs the image data and audio data respectively into the corresponding preset dot behavior recognition model and dot voice recognition model.
  • the dot recognition result is then obtained according to the obtained dot behavior recognition result, dot voice recognition result, and preset dot trigger rule.
  • when the type of the dot recognition result is operation dot, a cut point identifier is added to the video data to be recognized, and finally the video stream data is cut according to the cut point identifier to obtain video segment data.
  • the recording device 102 may be, but not limited to, various video recording cameras, or terminals with video recording functions, such as personal computers, notebook computers, smart phones, tablets, and portable wearable devices.
  • the server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.
  • a video cutting method is provided. Taking the method applied to the server 104 in FIG. 1 as an example for illustration, it includes the following steps:
  • Step S201 Extract video data to be identified from video stream data, and extract image data and audio data from the video data to be identified.
  • the video data to be recognized is extracted from the video stream data.
  • the video stream data is video data that needs to be cut, and can be recorded by a recording device.
  • the video stream data may be the video data captured by the camera in real time during the dual recording process.
  • the to-be-recognized video data is video data of a preset recognition length.
  • the recognition length is set according to actual requirements, and the corresponding cut point identifier can be added by performing dot recognition on the video data to be recognized. Performing dot recognition on video data of a preset recognition length makes it possible to cut the recorded video data in real time, ensuring the timeliness of video cutting and improving video cutting efficiency.
  • video data is composed of two parts: video and audio, and both video and audio parts can be identified.
  • image data and audio data are extracted from the video data to be recognized so that the image data and the audio data can be recognized and processed separately; in this way it can be determined whether a dot behavior appears in the video image or a dot voice appears in the video audio, which realizes dot recognition of both image behavior and audio speech and improves the accuracy of dot recognition.
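A minimal sketch of this extraction step, assuming the segment to be recognized has already been written to a local file; the file name, frame sampling step, and audio parameters are illustrative assumptions rather than values taken from the application:

```python
# Illustrative sketch: split one to-be-recognized segment into sampled image frames and an audio track.
# Assumes ffmpeg is installed and "segment_0001.mp4" is a hypothetical segment file.
import subprocess

import cv2  # pip install opencv-python


def extract_image_and_audio(segment_path: str, frame_step: int = 5):
    """Return sampled video frames and write the audio track to a WAV file."""
    frames = []
    capture = cv2.VideoCapture(segment_path)
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % frame_step == 0:  # sample every frame_step-th frame as image data
            frames.append(frame)
        index += 1
    capture.release()

    audio_path = segment_path.rsplit(".", 1)[0] + ".wav"
    # Demux the audio stream with ffmpeg: mono, 16 kHz PCM for later speech processing.
    subprocess.run(
        ["ffmpeg", "-y", "-i", segment_path, "-vn", "-ac", "1", "-ar", "16000", audio_path],
        check=True,
    )
    return frames, audio_path


frames, audio_path = extract_image_and_audio("segment_0001.mp4")  # hypothetical file name
```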
  • Step S203 Input the image data into the preset dot behavior recognition model to obtain the dot behavior recognition result, and input the audio data into the preset dot voice recognition model to obtain the dot voice recognition result.
  • the image data and audio data are input into the corresponding dot behavior recognition model and dot voice recognition model for dot recognition, respectively.
  • the dot behavior recognition model may be based on an artificial neural network algorithm and obtained by training on the historical dot behavior data of the business personnel of the business system in the corresponding business scenario; the dot behaviors may be, for example, actions such as applauding, raising a hand, or tapping;
  • the dot voice recognition model can be obtained by training on the historical dot voice data of the business personnel; for example, keyword voices such as "first, second, third" can serve as dot voices.
  • on the one hand, the image data is input into the preset dot behavior recognition model to perform dot behavior recognition and obtain the dot behavior recognition result;
  • on the other hand, the audio data is input into the preset dot voice recognition model to perform dot voice recognition and obtain the dot voice recognition result.
  • Step S205 Based on the dot behavior recognition result, the dot voice recognition result, and the preset dot trigger rule, the dot recognition result is obtained.
  • after the dot behavior recognition result and the dot voice recognition result are obtained, they are combined to obtain the dot recognition result. Specifically, a preset dot trigger rule is queried; this rule is set according to actual business requirements.
  • for example, the rule may take an OR of the dot behavior recognition result and the dot voice recognition result, that is, as long as the type of either result is operation dot (meaning a cut point identifier needs to be added), the dot is triggered and the obtained dot recognition result is operation dot; alternatively, the rule may take an AND of the two results, that is, the dot is triggered and the type of the obtained dot recognition result is operation dot only when the types of both the dot behavior recognition result and the dot voice recognition result are operation dot.
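The trigger-rule combination described above amounts to a small boolean rule; a sketch, assuming each recognition result has been reduced to a type string (the constant and function names are illustrative):

```python
# Illustrative sketch of the dot trigger rule: combine the two recognition results
# either with OR (either result triggers a dot) or AND (both must be operation dots).
OPERATION_DOT = "operation_dot"


def combine_dot_results(behavior_type: str, voice_type: str, mode: str = "or") -> str:
    triggered_behavior = behavior_type == OPERATION_DOT
    triggered_voice = voice_type == OPERATION_DOT
    if mode == "or":
        triggered = triggered_behavior or triggered_voice
    else:  # "and": only trigger when both results are operation dots
        triggered = triggered_behavior and triggered_voice
    return OPERATION_DOT if triggered else "no_dot"


print(combine_dot_results("operation_dot", "no_dot", mode="or"))   # operation_dot
print(combine_dot_results("operation_dot", "no_dot", mode="and"))  # no_dot
```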
  • Step S207 When the type of the dot recognition result is the operation dot, add a cut point identifier to the video data to be recognized.
  • after the dot recognition result is obtained, its type is judged.
  • when the type of the dot recognition result is operation dot, it indicates that the image data and/or audio data in the video data to be recognized has triggered a dot, and that the video data to be recognized is a video cutting position.
  • dot processing is therefore performed on it; specifically, a cut point identifier can be added to the video data to be recognized. The cut point identifier is used to identify the cut point of the video cutting, and when cutting the video stream data, the cut point identifier can be searched for directly to perform the cutting.
  • in a specific implementation, the cut point identifier may be a cutting label.
  • when a cut point identifier is added to the video data to be recognized, a key frame is determined from the video data to be recognized according to a preset label addition rule, for example the first frame of the video data to be recognized is used as the key frame, and a cutting label is added to that key frame.
  • the cutting label may include, but is not limited to, a cut point number, a cutting time value, and the like.
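One possible representation of such a cutting label attached to the key frame, sketched with illustrative field names:

```python
# Illustrative sketch: a cutting label carrying a cut point number and a cutting time value,
# attached to the key frame (here, the first frame) of the segment that triggered the dot.
from dataclasses import dataclass


@dataclass
class CuttingLabel:
    cut_point_number: int      # sequential number of the cut point
    cutting_time_value: float  # position on the video stream time axis, in seconds
    key_frame_index: int       # index of the labelled key frame within the stream


labels: list[CuttingLabel] = []


def add_cut_point(segment_start_sec: float, segment_first_frame: int) -> None:
    labels.append(
        CuttingLabel(
            cut_point_number=len(labels) + 1,
            cutting_time_value=segment_start_sec,
            key_frame_index=segment_first_frame,
        )
    )
```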
  • Step S209 Cut the video stream data according to the cut point identifier to obtain video segment data.
  • in the above video cutting method, image data and audio data are extracted from the video data to be recognized obtained from the video stream data, and the image data and audio data are respectively input into the corresponding preset dot behavior recognition model and dot voice recognition model; the dot recognition result is then obtained according to the obtained dot behavior recognition result, dot voice recognition result, and preset dot trigger rule.
  • when the type of the dot recognition result is operation dot, a cut point identifier is added to the video data to be recognized, and finally the video stream data is cut according to the cut point identifier to obtain the video segment data.
  • during the video cutting process, dots can be recognized according to the image data and audio data in the video data to be recognized and cut point identifiers can be added automatically, so no manual dot operation is required, which improves the processing efficiency of video cutting.
  • extracting the video data to be identified from the video stream data includes: acquiring the video stream data; determining the video stream identification length; and extracting the video data to be identified from the video stream data according to the video stream identification length.
  • dot recognition processing cannot be performed directly on the video stream data recorded by the recording device 102; the data needs to be split into video data to be recognized of a fixed recognition length, and dot recognition is performed on that data.
  • when extracting the video data to be recognized, the video stream data is first obtained; specifically, real-time recorded video stream data may be received directly from the recording device 102, or recorded video stream data may be read from a preset memory.
  • the video stream recognition length is then determined; it is set according to actual needs, for example according to the input requirements of the dot behavior recognition model and the dot voice recognition model, or according to the processing resources of the server 104.
  • after the video stream recognition length is determined, video data to be recognized that satisfies the video stream recognition length can be sequentially extracted from the video stream data, and the extracted video data to be recognized is then subjected to subsequent dot recognition processing.
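A sketch of the sequential extraction, assuming the stream is consumed frame by frame and the recognition length is expressed as a number of frames (the source URL and window length are placeholders):

```python
# Illustrative sketch: split an incoming video stream into windows of a fixed recognition
# length so each window can be sent to dot recognition as "video data to be recognized".
import cv2


def iter_recognition_windows(stream_source, recognition_length_frames: int = 150):
    """Yield lists of frames, each list holding one window of the recognition length."""
    capture = cv2.VideoCapture(stream_source)
    window = []
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        window.append(frame)
        if len(window) == recognition_length_frames:
            yield window
            window = []
    if window:  # flush the final, possibly shorter, window
        yield window
    capture.release()


for window in iter_recognition_windows("rtsp://recorder-102/stream"):  # hypothetical source
    pass  # hand the window to dot recognition
```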
  • in one embodiment, inputting the image data into a preset dot behavior recognition model to obtain a dot behavior recognition result, and inputting the audio data into a preset dot voice recognition model to obtain a dot voice recognition result includes: determining the identity identification information of the business person to whom the video data to be recognized belongs; querying the preset dot behavior recognition model and dot voice recognition model respectively corresponding to the identity identification information; extracting image feature data from the image data, and extracting audio feature data from the audio data; and inputting the image feature data into the dot behavior recognition model to obtain the dot behavior recognition result, and inputting the audio feature data into the dot voice recognition model to obtain the dot voice recognition result.
  • the dot behavior recognition model and the dot voice recognition model are both trained on the historical dot data of each business person in the business system.
  • generally, during the dual-recording process of business verification, different business systems have different dot operation requirements, and different business personnel also have different dot operation habits.
  • in application, each business service window is provided with a recording device 102, so the corresponding business person can be determined from the source of the video data to be recognized, that is, from the recording device 102, and the identity identification information of that business person can then be queried.
  • the identity identification information may be, but is not limited to, an employee ID, an employee name, or other information that can uniquely identify the business person.
  • after the identity identification information is determined, the preset dot behavior recognition model and dot voice recognition model corresponding to it are queried; because the two models are trained respectively on the historical dot behavior data and historical dot voice data of the corresponding business person, the dot recognition is highly targeted and the recognition accuracy is high.
  • on the one hand, image feature data is extracted from the image data and input into the dot behavior recognition model to obtain the dot behavior recognition result.
  • on the other hand, audio feature data is extracted from the audio data and input into the dot voice recognition model to obtain the dot voice recognition result.
  • in one embodiment, before querying the preset dot behavior recognition model and dot voice recognition model corresponding to the identity identification information, the method further includes: acquiring historical behavior image data and historical dot voice data from the business system; classifying the historical behavior image data and the historical dot voice data by business person to obtain the historical behavior image data and the historical dot voice data corresponding to each business person; training on the historical behavior image data corresponding to each business person to obtain the dot behavior recognition model; and training on the historical dot voice data corresponding to each business person to obtain the dot voice recognition model.
  • the historical behavior image data can be the dot image data captured of each business person in the business system during business verification, and may include, for example, applauding, raising a hand, crossing the hands, or nodding; the historical dot voice data is similar, for example key sentences such as "Question X" or "OK, thank you".
  • each business person has different personal habits, so dot operations appear differently in the corresponding historical behavior image data and historical dot voice data; therefore the historical behavior image data and historical dot voice data are classified by business person, and a corresponding dot behavior recognition model and dot voice recognition model are built for each business person.
  • the historical behavior image data corresponding to each business person is trained to obtain the dot behavior recognition model; the historical dot voice data corresponding to each business person is trained to obtain the dot voice recognition model.
  • specifically, the historical behavior image data can be divided into a training sample set and a test sample set; the training sample set is trained through a supervised learning method to obtain a candidate dot behavior model, the recognition accuracy of the candidate model is then tested on the test sample set, and the dot behavior recognition model is obtained after the recognition accuracy test is passed.
  • the training process of the dot voice recognition model is similar to that of the dot behavior recognition model.
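A minimal sketch of this per-person training flow, using scikit-learn as a stand-in for whichever supervised learner and features are actually used; the feature arrays and the accuracy threshold are assumptions:

```python
# Illustrative sketch: train a candidate dot behavior model for one business person on a
# training sample set and accept it only after a recognition accuracy test on the test set.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier


def train_dot_behavior_model(features: np.ndarray, labels: np.ndarray, min_accuracy: float = 0.9):
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, random_state=0
    )
    candidate = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
    candidate.fit(x_train, y_train)             # supervised training on the training sample set
    accuracy = candidate.score(x_test, y_test)  # recognition accuracy on the test sample set
    if accuracy < min_accuracy:
        raise ValueError(f"accuracy {accuracy:.2f} below threshold, keep collecting data")
    return candidate
```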
  • in one embodiment, obtaining the dot recognition result according to the dot behavior recognition result, the dot voice recognition result, and the preset dot trigger rule includes: querying the preset dot trigger rule, where the dot trigger rule includes a behavior trigger rule and a voice trigger rule; comparing the dot behavior recognition result with the behavior trigger rule to obtain a behavior trigger result; comparing the dot voice recognition result with the voice trigger rule to obtain a voice trigger result; and obtaining the dot recognition result according to the behavior trigger result and the voice trigger result.
  • after the dot behavior recognition result and the dot voice recognition result are obtained, the dot recognition result is obtained by combining them with the dot trigger rule required by the actual business.
  • the queried preset dot trigger rule is set according to actual business needs, for example according to the type of business and the habits of the business personnel, such as regarding a dot as triggered when the business person's applause is recognized in the image data, or when a key sentence such as "Question X" is recognized in the audio data.
  • the dot trigger rule includes a behavior trigger rule and a voice trigger rule, which correspond respectively to dot recognition of the image data and dot recognition of the audio data.
  • the behavior trigger result and the voice trigger result are combined to obtain the dot recognition result.
  • for example, an OR operation can be applied to the behavior trigger result and the voice trigger result, that is, when the type of either the behavior trigger result or the voice trigger result is operation dot, the type of the obtained dot recognition result is operation dot, and the cut point identifier is added to the video data to be recognized.
  • in one embodiment, as shown in FIG. 3, the method further includes a step of responding to a dot cutting instruction, which specifically includes:
  • Step S301 When the dot-cutting instruction is received, the cutting time value of the dot-cutting instruction is determined.
  • the dot-cutting command can be sent from outside, for example, the business personnel click the relevant dot-cutting button; the cutting time value is the sending time of the dot-cutting command, reflecting the time axis position in the video stream data where the dot-setting operation needs to be performed.
  • Step S303 Determine the cutting video frame corresponding to the cutting time value in the video data to be identified.
  • after the cutting time value of the dot cutting instruction is determined, the cutting video frame corresponding to that cutting time value is determined from the video data to be recognized.
  • generally, when a dot cutting instruction is sent from the outside, it indicates that the video frame corresponding to that moment in the video data to be recognized requires an operation dot, and the corresponding cutting video frame can be determined on the time axis of the video data to be recognized according to the cutting time value of the dot cutting instruction.
  • Step S305 Add a cutting point identifier to the cutting video frame.
  • a cut point identifier is added to the cut video frame.
  • the cut point identifier is used to identify the cut point of the video cut.
  • the cut point identifier can be directly searched for cutting processing.
  • Step S307 Return to cutting the video stream data according to the cutting point identifier to obtain video segment data.
  • in this embodiment, in addition to performing dot recognition on the image data and audio data of the video data to be recognized, dot cutting instructions sent from the outside are received in real time and video cutting is performed according to them, realizing external control of video cutting; this can effectively expand the operational diversity of video cutting and improve the efficiency of video cutting processing.
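A sketch of how such an externally issued dot cutting instruction might be handled, assuming the cutting time value arrives in seconds and the frame rate of the segment is known (both are assumptions):

```python
# Illustrative sketch: map the cutting time value of an external dot cutting instruction to a
# frame inside the current segment and record a cut point identifier for that frame.
def handle_dot_cutting_instruction(cutting_time_sec: float,
                                   segment_start_sec: float,
                                   fps: float,
                                   cut_points: list) -> None:
    # Locate the cutting video frame on the segment's time axis.
    frame_index = int(round((cutting_time_sec - segment_start_sec) * fps))
    cut_points.append({
        "cut_point_number": len(cut_points) + 1,
        "cutting_time_value": cutting_time_sec,
        "frame_index": frame_index,
    })


cut_points = []
handle_dot_cutting_instruction(cutting_time_sec=125.4, segment_start_sec=120.0,
                               fps=25.0, cut_points=cut_points)
```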
  • in some embodiments, after the video segment data is obtained, the method further includes: extracting audio segment data from the video segment data; querying a preset voice recognition model; inputting the audio segment data into the voice recognition model to obtain translation data of the video segment data; and determining the service type corresponding to the video segment data according to the translation data, and storing the video segment data to a storage location corresponding to the service type.
  • after the video segment data obtained by cutting the video stream data is obtained, each piece of video segment data can be stored in a corresponding storage location according to its service type.
  • audio segment data is extracted from the video segment data, and the audio segment data includes dialog data in the video segment data, and the service type corresponding to the video segment data can be determined according to the audio segment data.
  • the voice recognition model can perform voice recognition on the input voice data to obtain corresponding translation data.
  • the audio segment data is input into the speech recognition model to obtain translation data of the video segment data.
  • the translation data may be data in text form, and the service type corresponding to the video segment data may be determined according to the translation data.
  • business keywords can be extracted from the translation data, and the corresponding business type can be matched according to the obtained business keywords.
  • after determining the service type corresponding to the video segment data, the video segment data is stored in the storage location corresponding to the service type; for example, the preset storage location corresponding to the service type can be queried and the video segment data stored there, thereby achieving automatic classified storage of the video segment data.
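A sketch of the classification and storage step, assuming the transcript has already been produced by the preset voice recognition model; the keyword table and directory layout are placeholders:

```python
# Illustrative sketch: match business keywords in the transcript of a video segment and move
# the segment file into the storage location of the matched service type.
import shutil
from pathlib import Path

# Hypothetical keyword table mapping business keywords to service types.
KEYWORD_TO_SERVICE = {"loan": "loan_review", "insurance": "insurance_review"}
STORAGE_ROOT = Path("/data/video_segments")  # hypothetical storage root


def classify_and_store(segment_path: str, transcript: str) -> Path:
    service_type = "uncategorized"
    for keyword, service in KEYWORD_TO_SERVICE.items():
        if keyword in transcript.lower():
            service_type = service
            break
    target_dir = STORAGE_ROOT / service_type
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(segment_path, str(target_dir / Path(segment_path).name)))
```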
  • a video cutting method including:
  • Step S401 Obtain video stream data
  • Step S402 Determine the identification length of the video stream
  • Step S403 Extract the video data to be identified from the video stream data according to the video stream identification length
  • Step S404 Extract image data and audio data from the video data to be recognized.
  • in this embodiment, the server 104 receives the video stream data sent by the recording device 102, determines the video stream recognition length set according to actual needs, sequentially extracts from the video stream data the video data to be recognized that satisfies the video stream recognition length, and then subjects the extracted video data to be recognized to subsequent dot recognition processing.
  • Step S405 Determine the identity information of the business personnel to which the video data to be identified corresponds
  • Step S406 Query the preset dot behavior recognition model and dot voice recognition model respectively corresponding to the identity identification information;
  • Step S407 extract image feature data from the image data, and extract audio feature data from the audio data
  • Step S408 Input the image feature data into the dot behavior recognition model to obtain a dot behavior recognition result, and input the audio feature data into the dot speech recognition model to obtain a dot speech recognition result.
  • the source of the video data to be recognized, that is, the recording device 102, is used to determine the corresponding business person, and the identity identification information of that business person is then queried; the identity identification information is the employee ID and/or employee name.
  • the preset dot behavior recognition model and dot voice recognition model corresponding to that identity identification information are queried; these models are trained respectively on the historical dot behavior data and historical dot voice data of the corresponding business person, so the dot recognition is highly targeted and the recognition accuracy is high.
  • on the one hand, image feature data is extracted from the image data and input into the dot behavior recognition model to obtain the dot behavior recognition result.
  • on the other hand, audio feature data is extracted from the audio data and input into the dot voice recognition model to obtain the dot voice recognition result.
  • Step S409 Obtain the dot recognition result according to the dot behavior recognition result, the dot voice recognition result, and the preset dot trigger rule.
  • after the dot behavior recognition result and the dot voice recognition result are obtained, the dot recognition result is obtained by combining them with the dot trigger rule required by the actual business. Specifically, this may include: querying the preset dot trigger rule, which includes a behavior trigger rule and a voice trigger rule; comparing the dot behavior recognition result with the behavior trigger rule to obtain a behavior trigger result; comparing the dot voice recognition result with the voice trigger rule to obtain a voice trigger result; and obtaining the dot recognition result according to the behavior trigger result and the voice trigger result.
  • Step S410 When the type of the dot recognition result is the operation dot, add a cut point identifier to the video data to be recognized;
  • Step S411 Cut the video stream data according to the cut point identifier to obtain video segment data.
  • after the dot recognition result is obtained, its type is judged.
  • when the type of the dot recognition result is operation dot, it indicates that the video data to be recognized is a cut point, and dot processing is performed on it; specifically, a cut point identifier can be added to the video data to be recognized.
  • by searching for the cut point identifiers in the video stream data and cutting according to them, the video stream data is split to obtain the individual pieces of video segment data.
  • Step S412 Extract audio segment data from video segment data
  • Step S413 query a preset voice recognition model
  • Step S414 Input audio segment data into the speech recognition model to obtain translation data of the video segment data;
  • Step S415 Determine the service type corresponding to the video segment data according to the translation data, and store the video segment data in a storage location corresponding to the service type.
  • after the video segment data obtained by cutting the video stream data is obtained, each piece of video segment data can be stored in a corresponding storage location according to its service type, thereby achieving automatic classified storage of the video segment data.
  • it should be understood that although the steps in the flowcharts of FIGS. 2-4 are displayed in order according to the arrows, the steps are not necessarily executed in the order indicated by the arrows. Unless clearly stated herein, the execution of these steps is not strictly limited in order, and they may be executed in other orders. Moreover, at least some of the steps in FIGS. 2-4 may include multiple sub-steps or stages, which are not necessarily executed at the same time but may be executed at different moments; the execution order of these sub-steps or stages is not necessarily sequential, and they may be executed in turn or alternately with at least part of the other steps or of the sub-steps or stages of other steps.
  • a video cutting device including: an identification data extraction module 501, a dot recognition processing module 503, a dot result acquisition module 505, a cut identification adding module 507, and a video cutting module 509, where:
  • the identification data extraction module 501 is used to extract video data to be identified from video stream data, and extract image data and audio data from the video data to be identified;
  • the dot recognition processing module 503 is used to input the image data into the preset dot behavior recognition model to obtain the dot behavior recognition result, and to input the audio data into the preset dot voice recognition model to obtain the dot voice recognition result;
  • the dot result acquisition module 505 is used to obtain the dot recognition result according to the dot behavior recognition result, the dot voice recognition result, and the preset dot trigger rule;
  • a cutting mark adding module 507 which is used to add a cutting point mark to the video data to be recognized when the type of the dot recognition result is the operation dot;
  • the video cutting module 509 is used to cut the video stream data according to the cut point identifier to obtain video segment data.
  • the identification data extraction module 501 includes a video stream acquisition unit, an identification length determination unit, and an identification data extraction unit, wherein: the video stream acquisition unit is used to acquire video stream data; the identification length determination unit is used to determine the video stream identification length; and the identification data extraction unit is used to extract the video data to be identified from the video stream data according to the video stream identification length.
  • the dot recognition processing module 503 includes an identity identification determination unit, a recognition model query unit, a feature data extraction unit, and a dot recognition unit, wherein: the identity identification determination unit is used to determine the identity identification information of the business person to whom the video data to be recognized belongs; the recognition model query unit is used to query the preset dot behavior recognition model and dot voice recognition model respectively corresponding to the identity identification information; the feature data extraction unit is used to extract image feature data from the image data and audio feature data from the audio data; and the dot recognition unit is used to input the image feature data into the dot behavior recognition model to obtain the dot behavior recognition result, and to input the audio feature data into the dot voice recognition model to obtain the dot voice recognition result.
  • in one embodiment, the device also includes a historical data acquisition module, a historical data classification module, a behavior recognition model training module, and a voice recognition model training module, wherein: the historical data acquisition module is used to acquire historical behavior image data and historical dot voice data from the business system; the historical data classification module is used to classify the historical behavior image data and the historical dot voice data by business person to obtain the historical behavior image data and the historical dot voice data corresponding to each business person; the behavior recognition model training module is used to train on the historical behavior image data corresponding to each business person to obtain the dot behavior recognition model; and the voice recognition model training module is used to train on the historical dot voice data corresponding to each business person to obtain the dot voice recognition model.
  • the dot result acquisition module 505 includes a trigger rule query unit, a behavior comparison unit, a voice comparison unit, and a dot result acquisition unit, wherein: the trigger rule query unit is used to query the preset dot trigger rule, which includes a behavior trigger rule and a voice trigger rule; the behavior comparison unit is used to compare the dot behavior recognition result with the behavior trigger rule to obtain a behavior trigger result; the voice comparison unit is used to compare the dot voice recognition result with the voice trigger rule to obtain a voice trigger result; and the dot result acquisition unit is used to obtain the dot recognition result according to the behavior trigger result and the voice trigger result.
  • in one embodiment, the device further includes a cutting instruction receiving module, a cutting frame determination module, an identifier adding module, and a cutting processing module, wherein: the cutting instruction receiving module is used to determine the cutting time value of a dot cutting instruction when the dot cutting instruction is received; the cutting frame determination module is used to determine the cutting video frame corresponding to the cutting time value in the video data to be recognized; the identifier adding module is used to add a cut point identifier to the cutting video frame; and the cutting processing module is used to return to cutting the video stream data according to the cut point identifier to obtain video segment data.
  • in one embodiment, the device further includes an audio segment extraction module, a voice recognition model query module, a translation data acquisition module, and a video segment storage module, wherein: the audio segment extraction module is used to extract audio segment data from the video segment data; the voice recognition model query module is used to query the preset voice recognition model; the translation data acquisition module is used to input the audio segment data into the voice recognition model to obtain the translation data of the video segment data; and the video segment storage module is used to determine the service type corresponding to the video segment data according to the translation data, and to store the video segment data in the storage location corresponding to the service type.
  • Each module in the above video cutting device may be implemented in whole or in part by software, hardware, or a combination thereof.
  • the above modules may be embedded in, or independent of, the processor in the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 6.
  • the computer device includes a processor, memory, and network interface connected by a system bus. Among them, the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system and computer-readable instructions.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
  • the network interface of the computer device is used to communicate with external terminals through a network connection.
  • the computer readable instructions are executed by the processor to implement a video cutting method.
  • those skilled in the art can understand that FIG. 6 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • the specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different component arrangement.
  • a computer device includes a memory and one or more processors.
  • the memory stores computer-readable instructions.
  • when the computer-readable instructions are executed by the one or more processors, the steps of the video cutting method provided in any embodiment of the present application are implemented.
  • one or more non-volatile computer-readable storage media store computer-readable instructions.
  • when the computer-readable instructions are executed by one or more processors, the one or more processors implement the steps of the video cutting method provided in any embodiment of the present application.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

A video cutting method, including: extracting video data to be recognized from video stream data, and extracting image data and audio data from the video data to be recognized; inputting the image data into a preset dot behavior recognition model to obtain a dot behavior recognition result, and inputting the audio data into a preset dot voice recognition model to obtain a dot voice recognition result; obtaining a dot recognition result according to the dot behavior recognition result, the dot voice recognition result, and a preset dot trigger rule; adding a cut point identifier to the video data to be recognized when the type of the dot recognition result is operation dot; and cutting the video stream data according to the cut point identifier to obtain video segment data.

Description

Video cutting method, device, computer device, and storage medium
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 201811536818X, entitled "Video cutting method, device, computer device, and storage medium" and filed with the China Patent Office on December 14, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to a video cutting method, device, computer device, and storage medium.
Background
With the development of multimedia technology, delivering information and resources in video form has become widespread in movies, television, news, social networking, education, games, and so on, for example in video chat, video conferencing, video surveillance, and film and television productions; video has become an important part of people's work, study, and daily life.
In video applications there are scenarios that require videos to be cut, such as clipping television news or desensitizing recorded videos. The inventor has realized that current video cutting requires dot marks to be added manually to determine the time-axis positions at which the video is cut, so the efficiency of video cutting processing is low.
Summary
According to various embodiments disclosed in the present application, a video cutting method, device, computer device, and storage medium are provided.
A video cutting method includes:
extracting video data to be recognized from video stream data, and extracting image data and audio data from the video data to be recognized;
inputting the image data into a preset dot behavior recognition model to obtain a dot behavior recognition result, and inputting the audio data into a preset dot voice recognition model to obtain a dot voice recognition result;
obtaining a dot recognition result according to the dot behavior recognition result, the dot voice recognition result, and a preset dot trigger rule;
adding a cut point identifier to the video data to be recognized when the type of the dot recognition result is operation dot; and
cutting the video stream data according to the cut point identifier to obtain video segment data.
A video cutting device includes:
a recognition data extraction module, configured to extract video data to be recognized from video stream data, and to extract image data and audio data from the video data to be recognized;
a dot recognition processing module, configured to input the image data into a preset dot behavior recognition model to obtain a dot behavior recognition result, and to input the audio data into a preset dot voice recognition model to obtain a dot voice recognition result;
a dot result acquisition module, configured to obtain a dot recognition result according to the dot behavior recognition result, the dot voice recognition result, and a preset dot trigger rule;
a cut identifier adding module, configured to add a cut point identifier to the video data to be recognized when the type of the dot recognition result is operation dot; and
a video cutting module, configured to cut the video stream data according to the cut point identifier to obtain video segment data.
A computer device includes a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the processors, cause the one or more processors to perform the following steps:
extracting video data to be recognized from video stream data, and extracting image data and audio data from the video data to be recognized;
inputting the image data into a preset dot behavior recognition model to obtain a dot behavior recognition result, and inputting the audio data into a preset dot voice recognition model to obtain a dot voice recognition result;
obtaining a dot recognition result according to the dot behavior recognition result, the dot voice recognition result, and a preset dot trigger rule;
adding a cut point identifier to the video data to be recognized when the type of the dot recognition result is operation dot; and
cutting the video stream data according to the cut point identifier to obtain video segment data.
One or more non-volatile computer-readable storage media store computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
extracting video data to be recognized from video stream data, and extracting image data and audio data from the video data to be recognized;
inputting the image data into a preset dot behavior recognition model to obtain a dot behavior recognition result, and inputting the audio data into a preset dot voice recognition model to obtain a dot voice recognition result;
obtaining a dot recognition result according to the dot behavior recognition result, the dot voice recognition result, and a preset dot trigger rule;
adding a cut point identifier to the video data to be recognized when the type of the dot recognition result is operation dot; and
cutting the video stream data according to the cut point identifier to obtain video segment data.
The details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below. Other features and advantages of the present application will become apparent from the description, the drawings, and the claims.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the accompanying drawings required in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is an application scenario diagram of a video cutting method according to one or more embodiments.
FIG. 2 is a schematic flowchart of a video cutting method according to one or more embodiments.
FIG. 3 is a schematic flowchart of responding to a dot cutting instruction according to one or more embodiments.
FIG. 4 is a schematic flowchart of a video cutting method in another embodiment.
FIG. 5 is a structural block diagram of a video cutting device according to one or more embodiments.
FIG. 6 is an internal structure diagram of a computer device according to one or more embodiments.
Detailed Description
To make the technical solutions and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
The video cutting method provided by the present application can be applied in the application environment shown in FIG. 1. The recording device 102 communicates with the server 104 through a network. The recording device 102 records video and sends the recorded video stream data to the server 104. The server 104 extracts image data and audio data from the video data to be recognized obtained from the video stream data, inputs the image data and audio data respectively into the corresponding preset dot behavior recognition model and dot voice recognition model, and then obtains a dot recognition result according to the obtained dot behavior recognition result, dot voice recognition result, and preset dot trigger rule. When the type of the dot recognition result is operation dot, a cut point identifier is added to the video data to be recognized; finally, the video stream data is cut according to the cut point identifier to obtain video segment data.
The recording device 102 may be, but is not limited to, various video recording cameras, or a terminal with a video recording function such as a personal computer, a notebook computer, a smartphone, a tablet, or a portable wearable device. The server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in FIG. 2, a video cutting method is provided. Taking the method applied to the server 104 in FIG. 1 as an example, it includes the following steps:
Step S201: Extract video data to be recognized from the video stream data, and extract image data and audio data from the video data to be recognized.
In this embodiment, the video data to be recognized is extracted from the video stream data. The video stream data is the video data that needs to be cut and can be recorded by a recording device. For example, for the face-to-face verification process in the financial industry, the video stream data may be the video data captured in real time by a camera during the dual-recording process. The video data to be recognized is video data of a preset recognition length; the recognition length is set according to actual requirements, and the corresponding cut point identifier can be added by performing dot recognition on the video data to be recognized. Performing dot recognition on video data of the preset recognition length makes it possible to cut the recorded video data in real time, ensuring the timeliness of video cutting and improving video cutting efficiency.
Generally, video data consists of two parts, image and audio, and dot recognition can be performed on both. Specifically, when performing dot recognition on the video data to be recognized, image data and audio data are extracted from it so that the image data and the audio data can be recognized and processed separately; in this way it can be determined whether a dot behavior appears in the video image or whether a dot voice appears in the video audio, which realizes dot recognition of both image behavior and audio speech and improves the accuracy of dot recognition.
Step S203: Input the image data into a preset dot behavior recognition model to obtain a dot behavior recognition result, and input the audio data into a preset dot voice recognition model to obtain a dot voice recognition result.
After the image data and audio data are extracted from the video data to be recognized, they are input respectively into the corresponding dot behavior recognition model and dot voice recognition model for dot recognition. The dot behavior recognition model may be based on an artificial neural network algorithm and obtained by training on the historical dot behavior data of the business personnel of the business system in the corresponding business scenario; the dot behaviors may be, for example, actions such as applauding, raising a hand, or tapping. The dot voice recognition model can be obtained by training on the historical dot voice data of the business personnel, for example keyword voices such as "first, second, third".
In this embodiment, on the one hand, the image data is input into the preset dot behavior recognition model for dot behavior recognition to obtain the dot behavior recognition result; on the other hand, the audio data is input into the preset dot voice recognition model for dot voice recognition to obtain the dot voice recognition result. Performing dot recognition on the image data and the audio data separately expands the diversity of dot operations and avoids affecting the fluency of the business process, while ensuring the accuracy of video cutting.
Step S205: Obtain a dot recognition result according to the dot behavior recognition result, the dot voice recognition result, and a preset dot trigger rule.
After the dot behavior recognition result and the dot voice recognition result are obtained, they are combined to obtain the dot recognition result. Specifically, the preset dot trigger rule is queried; it is set according to actual business requirements. For example, the rule may take an OR of the dot behavior recognition result and the dot voice recognition result, that is, as long as the type of either result is operation dot (meaning a cut point identifier needs to be added), the dot is triggered and the obtained dot recognition result is operation dot. Alternatively, the rule may take an AND of the two results, that is, the dot is triggered and the type of the obtained dot recognition result is operation dot only when the types of both the dot behavior recognition result and the dot voice recognition result are operation dot.
Step S207: When the type of the dot recognition result is operation dot, add a cut point identifier to the video data to be recognized.
After the dot recognition result is obtained, its type is judged. When the type of the dot recognition result is operation dot, it indicates that the image data and/or audio data in the video data to be recognized has triggered a dot and that the video data to be recognized is a video cutting position; dot processing is therefore performed on it, specifically by adding a cut point identifier to the video data to be recognized. The cut point identifier is used to identify the cut point of the video cutting; when cutting the video stream data, the cut point identifier can be searched for directly to perform the cutting.
In a specific implementation, the cut point identifier may be a cutting label. When a cut point identifier is added to the video data to be recognized, a key frame is determined from the video data to be recognized according to a preset label addition rule, for example the first frame of the video data to be recognized is used as the key frame, and a cutting label is added to that key frame. The cutting label may include, but is not limited to, a cut point number, a cutting time value, and the like.
Step S209: Cut the video stream data according to the cut point identifier to obtain video segment data.
When cutting the video stream data, the cut point identifiers in the video stream data are located and the cutting is performed according to them, so that the video stream data is split into individual pieces of video segment data.
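As an illustration of this cutting step, a minimal sketch follows; it assumes each cut point identifier has been reduced to a cutting time value in seconds, that ffmpeg is available on the server, and that the file names are placeholders rather than anything specified by the application:

```python
# Illustrative sketch: cut the recorded stream into segments at the recorded cutting time
# values by copying each interval with ffmpeg (stream copy, no re-encoding).
import subprocess


def cut_video_stream(stream_path: str, cut_times_sec: list[float], total_duration_sec: float):
    boundaries = [0.0] + sorted(cut_times_sec) + [total_duration_sec]
    segment_paths = []
    for i in range(len(boundaries) - 1):
        start, end = boundaries[i], boundaries[i + 1]
        out_path = f"segment_{i:04d}.mp4"
        subprocess.run(
            ["ffmpeg", "-y", "-i", stream_path, "-ss", str(start), "-to", str(end),
             "-c", "copy", out_path],
            check=True,
        )
        segment_paths.append(out_path)
    return segment_paths
```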
In the above video cutting method, image data and audio data are extracted from the video data to be recognized obtained from the video stream data, the image data and audio data are respectively input into the corresponding preset dot behavior recognition model and dot voice recognition model, and the dot recognition result is then obtained according to the obtained dot behavior recognition result, dot voice recognition result, and preset dot trigger rule. When the type of the dot recognition result is operation dot, a cut point identifier is added to the video data to be recognized, and finally the video stream data is cut according to the cut point identifier to obtain the video segment data. During the video cutting process, dots can be recognized according to the image data and audio data in the video data to be recognized and cut point identifiers can be added automatically, so no manual dot operation is required, which improves the processing efficiency of video cutting.
In some embodiments, extracting the video data to be recognized from the video stream data includes: obtaining the video stream data; determining a video stream recognition length; and extracting the video data to be recognized from the video stream data according to the video stream recognition length.
Dot recognition cannot be performed directly on the video stream data recorded by the recording device 102; it needs to be split into video data to be recognized of a fixed recognition length, and dot recognition is performed on that data. In this embodiment, when extracting the video data to be recognized from the video stream data, on the one hand, the video stream data is first obtained: real-time recorded video stream data may be received directly from the recording device 102, or recorded video stream data may be read from a preset memory. On the other hand, the video stream recognition length is determined; it is set according to actual needs, for example according to the input requirements of the dot behavior recognition model and the dot voice recognition model, or according to the processing resources of the server 104. After the video stream recognition length is determined, the video data to be recognized is extracted from the video stream data according to that length. In a specific application, video data to be recognized that satisfies the video stream recognition length can be extracted sequentially from the video stream data, and the extracted video data to be recognized is then subjected to subsequent dot recognition processing.
In one embodiment, inputting the image data into a preset dot behavior recognition model to obtain a dot behavior recognition result and inputting the audio data into a preset dot voice recognition model to obtain a dot voice recognition result includes: determining the identity identification information of the business person to whom the video data to be recognized belongs; querying the preset dot behavior recognition model and dot voice recognition model respectively corresponding to the identity identification information; extracting image feature data from the image data, and extracting audio feature data from the audio data; and inputting the image feature data into the dot behavior recognition model to obtain the dot behavior recognition result, and inputting the audio feature data into the dot voice recognition model to obtain the dot voice recognition result.
In this embodiment, the dot behavior recognition model and the dot voice recognition model are both trained on the historical dot data of each business person in the business system. Generally, during the dual-recording process of business verification, different business systems have different dot operation requirements, and different business personnel also have different dot operation habits.
Specifically, when inputting the image data into the preset dot behavior recognition model to obtain the dot behavior recognition result and inputting the audio data into the preset dot voice recognition model to obtain the dot voice recognition result, the identity identification information of the business person to whom the video data to be recognized belongs is first determined. In application, each business service window is provided with a recording device 102, so the corresponding business person can be determined from the source of the video data to be recognized, that is, from the recording device 102, and the identity identification information of that business person is further queried. The identity identification information may be, but is not limited to, an employee ID, an employee name, or other identity information that can uniquely identify the business person. After the identity identification information is determined, the preset dot behavior recognition model and dot voice recognition model corresponding to that identity identification information are queried. Because these models are trained respectively on the historical dot behavior data and historical dot voice data of the corresponding business person, the dot recognition is highly targeted and the recognition accuracy is high.
After the dot behavior recognition model and the dot voice recognition model are obtained, on the one hand, image feature data is extracted from the image data and input into the dot behavior recognition model to obtain the dot behavior recognition result; on the other hand, audio feature data is extracted from the audio data and input into the dot voice recognition model to obtain the dot voice recognition result. When performing dot recognition on the image data and the audio data, feature extraction is performed to filter out useless redundant information; the resulting image feature data and audio feature data then undergo the subsequent dot recognition processing to obtain the dot behavior recognition result and the dot voice recognition result.
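The per-person model lookup and the two recognition calls described above could look roughly like the following sketch; the device-to-employee mapping, the model registry, and the placeholder feature extractors are all assumptions for illustration, not part of the application:

```python
# Illustrative sketch: look up the dot behavior and dot voice recognition models registered for
# the business person identified by the recording device, then run both recognitions.
import numpy as np

DEVICE_TO_EMPLOYEE = {"recorder-102": "E1001"}  # recording device -> employee ID (assumed)
MODEL_REGISTRY = {}  # employee ID -> {"behavior": model, "voice": model}, filled at training time


def extract_image_features(frames) -> np.ndarray:
    # Placeholder feature extraction: mean colour per sampled frame (real systems use richer features).
    return np.array([frame.mean(axis=(0, 1)) for frame in frames]).reshape(1, -1)


def extract_audio_features(waveform: np.ndarray) -> np.ndarray:
    # Placeholder feature extraction: simple energy statistics of the audio signal.
    return np.array([[waveform.mean(), waveform.std()]])


def recognize_dots(device_id: str, frames, waveform):
    employee_id = DEVICE_TO_EMPLOYEE[device_id]  # identity identification information
    models = MODEL_REGISTRY[employee_id]         # per-person dot models
    behavior_result = models["behavior"].predict(extract_image_features(frames))
    voice_result = models["voice"].predict(extract_audio_features(waveform))
    return behavior_result, voice_result
```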
在其中一个实施例中,在查询身份标识信息分别对应预设的打点行为识别模型和打点语音识别模型之前,还包括:从业务系统中获取历史行为影像数据和历史打点语音数据;分别将历史行为影像数据和历史打点语音数据按照业务人员进行分类,得到各业务人员对应的历史行为影像数据和各业务人员对应的历史打点语音数据;训练各业务人员对应的历史行为影像数据,得到打点行为识别模型;及训练各业务人员对应的历史打点语音数据,得到打点语音识别模型。
在训练打点行为识别模型和打点语音识别模型时,先从业务系统中获取历史行为影像数据和历史打点语音数据。其中,历史行为影像数据可以为业务系统中各业务人员在进行业务面核过程中双录拍摄到的打点影像数据,例如可以包括鼓掌、举手、双手交叉、点头等打点行为;历史打点语音数据与历史行为影像数据类似,如关键词语句,“第X个问题”、“好的,谢谢”等。在具体应用中,各业务人员会有不同的个人习惯,其对应的历史行为影像数据和历史打点语音数据中打点操作的表现也不相同,所以按照业务人员将历史行为影像数据和历史打点语音数据进行分类,为各业务人员构建对应的打点行为识别模型和打点语音识别模型。
具体地,训练各业务人员对应的历史行为影像数据,得到打点行为识别模型;训练各业务人员对应的历史打点语音数据,得到打点语音识别模型。具体实现时,可以将历史行为影像数据划分为训练样本集和测试样本集,通过有监督学习方法训练该训练样本集,得到待测试打点行为模型,再通过测试样本集对待测试打点行为模型进行识别精度测试,在识别精度测试通过后,得到打点行为识别模型。打点语音识别模型的训练过程类同于打点行为识别模型。
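下面给出一个按上述"训练样本集/测试样本集划分、有监督训练、识别精度测试"思路训练单个业务人员打点行为识别模型的示意性草图;网络结构、训练轮数与精度阈值均为举例假设,并非本申请限定的模型结构:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow import keras

def train_dot_behavior_model(features, labels, accuracy_threshold=0.9):
    """训练某一业务人员的打点行为识别模型并做识别精度测试(示意实现)。

    features 为形状 (样本数, 特征维数) 的影像特征矩阵,
    labels 为 0/1 标注(1 表示该样本包含打点行为)。
    """
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, random_state=0)

    model = keras.Sequential([
        keras.layers.Input(shape=(features.shape[1],)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),   # 输出触发打点的概率
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=10, batch_size=32, verbose=0)

    _, accuracy = model.evaluate(x_test, y_test, verbose=0)
    if accuracy < accuracy_threshold:                  # 精度测试未通过,需补充样本或调整模型
        raise RuntimeError(f"识别精度 {accuracy:.2f} 未达到要求")
    return model
```

打点语音识别模型可按同样的划分与测试流程,用音频特征数据训练得到。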
在其中一个实施例中,根据打点行为识别结果、打点语音识别结果和预设的打点触发规则,得到打点识别结果包括:查询预设的打点触发规则,打点触发规则包括行为触发规则和语音触发规则;将打点行为识别结果与行为触发规则进行比较,得到行为触发结果; 将打点语音识别结果与语音触发规则进行比较,得到语音触发结果;及根据行为触发结果和语音触发结果,得到打点识别结果。
得到打点行为识别结果和打点语音识别结果后,结合实际业务需求的打点触发规则,得到打点识别结果。具体地,查询预设的打点触发规则,该打点触发规则根据实际业务需求进行设定,具体可以根据业务类型和业务人员的习惯进行设定,如设定为当影像数据中识别到业务人员的鼓掌行为时,或者当音频数据中识别到“第X个问题”的关键语句时,认为触发打点。打点触发规则包括行为触发规则和语音触发规则,分别对应于影像数据的打点识别和音频数据的打点识别。
一方面,将打点行为识别结果与行为触发规则进行比较,得到行为触发结果;另一方面将打点语音识别结果与语音触发规则进行比较,得到语音触发结果。最后综合行为触发结果和语音触发结果得到打点识别结果,如可以对行为触发结果和语音触发结果取或运算,即当行为触发结果和语音触发结果中任一类型为操作打点时,即得到的打点识别结果的类型为操作打点,并对待识别视频数据进行切割点标识添加处理。
在其中一个实施例中,如图3所示,还包括响应打点切割指令的步骤,具体包括:
步骤S301:当接收到打点切割指令时,确定打点切割指令的切割时刻值。
本实施例中,除了对从视频流数据中提取待识别视频数据,对待识别视频数据进行打点识别外,还可以响应外部发送的打点切割指令,实现人工操作打点。具体地,在接收到打点切割指令时,确定该打点切割指令的切割时刻值。其中,打点切割指令可以由外部发送,如业务人员点击相关打点按钮;切割时刻值为打点切割指令的发送时间,反映视频流数据中需要进行打点操作的时间轴位置。
步骤S303:确定切割时刻值在待识别视频数据中对应的切割视频帧。
确定打点切割指令的切割时刻值后,从待识别视频数据中确定该切割时刻值对应的切割视频帧。一般地,外部发送打点切割指令时,表明待识别视频数据中该时刻对应的视频帧需要进行操作打点,根据该打点切割指令的切割时刻值可以从待识别视频数据的时间轴确定对应的切割视频帧。
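下面的代码草图示意了由切割时刻值换算出待识别视频数据中对应切割视频帧的过程,假设已知该段视频在时间轴上的起始时间与帧率;参数取值仅为举例:

```python
def locate_cut_frame(cut_time, segment_start_time, fps=25.0):
    """根据打点切割指令的切割时刻值,在待识别视频数据中定位对应的切割视频帧(示意)。"""
    offset = cut_time - segment_start_time        # 切割时刻相对本段起点的偏移(秒)
    frame_index = int(round(offset * fps))        # 将时间偏移换算为帧序号
    return max(frame_index, 0)
```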
步骤S305:为切割视频帧添加切割点标识。
确定切割视频帧后,为该切割视频帧添加切割点标识,切割点标识用于标识视频切割的切割点,在对视频流数据进行切割时,可以直接查找该切割点标识进行切割处理。
步骤S307:返回按照切割点标识将视频流数据进行切割处理,得到视频段数据。
添加切割点标识后,返回按照切割点标识将视频流数据进行切割处理的步骤,通过查找视频流数据中的切割点标识,再按照该切割点标识进行切割处理,从而将视频流数据拆分,得到各视频段数据。
本实施例中,除了对待识别视频数据的影像数据和音频数据进行打点识别外,还实时接收外部发送的打点切割指令,并按照该打点切割指令进行视频切割处理,实现外部对视频切割的控制,能够有效扩展视频切割的操作多样性,提高视频切割处理的效率。
在一些实施例中,在得到视频段数据之后,还包括:从视频段数据中提取音频段数据;查询预设的语音识别模型;将音频段数据输入语音识别模型中,得到视频段数据的译文数据;及根据译文数据确定视频段数据对应的业务类型,并将视频段数据存储至业务类型对应的存储位置中。
本实施例中,在得到视频流数据经过切割处理的视频段数据后,可以按照各视频段数据的业务类型将其存储至对应的存储位置中。具体地,从视频段数据中提取音频段数据,音频段数据包括视频段数据中的对话数据,根据该音频段数据可以确定该视频段数据对应的业务类型。查询预设的语音识别模型,语音识别模型可以对输入的语音数据进行语音识别,得到对应的译文数据。
本实施例中,将音频段数据输入该语音识别模型中,得到视频段数据的译文数据,译文数据可以为文本形式的数据,根据该译文数据可以确定视频段数据对应的业务类型。在具体实现时,可以从译文数据中提取业务关键字,并根据得到的业务关键字匹配对应的业务类型。确定视频段数据对应的业务类型后,将该视频段数据存储至业务类型对应的存储位置中。如可以查询该业务类型对应预设的存储位置,并将视频段数据存储至该存储位置中,从而实现了对视频段数据的自动分类存储。
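下面给出根据译文数据匹配业务关键字、确定业务类型并将视频段数据移动到对应存储位置的示意性草图;其中关键字与业务类型的映射、目录结构均为举例假设:

```python
import os
import shutil

# 业务关键字到业务类型的映射,仅为示例性假设
KEYWORD_TO_TYPE = {"贷款": "loan", "保险": "insurance", "理财": "wealth"}

def archive_segment(segment_path, transcript, base_dir="archive"):
    """根据视频段数据的译文数据确定业务类型,并将其存储至对应的存储位置(示意)。"""
    business_type = "other"
    for keyword, btype in KEYWORD_TO_TYPE.items():
        if keyword in transcript:                 # 从译文数据中匹配业务关键字
            business_type = btype
            break
    target_dir = os.path.join(base_dir, business_type)
    os.makedirs(target_dir, exist_ok=True)
    shutil.move(segment_path, os.path.join(target_dir, os.path.basename(segment_path)))
    return business_type
```

这样,切割得到的各视频段数据即可按业务类型自动归档,便于后续检索与审核。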
在其中一个实施例中,如图4所示,提供了一种视频切割方法,包括:
步骤S401:获取视频流数据;
步骤S402:确定视频流识别长度;
步骤S403:按照视频流识别长度,从视频流数据中提取待识别视频数据;
步骤S404:从待识别视频数据中提取影像数据和音频数据。
本实施例中,服务器104接收录制设备102发送的视频流数据,并确定根据实际需求进行设定的视频流识别长度,再按照该视频流识别长度,从视频流数据中依次提取满足视频流识别长度的待识别视频数据,并对提取得到的待识别视频数据进行后续的打点识别处理。
步骤S405:确定待识别视频数据对应所属业务人员的身份标识信息;
步骤S406:查询身份标识信息分别对应预设的打点行为识别模型和打点语音识别模型;
步骤S407:从影像数据中提取影像特征数据,从音频数据中提取音频特征数据;
步骤S408:将影像特征数据输入打点行为识别模型中,得到打点行为识别结果,将音频特征数据输入打点语音识别模型中,得到打点语音识别结果。
得到影像数据和音频数据后,通过待识别视频数据的来源,即根据录制设备102来确定对应所属业务人员,并进一步查询该业务人员对应的身份标识信息,身份标识信息为员工编号和/或员工姓名。查询与该身份标识信息对应预设的打点行为识别模型和打点语音识别模型,打点行为识别模型和打点语音识别模型分别基于对应业务人员的历史打点行为数据和历史打点语音数据训练得到,打点识别的针对性强,识别准确度高。一方面,从影像数据中提取影像特征数据,将影像特征数据输入打点行为识别模型中,得到打点行为识别结果。另一方面,从音频数据中提取音频特征数据,并将音频特征数据输入打点语音识别模型中,得到打点语音识别结果。
步骤S409:根据打点行为识别结果、打点语音识别结果和预设的打点触发规则,得到打点识别结果。
得到打点行为识别结果和打点语音识别结果后,结合实际业务需求的打点触发规则,得到打点识别结果。具体可以包括:查询预设的打点触发规则,打点触发规则包括行为触发规则和语音触发规则;将打点行为识别结果与行为触发规则进行比较,得到行为触发结果;将打点语音识别结果与语音触发规则进行比较,得到语音触发结果;根据行为触发结果和语音触发结果,得到打点识别结果。
步骤S410:当打点识别结果的类型为操作打点时,对待识别视频数据添加切割点标识;
步骤S411:按照切割点标识将视频流数据进行切割处理,得到视频段数据。
得到打点识别结果后,判断其类型,当打点识别结果的类型为操作打点时,表明该待识别视频数据为切割点,对其进行打点处理,具体可以对该待识别视频数据添加切割点标识。通过查找视频流数据中的切割点标识,按照该切割点标识进行切割处理,从而将视频流数据拆分,得到各视频段数据。
步骤S412:从视频段数据中提取音频段数据;
步骤S413:查询预设的语音识别模型;
步骤S414:将音频段数据输入语音识别模型中,得到视频段数据的译文数据;
步骤S415:根据译文数据确定视频段数据对应的业务类型,并将视频段数据存储至业务类型对应的存储位置中。
本实施例中,在得到视频流数据经过切割处理的视频段数据后,可以按照各视频段数据的业务类型将其存储至对应的存储位置中,从而实现了对视频段数据的自动分类存储。
应该理解的是,虽然图2-4的流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,图2-4中的至少一部分步骤可以包括多个子步骤或者多个阶段,这些子步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些子步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤的子步骤或者阶段的至少一部分轮流或者交替地执行。
在其中一个实施例中,如图5所示,提供了一种视频切割装置,包括:识别数据提取模块501、打点识别处理模块503、打点结果获取模块505、切割标识添加模块507和视频切割模块509,其中:
识别数据提取模块501,用于从视频流数据中提取待识别视频数据,并从待识别视频数据中提取影像数据和音频数据;
打点识别处理模块503,用于将影像数据输入预设的打点行为识别模型中,得到打点行为识别结果,并将音频数据输入预设的打点语音识别模型中,得到打点语音识别结果;
打点结果获取模块505,用于根据打点行为识别结果、打点语音识别结果和预设的打点触发规则,得到打点识别结果;
切割标识添加模块507,用于当打点识别结果的类型为操作打点时,对待识别视频数据添加切割点标识;及
视频切割模块509,用于按照切割点标识将视频流数据进行切割处理,得到视频段数据。
在其中一个实施例中,识别数据提取模块501包括视频流获取单元、识别长度确定单元和识别数据提取单元,其中:视频流获取单元,用于获取视频流数据;识别长度确定单元,用于确定视频流识别长度;及识别数据提取单元,用于按照视频流识别长度,从视频流数据中提取待识别视频数据。
在其中一个实施例中,打点识别处理模块503包括身份标识确定单元、识别模型查询单元、特征数据提取单元和打点识别单元,其中:身份标识确定单元,用于确定待识别视频数据对应所属业务人员的身份标识信息;识别模型查询单元,用于查询身份标识信息分别对应预设的打点行为识别模型和打点语音识别模型;特征数据提取单元,用于从影像数据中提取影像特征数据,从音频数据中提取音频特征数据;及打点识别单元,用于将影像特征数据输入打点行为识别模型中,得到打点行为识别结果,将音频特征数据输入打点语音识别模型中,得到打点语音识别结果。
在其中一个实施例中,还包括历史数据获取模块、历史数据分类模块、行为识别模型训练模块和语音识别模型训练模块,其中:历史数据获取模块,用于从业务系统中获取历史行为影像数据和历史打点语音数据;历史数据分类模块,用于分别将历史行为影像数据和历史打点语音数据按照业务人员进行分类,得到各业务人员对应的历史行为影像数据和各业务人员对应的历史打点语音数据;行为识别模型训练模块,用于训练各业务人员对应的历史行为影像数据,得到打点行为识别模型;及语音识别模型训练模块,用于训练各业务人员对应的历史打点语音数据,得到打点语音识别模型。
在其中一个实施例中,打点结果获取模块505包括触发规则查询单元、行为比较单元、语音比较单元和打点结果获取单元,其中:触发规则查询单元,用于查询预设的打点触发规则,打点触发规则包括行为触发规则和语音触发规则;行为比较单元,用于将打点行为识别结果与行为触发规则进行比较,得到行为触发结果;语音比较单元,用于将打点语音识别结果与语音触发规则进行比较,得到语音触发结果;及打点结果获取单元,用于根据行为触发结果和语音触发结果,得到打点识别结果。
在其中一个实施例中,还包括切割指令接收模块、切割帧确定模块、标识添加模块和切割处理模块,其中:切割指令接收模块,用于当接收到打点切割指令时,确定打点切割指令的切割时刻值;切割帧确定模块,用于确定切割时刻值在待识别视频数据中对应的切割视频帧;标识添加模块,用于为切割视频帧添加切割点标识;及切割处理模块,用于返回按照切割点标识将视频流数据进行切割处理,得到视频段数据。
在其中一个实施例中,还包括音频段提取模块、语音识别模型查询模块、译文数据获取模块和视频段存储模块,其中:音频段提取模块,用于从视频段数据中提取音频段数据;语音识别模型查询模块,用于查询预设的语音识别模型;译文数据获取模块,用于将音频段数据输入语音识别模型中,得到视频段数据的译文数据;及视频段存储模块,用于根据译文数据确定视频段数据对应的业务类型,并将视频段数据存储至业务类型对应的存储位置中。
关于视频切割装置的具体限定可以参见上文中对于视频切割方法的限定,在此不再赘述。上述视频切割装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。上述各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中,也可以以软件形式存储于计算机设备中的存储器中,以便于处理器调用执行以上各个模块对应的操作。
在其中一个实施例中,提供了一种计算机设备,该计算机设备可以是服务器,其内部结构图可以如图6所示。该计算机设备包括通过系统总线连接的处理器、存储器和网络接口。其中,该计算机设备的处理器用于提供计算和控制能力。该计算机设备的存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统和计算机可读指令。该内存储器为非易失性存储介质中的操作系统和计算机可读指令的运行提供环境。该计算机设备的网络接口用于与外部的终端通过网络连接通信。该计算机可读指令被处理器执行时以实现一种视频切割方法。
本领域技术人员可以理解,图6中示出的结构,仅仅是与本申请方案相关的部分结构的框图,并不构成对本申请方案所应用于其上的计算机设备的限定,具体的计算机设备可以包括比图中所示更多或更少的部件,或者组合某些部件,或者具有不同的部件布置。
一种计算机设备,包括存储器和一个或多个处理器,存储器中存储有计算机可读指令,计算机可读指令被处理器执行时实现本申请任意一个实施例中提供的视频切割方法的步骤。
一个或多个存储有计算机可读指令的非易失性存储介质,计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器实现本申请任意一个实施例中提供的视频切割方法的步骤。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机可读指令来指令相关的硬件来完成,所述的计算机可读指令可存储于一非易失性计算机可读取存储介质中,该计算机可读指令在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM (EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM)或者外部高速缓冲存储器。作为说明而非局限,RAM以多种形式可得,诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双数据率SDRAM(DDRSDRAM)、增强型SDRAM(ESDRAM)、同步链路(Synchlink)DRAM(SLDRAM)、存储器总线(Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。
以上实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。
以上所述实施例仅表达了本申请的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对发明专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请专利的保护范围应以所附权利要求为准。

Claims (20)

  1. 一种视频切割方法,包括:
    从视频流数据中提取待识别视频数据,并从所述待识别视频数据中提取影像数据和音频数据;
    将所述影像数据输入预设的打点行为识别模型中,得到打点行为识别结果,并将所述音频数据输入预设的打点语音识别模型中,得到打点语音识别结果;
    根据所述打点行为识别结果、所述打点语音识别结果和预设的打点触发规则,得到打点识别结果;
    当所述打点识别结果的类型为操作打点时,对所述待识别视频数据添加切割点标识;及
    按照所述切割点标识将所述视频流数据进行切割处理,得到视频段数据。
  2. 根据权利要求1所述的方法,其特征在于,所述从视频流数据中提取待识别视频数据,包括:
    获取视频流数据;
    确定视频流识别长度;及
    按照所述视频流识别长度,从所述视频流数据中提取待识别视频数据。
  3. 根据权利要求1所述的方法,其特征在于,所述将所述影像数据输入预设的打点行为识别模型中,得到打点行为识别结果,并将所述音频数据输入预设的打点语音识别模型中,得到打点语音识别结果,包括:
    确定所述待识别视频数据对应所属业务人员的身份标识信息;
    查询所述身份标识信息分别对应预设的打点行为识别模型和打点语音识别模型;
    从所述影像数据中提取影像特征数据,从所述音频数据中提取音频特征数据;及
    将所述影像特征数据输入所述打点行为识别模型中,得到打点行为识别结果,将所述音频特征数据输入所述打点语音识别模型中,得到打点语音识别结果。
  4. 根据权利要求3所述的方法,其特征在于,在所述查询所述身份标识信息分别对应预设的打点行为识别模型和打点语音识别模型之前,所述方法还包括:
    从业务系统中获取历史行为影像数据和历史打点语音数据;
    分别将所述历史行为影像数据和所述历史打点语音数据按照业务人员进行分类,得到各业务人员对应的历史行为影像数据和各业务人员对应的历史打点语音数据;
    训练所述各业务人员对应的历史行为影像数据,得到所述打点行为识别模型;及
    训练所述各业务人员对应的历史打点语音数据,得到所述打点语音识别模型。
  5. 根据权利要求1所述的方法,其特征在于,所述根据所述打点行为识别结果、所述打点语音识别结果和预设的打点触发规则,得到打点识别结果,包括:
    查询预设的打点触发规则,所述打点触发规则包括行为触发规则和语音触发规则;
    将所述打点行为识别结果与所述行为触发规则进行比较,得到行为触发结果;
    将所述打点语音识别结果与所述语音触发规则进行比较,得到语音触发结果;及
    根据所述行为触发结果和所述语音触发结果,得到打点识别结果。
  6. 根据权利要求5所述的方法,其特征在于,所述根据所述行为触发结果和所述语音触发结果,得到打点识别结果,包括:
    对所述行为触发结果和所述语音触发结果进行取或运算,得到打点识别结果。
  7. 根据权利要求1所述的方法,其特征在于,所述当所述打点识别结果的类型为操作打点时,对所述待识别视频数据添加切割点标识,包括:
    确定所述打点识别结果的类型;
    当所述打点识别结果的类型为操作打点时,查询预设的标签添加规则;
    根据所述标签添加规则,从所述待识别视频数据中确定关键帧,并为所述关键帧添加切割标签,所述切割点标识包括所述切割标签。
  8. 根据权利要求1至7任意一项所述的方法,其特征在于,还包括:
    当接收到打点切割指令时,确定所述打点切割指令的切割时刻值;
    确定所述切割时刻值在所述待识别视频数据中对应的切割视频帧;
    为所述切割视频帧添加切割点标识;及
    返回所述按照所述切割点标识将所述视频流数据进行切割处理,得到视频段数据。
  9. 根据权利要求8所述的方法,其特征在于,在所述得到视频段数据之后,所述方法还包括:
    从所述视频段数据中提取音频段数据;
    查询预设的语音识别模型;
    将所述音频段数据输入所述语音识别模型中,得到所述视频段数据的译文数据;及
    根据所述译文数据确定所述视频段数据对应的业务类型,并将所述视频段数据存储至所述业务类型对应的存储位置中。
  10. 根据权利要求9所述的方法,其特征在于,所述根据所述译文数据确定所述视频段数据对应的业务类型,并将所述视频段数据存储至所述业务类型对应的存储位置中,包括:
    从所述译文数据中提取业务关键字;
    根据所述业务关键字确定所述视频段数据对应的业务类型;
    查询所述业务类型对应预设的存储位置;
    将所述视频段数据存储至所述存储位置中。
  11. 一种视频切割装置,包括:
    识别数据提取模块,用于从视频流数据中提取待识别视频数据,并从所述待识别视频数据中提取影像数据和音频数据;
    打点识别处理模块,用于将所述影像数据输入预设的打点行为识别模型中,得到打点行为识别结果,并将所述音频数据输入预设的打点语音识别模型中,得到打点语音识别结果;
    打点结果获取模块,用于根据所述打点行为识别结果、所述打点语音识别结果和预设的打点触发规则,得到打点识别结果;
    切割标识添加模块,用于当所述打点识别结果的类型为操作打点时,对所述待识别视频数据添加切割点标识;及
    视频切割模块,用于按照所述切割点标识将所述视频流数据进行切割处理,得到视频段数据。
  12. 根据权利要求11所述的装置,其特征在于,所述识别数据提取模块,包括:
    视频流获取单元,用于获取视频流数据;
    识别长度确定单元,用于确定视频流识别长度;及
    识别数据提取单元,用于按照所述视频流识别长度,从所述视频流数据中提取待识别视频数据。
  13. 根据权利要求11所述的装置,其特征在于,所述打点识别处理模块,包括:
    身份标识确定单元,用于确定所述待识别视频数据对应所属业务人员的身份标识信息;
    识别模型查询单元,用于查询所述身份标识信息分别对应预设的打点行为识别模型和打点语音识别模型;
    特征数据提取单元,用于从所述影像数据中提取影像特征数据,从所述音频数据中提取音频特征数据;及
    打点识别单元,用于将所述影像特征数据输入所述打点行为识别模型中,得到打点行为识别结果,将所述音频特征数据输入所述打点语音识别模型中, 得到打点语音识别结果。
  14. 根据权利要求13所述的装置,其特征在于,所述装置还包括:
    历史数据获取模块,用于从业务系统中获取历史行为影像数据和历史打点语音数据;
    历史数据分类模块,用于分别将所述历史行为影像数据和所述历史打点语音数据按照业务人员进行分类,得到各业务人员对应的历史行为影像数据和各业务人员对应的历史打点语音数据;
    行为识别模型训练模块,用于训练所述各业务人员对应的历史行为影像数据,得到所述打点行为识别模型;及
    语音识别模型训练模块,用于训练所述各业务人员对应的历史打点语音数据,得到所述打点语音识别模型。
  15. 一种计算机设备,包括存储器及一个或多个处理器,所述存储器中储存有计算机可读指令,所述计算机可读指令被所述一个或多个处理器执行时,使得所述一个或多个处理器执行以下步骤:
    从视频流数据中提取待识别视频数据,并从所述待识别视频数据中提取影像数据和音频数据;
    将所述影像数据输入预设的打点行为识别模型中,得到打点行为识别结果,并将所述音频数据输入预设的打点语音识别模型中,得到打点语音识别结果;
    根据所述打点行为识别结果、所述打点语音识别结果和预设的打点触发规则,得到打点识别结果;
    当所述打点识别结果的类型为操作打点时,对所述待识别视频数据添加切割点标识;及
    按照所述切割点标识将所述视频流数据进行切割处理,得到视频段数据。
  16. 根据权利要求15所述的计算机设备,其特征在于,所述处理器执行所述计算机可读指令时还执行以下步骤:
    获取视频流数据;
    确定视频流识别长度;及
    按照所述视频流识别长度,从所述视频流数据中提取待识别视频数据。
  17. 根据权利要求15所述的计算机设备,其特征在于,所述处理器执行所述计算机可读指令时还执行以下步骤:
    确定所述待识别视频数据对应所属业务人员的身份标识信息;
    查询所述身份标识信息分别对应预设的打点行为识别模型和打点语音识别模型;
    从所述影像数据中提取影像特征数据,从所述音频数据中提取音频特征数据;及
    将所述影像特征数据输入所述打点行为识别模型中,得到打点行为识别结果,将所述音频特征数据输入所述打点语音识别模型中,得到打点语音识别结果。
  18. 一个或多个存储有计算机可读指令的非易失性计算机可读存储介质,所述计算机可读指令被一个或多个处理器执行时,使得所述一个或多个处理器执行以下步骤:
    从视频流数据中提取待识别视频数据,并从所述待识别视频数据中提取影像数据和音频数据;
    将所述影像数据输入预设的打点行为识别模型中,得到打点行为识别结果,并将所述音频数据输入预设的打点语音识别模型中,得到打点语音识别结果;
    根据所述打点行为识别结果、所述打点语音识别结果和预设的打点触发规则,得到打点识别结果;
    当所述打点识别结果的类型为操作打点时,对所述待识别视频数据添加切割点标识;及
    按照所述切割点标识将所述视频流数据进行切割处理,得到视频段数据。
  19. 根据权利要求18所述的存储介质,其特征在于,所述计算机可读指令被所述处理器执行时还执行以下步骤:
    获取视频流数据;
    确定视频流识别长度;及
    按照所述视频流识别长度,从所述视频流数据中提取待识别视频数据。
  20. 根据权利要求18所述的存储介质,其特征在于,所述计算机可读指令被所述处理器执行时还执行以下步骤:
    确定所述待识别视频数据对应所属业务人员的身份标识信息;
    查询所述身份标识信息分别对应预设的打点行为识别模型和打点语音识别模型;
    从所述影像数据中提取影像特征数据,从所述音频数据中提取音频特征数据;及
    将所述影像特征数据输入所述打点行为识别模型中,得到打点行为识别结果,将所述音频特征数据输入所述打点语音识别模型中,得到打点语音识别结果。
PCT/CN2019/122472 2018-12-14 2019-12-02 视频切割方法、装置、计算机设备和存储介质 WO2020119508A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
SG11202103326QA SG11202103326QA (en) 2018-12-14 2019-12-02 Video cutting method and apparatus, computer device and storage medium
EP19896863.8A EP3890333A4 (en) 2018-12-14 2019-12-02 VIDEO CUTTING METHOD AND APPARATUS, COMPUTER DEVICE AND STORAGE MEDIA
KR1020217017667A KR20210088680A (ko) 2018-12-14 2019-12-02 비디오 커팅 방법, 장치, 컴퓨터 기기 및 저장매체
JP2021532494A JP2022510479A (ja) 2018-12-14 2019-12-02 ビデオカット方法、ビデオカット装置、コンピュータ機器及び記憶媒体

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811536818.X 2018-12-14
CN201811536818.XA CN109743624B (zh) 2018-12-14 2018-12-14 视频切割方法、装置、计算机设备和存储介质

Publications (1)

Publication Number Publication Date
WO2020119508A1 true WO2020119508A1 (zh) 2020-06-18

Family

ID=66360325

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/122472 WO2020119508A1 (zh) 2018-12-14 2019-12-02 视频切割方法、装置、计算机设备和存储介质

Country Status (6)

Country Link
EP (1) EP3890333A4 (zh)
JP (1) JP2022510479A (zh)
KR (1) KR20210088680A (zh)
CN (1) CN109743624B (zh)
SG (1) SG11202103326QA (zh)
WO (1) WO2020119508A1 (zh)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112380922A (zh) * 2020-10-23 2021-02-19 岭东核电有限公司 复盘视频帧确定方法、装置、计算机设备和存储介质
CN112487238A (zh) * 2020-10-27 2021-03-12 百果园技术(新加坡)有限公司 一种音频处理方法、装置、终端及介质
CN113096687A (zh) * 2021-03-30 2021-07-09 中国建设银行股份有限公司 音视频处理方法、装置、计算机设备及存储介质
CN113207033A (zh) * 2021-04-29 2021-08-03 读书郎教育科技有限公司 一种智慧课堂录制视频无效片段处理的系统及方法
CN113810766A (zh) * 2021-11-17 2021-12-17 深圳市速点网络科技有限公司 一种视频剪辑组合处理方法及系统
CN114374885A (zh) * 2021-12-31 2022-04-19 北京百度网讯科技有限公司 视频关键片段确定方法、装置、电子设备及可读存储介质
WO2023197979A1 (zh) * 2022-04-13 2023-10-19 腾讯科技(深圳)有限公司 一种数据处理方法、装置、计算机设备及存储介质

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109151615B (zh) * 2018-11-02 2022-01-25 湖南双菱电子科技有限公司 视频处理方法、计算机设备和计算机存储介质
CN109743624B (zh) * 2018-12-14 2021-08-17 深圳壹账通智能科技有限公司 视频切割方法、装置、计算机设备和存储介质
CN110446061B (zh) * 2019-07-04 2023-04-07 深圳壹账通智能科技有限公司 视频数据获取方法、装置、计算机设备及存储介质
CN114022828A (zh) * 2022-01-05 2022-02-08 北京金茂教育科技有限公司 视频流处理方法及装置
CN115866290A (zh) * 2022-05-31 2023-03-28 北京中关村科金技术有限公司 视频打点方法、装置、设备及存储介质

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060059120A1 (en) * 2004-08-27 2006-03-16 Ziyou Xiong Identifying video highlights using audio-visual objects
CN101616264A (zh) * 2008-06-27 2009-12-30 中国科学院自动化研究所 新闻视频编目方法及系统
CN104519401A (zh) * 2013-09-30 2015-04-15 华为技术有限公司 视频分割点获得方法及设备
CN104780388A (zh) * 2015-03-31 2015-07-15 北京奇艺世纪科技有限公司 一种视频数据的切分方法和装置
CN106658169A (zh) * 2016-12-18 2017-05-10 北京工业大学 一种基于深度学习多层次分割新闻视频的通用方法
CN107623860A (zh) * 2017-08-09 2018-01-23 北京奇艺世纪科技有限公司 多媒体数据分割方法和装置
CN108235141A (zh) * 2018-03-01 2018-06-29 北京网博视界科技股份有限公司 直播视频转碎片化点播的方法、装置、服务器和存储介质
CN109743624A (zh) * 2018-12-14 2019-05-10 深圳壹账通智能科技有限公司 视频切割方法、装置、计算机设备和存储介质
CN109831677A (zh) * 2018-12-14 2019-05-31 平安科技(深圳)有限公司 视频脱敏方法、装置、计算机设备和存储介质

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6999620B1 (en) * 2001-12-10 2006-02-14 Hewlett-Packard Development Company, L.P. Segmenting video input using high-level feedback
JP4228673B2 (ja) * 2002-12-04 2009-02-25 富士ゼロックス株式会社 映像処理装置、映像処理方法及びプログラム
US20080066107A1 (en) * 2006-09-12 2008-03-13 Google Inc. Using Viewing Signals in Targeted Video Advertising
JP2009272816A (ja) * 2008-05-02 2009-11-19 Visionere Corp サーバ、情報処理システム及び情報処理方法
JP5845801B2 (ja) * 2011-10-18 2016-01-20 ソニー株式会社 画像処理装置、画像処理方法、及び、プログラム
US20140328570A1 (en) * 2013-01-09 2014-11-06 Sri International Identifying, describing, and sharing salient events in images and videos
BR112016006860B8 (pt) * 2013-09-13 2023-01-10 Arris Entpr Inc Aparelho e método para criar um único fluxo de dados de informações combinadas para renderização em um dispositivo de computação do cliente
CN105931635B (zh) * 2016-03-31 2019-09-17 北京奇艺世纪科技有限公司 一种音频分割方法及装置
US9830516B1 (en) * 2016-07-07 2017-11-28 Videoken, Inc. Joint temporal segmentation and classification of user activities in egocentric videos
CN106782507B (zh) * 2016-12-19 2018-03-06 平安科技(深圳)有限公司 语音分割的方法及装置
CN107358945A (zh) * 2017-07-26 2017-11-17 谢兵 一种基于机器学习的多人对话音频识别方法及系统
CN108132995A (zh) * 2017-12-20 2018-06-08 北京百度网讯科技有限公司 用于处理音频信息的方法和装置

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060059120A1 (en) * 2004-08-27 2006-03-16 Ziyou Xiong Identifying video highlights using audio-visual objects
CN101616264A (zh) * 2008-06-27 2009-12-30 中国科学院自动化研究所 新闻视频编目方法及系统
CN104519401A (zh) * 2013-09-30 2015-04-15 华为技术有限公司 视频分割点获得方法及设备
CN104780388A (zh) * 2015-03-31 2015-07-15 北京奇艺世纪科技有限公司 一种视频数据的切分方法和装置
CN106658169A (zh) * 2016-12-18 2017-05-10 北京工业大学 一种基于深度学习多层次分割新闻视频的通用方法
CN107623860A (zh) * 2017-08-09 2018-01-23 北京奇艺世纪科技有限公司 多媒体数据分割方法和装置
CN108235141A (zh) * 2018-03-01 2018-06-29 北京网博视界科技股份有限公司 直播视频转碎片化点播的方法、装置、服务器和存储介质
CN109743624A (zh) * 2018-12-14 2019-05-10 深圳壹账通智能科技有限公司 视频切割方法、装置、计算机设备和存储介质
CN109831677A (zh) * 2018-12-14 2019-05-31 平安科技(深圳)有限公司 视频脱敏方法、装置、计算机设备和存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3890333A4 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112380922A (zh) * 2020-10-23 2021-02-19 岭东核电有限公司 复盘视频帧确定方法、装置、计算机设备和存储介质
CN112380922B (zh) * 2020-10-23 2024-03-22 岭东核电有限公司 复盘视频帧确定方法、装置、计算机设备和存储介质
CN112487238A (zh) * 2020-10-27 2021-03-12 百果园技术(新加坡)有限公司 一种音频处理方法、装置、终端及介质
CN112487238B (zh) * 2020-10-27 2024-05-17 百果园技术(新加坡)有限公司 一种音频处理方法、装置、终端及介质
CN113096687A (zh) * 2021-03-30 2021-07-09 中国建设银行股份有限公司 音视频处理方法、装置、计算机设备及存储介质
CN113096687B (zh) * 2021-03-30 2024-04-26 中国建设银行股份有限公司 音视频处理方法、装置、计算机设备及存储介质
CN113207033A (zh) * 2021-04-29 2021-08-03 读书郎教育科技有限公司 一种智慧课堂录制视频无效片段处理的系统及方法
CN113810766A (zh) * 2021-11-17 2021-12-17 深圳市速点网络科技有限公司 一种视频剪辑组合处理方法及系统
CN113810766B (zh) * 2021-11-17 2022-02-08 深圳市速点网络科技有限公司 一种视频剪辑组合处理方法及系统
CN114374885A (zh) * 2021-12-31 2022-04-19 北京百度网讯科技有限公司 视频关键片段确定方法、装置、电子设备及可读存储介质
WO2023197979A1 (zh) * 2022-04-13 2023-10-19 腾讯科技(深圳)有限公司 一种数据处理方法、装置、计算机设备及存储介质

Also Published As

Publication number Publication date
EP3890333A1 (en) 2021-10-06
EP3890333A4 (en) 2022-05-25
CN109743624A (zh) 2019-05-10
KR20210088680A (ko) 2021-07-14
SG11202103326QA (en) 2021-05-28
JP2022510479A (ja) 2022-01-26
CN109743624B (zh) 2021-08-17

Similar Documents

Publication Publication Date Title
WO2020119508A1 (zh) 视频切割方法、装置、计算机设备和存储介质
WO2020140665A1 (zh) 双录视频质量检测方法、装置、计算机设备和存储介质
Zubiaga et al. Learning reporting dynamics during breaking news for rumour detection in social media
CN110444198B (zh) 检索方法、装置、计算机设备和存储介质
US10997258B2 (en) Bot networks
WO2021042503A1 (zh) 信息分类抽取方法、装置、计算机设备和存储介质
CN108595695B (zh) 数据处理方法、装置、计算机设备和存储介质
CN111444723B (zh) 信息抽取方法、计算机设备和存储介质
CN108427707B (zh) 人机问答方法、装置、计算机设备和存储介质
WO2020253350A1 (zh) 网络内容发布的审核方法、装置、计算机设备及存储介质
WO2020147395A1 (zh) 基于情感的文本分类处理方法、装置和计算机设备
WO2018006727A1 (zh) 机器人客服转人工客服的方法和装置
WO2020077896A1 (zh) 提问数据生成方法、装置、计算机设备和存储介质
CN109831677B (zh) 视频脱敏方法、装置、计算机设备和存储介质
WO2021114612A1 (zh) 目标重识别方法、装置、计算机设备和存储介质
US20230206928A1 (en) Audio processing method and apparatus
WO2019137391A1 (zh) 对视频进行分类匹配的方法、装置和挑选引擎
US20230032728A1 (en) Method and apparatus for recognizing multimedia content
WO2022057309A1 (zh) 肺部特征识别方法、装置、计算机设备及存储介质
WO2021063089A1 (zh) 规则匹配方法、规则匹配装置、存储介质及电子设备
CN113343108B (zh) 推荐信息处理方法、装置、设备及存储介质
US20190012610A1 (en) Self-feeding deep learning method and system
CN110633475A (zh) 基于计算机场景的自然语言理解方法、装置、系统和存储介质
Kodali et al. Attendance management system
CN114493902A (zh) 多模态信息异常监控方法、装置、计算机设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19896863; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2021532494; Country of ref document: JP; Kind code of ref document: A)
ENP Entry into the national phase (Ref document number: 20217017667; Country of ref document: KR; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 2019896863; Country of ref document: EP; Effective date: 20210630)