WO2021017277A1 - Picture capture method, device, and computer storage medium - Google Patents

Picture capture method, device, and computer storage medium

Info

Publication number
WO2021017277A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
picture
pictures
electronic device
subtitles
Prior art date
Application number
PCT/CN2019/117170
Other languages
English (en)
French (fr)
Inventor
王涛
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021017277A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Definitions

  • This application relates to the field of computer technology, and in particular to a method, device and computer storage medium for capturing images.
  • the embodiments of the present application provide a method, device, and computer storage medium for capturing images, which can improve the efficiency of random inspection of business content involved in business activities.
  • An embodiment of the application provides a method for capturing a picture, the method including:
  • the electronic device performs video recording on the business activity process to obtain the first video, and performs audio recording on the business activity process to obtain the first audio, and the business activity process includes a salesperson negotiating business with a customer;
  • the electronic device adds subtitles to the first video according to the first audio to obtain a second video containing subtitles;
  • the electronic device intercepts pictures in the second video where the preset keyword group appears.
  • An embodiment of the present application also provides a picture capture device, including:
  • the recording unit is configured to perform video recording on the business activity process to obtain the first video, and perform audio recording on the business activity process to obtain the first audio, and the business activity process includes a salesperson negotiating business with a customer;
  • a first adding unit configured to add subtitles to the first video according to the first audio to obtain a second video containing subtitles;
  • the interception unit is configured to intercept pictures in the second video where the preset keyword group appears in the process of playing the second video.
  • the embodiment of the present application also provides an electronic device, which includes a processor, an input device, an output device, and a memory, and the processor, the input device, the output device, and the memory are connected to each other.
  • the communication interface is used to communicate with other electronic devices (for example, electronic devices)
  • the memory is used to store the implementation code of the above-mentioned picture interception method
  • the processor is used to execute the program code stored in the memory, that is, the above-mentioned image interception method is executed.
  • the embodiment of the present application also provides a computer non-volatile readable storage medium, which stores instructions that, when run on a processor, cause the processor to execute the above-mentioned image capturing method.
  • the embodiment of the present application also provides a computer program product containing instructions, which when running on a processor, causes the processor to execute the above-mentioned image capturing method.
  • the electronic device can make audio and video recordings of the business activity process, add subtitles to the recorded video according to the recorded audio, and extract pictures containing preset keyword groups from the subtitled video. The extracted pictures can be used by users (such as supervisors or business personnel) to view the business content involved in business activities.
  • Electronic equipment automatically extracts pictures containing preset keyword groups from subtitled videos to efficiently complete random inspections of business processes.
  • the electronic device can automatically splice the multiple extracted pictures into a single picture, so that the user can use the spliced picture directly for random inspection of business content without manually splicing the pictures in a picture-splicing app. This saves user operation time, reduces operation complexity, and improves the efficiency of random inspection of business content in business activities.
  • the electronic device can use the first picture among multiple pictures (ie the picture with the earliest playback time) as the basis for picture splicing.
  • the first picture is kept intact, while only the text part of each remaining picture is cropped, and the text parts are spliced below the first picture in the above order to generate a single picture. The user can then use the spliced picture directly for random inspection of business content without manually splicing the pictures in a picture-splicing app, which saves user operation time, reduces operation complexity, and improves the efficiency of random inspection of business content in business activities.
  • the electronic device can also convert the audio recorded at the same time into text and add it to the video to obtain a video with subtitles, which facilitates later interception of pictures containing preset keyword groups in the video with subtitles, thereby improving the efficiency of business sampling.
  • FIG. 1 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the application.
  • FIG. 2 is a schematic flowchart of a method for capturing a picture according to an embodiment of the application
  • FIG. 3 is a schematic structural diagram of a picture interception device provided by an embodiment of the application.
  • the electronic devices involved in the embodiments of this application may include various handheld devices with wireless communication functions, vehicle-mounted devices, wearable devices, computing devices or other processing devices connected to wireless modems, as well as various forms of user equipment (User Equipment, UE), mobile station (Mobile Station, MS), terminal device (terminal device), etc.
  • it can be a mobile terminal such as a smart phone, a tablet computer, or other terminals, and there is no limitation here.
  • the devices mentioned above are collectively referred to as electronic devices.
  • the embodiments of the present application are described below in conjunction with the drawings.
  • FIG. 1 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device 100 includes: at least one processor 101, at least one input device 102, and at least one output device 103, a memory 104, and at least one bus 105.
  • the bus 105 is used to implement connection and communication between these components.
  • the processor 101 may be a central processing unit (Central Processing Unit, CPU) or a graphics processing unit (Graphics Processing Unit, GPU). In some embodiments, it may also be referred to as an application processor (application processor, AP) to distinguish it from the baseband processor.
  • the processor 101 can also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the input device 102 may include a touch panel, a fingerprint sensor (used to collect user fingerprint information and fingerprint orientation information), a camera, a microphone, etc.
  • the output device 103 may include a display (LCD, etc.), a speaker, etc.
  • the memory 104 may include a read-only memory and a random access memory, and provides instructions and data to the processor 101.
  • the processor 101 can be used to read and execute computer readable instructions. Specifically, the processor 101 may be used to call data stored in the memory 104.
  • a part of the memory 104 may also include a non-volatile random access memory.
  • the processor 101, the input device 102, and the output device 103 described in the embodiment of the present application can execute part or all of the processes involved in the image capture method shown in FIG. 2 below.
  • the electronic device 100 may further include a communication interface.
  • the communication interface may be a transceiver, a transceiver circuit, etc., where the communication interface is a general term and may include one or more interfaces, such as an interface between an electronic device and a server.
  • the communication interface may include a wired interface and a wireless interface, such as a standard interface, Ethernet, and a multi-machine synchronization interface.
  • when the processor 101 receives any message or data, it does so by driving or controlling the communication interface. Therefore, the processor 101 can be regarded as a control center that performs sending or receiving, and the communication interface is the specific performer of sending and receiving operations.
  • the electronic device 100 may be a terminal, server, computer, video recording device, video playback device, etc., capable of computing or processing.
  • FIG. 2 provides a method for capturing a picture related to an embodiment of the present application.
  • the method for capturing a picture includes but is not limited to the following steps S201-S203.
  • S201: The electronic device performs video recording on the business activity process to obtain the first video, and performs audio recording on the business activity process to obtain the first audio, where the business activity process includes a salesperson negotiating business with a customer;
  • S202: The electronic device adds subtitles to the first video according to the first audio to obtain a second video containing subtitles;
  • S203: While the electronic device is playing the second video, the electronic device intercepts pictures in the second video where the preset keyword group appears.
  • business activities include the process of a salesperson negotiating business with a customer, such as selling products, where the products may be, for example, insurance products or electronic products.
  • electronic equipment can perform audio and video dual recording.
  • Dual recording means audio recording plus video recording. It leaves a record of the customer's business process, especially the risk disclosure process, which standardizes the company's sales behavior and provides evidence if a dispute arises later.
  • dual recording can constrain sales staff's behavior, standardize business processing procedures, and prevent sales staff from downplaying or concealing risks and exaggerating product returns, which is conducive to the internal management of operating organizations.
  • For customers, dual recording helps them learn about product information, risk levels, and their own rights and responsibilities in detail, protecting their legal rights.
  • This application builds on the dual-recorded audio and video to check, efficiently and quickly, whether the salesperson informed customers of key business information in business activities, such as product risks and product benefits.
  • when a salesperson starts to sell a certain product to a customer, he or she can activate the electronic device to perform audio and video dual recording.
  • the electronic device contains a button, and the salesperson clicks the button to trigger the dual recording.
  • the electronic device can be triggered to stop recording.
  • the electronic device contains a button, the salesperson clicks the button to trigger the end of recording.
  • the dual recording process can produce audio files and video files, and the time of the audio files and video files are aligned.
  • the electronic device can perform the above steps S202 and S203.
  • the electronic device can automatically perform the above steps S202 and S203 after the dual recording is over, or the user can trigger the electronic device to perform the above steps S202 and S203.
  • This is not limited in this application.
  • the electronic device adds subtitles to the first video according to the first audio to obtain the second video with subtitles, including: the electronic device uses an audio conversion tool to convert the first audio into voice content, and then sequentially adds the voice content, in chronological order, to the image frames of the first video to obtain the second video with subtitles.
  • the text information can be sequentially added to the frame of the video in chronological order to obtain a second video containing subtitles.
  • After obtaining the second video containing the subtitles, the electronic device automatically intercepts the pictures containing the preset keyword group from the second video.
  • the preset keyword group may be set by default by the system according to the business activity scenario, or manually selected by the user, which is not limited in this application.
  • the preset keyword groups in different business activities are different.
  • the preset keyword group may be “risk”, “return”, etc.
  • the preset keyword group may be “performance”, “disadvantage”, etc.
  • For example, while the electronic device is playing the second video, a frame containing the keyword "risk" appears at 31 minutes 21 seconds (31:21) of the second video.
  • The electronic device then starts taking screenshots, capturing each frame with subtitles, and the capture lasts for a period of time to obtain multiple pictures with subtitles.
  • For example, the capture duration is 1 minute; it can be set by default by the system, or entered by the user when entering a keyword. Note that this implementation is described using the example of taking screenshots for a period of time after the preset keyword group (or keyword, key sentence, etc.) first appears in the second video.
  • the electronic device can also take screenshots for a period of time each time the preset keyword group (or keyword, key sentence, etc.) appears in the second video. For example, during playback of the second video, if the keyword "risk" appears at 31:21, the electronic device starts taking screenshots, capturing each frame with subtitles for a period of time, for example 1 minute, and obtains 10 pictures.
  • When the keyword "risk" appears again later in the second video, the electronic device starts taking screenshots again, capturing each frame with subtitles for a period of time, for example 1 minute, and obtains another 10 pictures, for 20 pictures in total.
  • the electronic device can also capture only the one or more video frames in the second video where the preset keyword group (or keyword, key sentence, etc.) appears, without taking screenshots of other frames.
  • For example, when the keyword "risk" appears at 31:51, the electronic device takes a screenshot of the frame at 31:51; when playback reaches 55:51 and the keyword "risk" appears again, the electronic device takes a screenshot of the frame at 55:51, finally obtaining 2 pictures.
  • the electronic device may only retain one of them, for example, the clearest captured picture may be retained.
  • the electronic device can intercept pictures containing preset keyword groups from videos containing subtitles, so that supervisors can quickly check, based on the intercepted pictures, whether business personnel informed customers of important key information during business activities. Supervisors do not need to play the entire video for spot checks, which saves spot-check time and improves the efficiency of spot checks on business activities.
  • the captured image not only contains the preset keyword group, but also needs to contain the preset person, so that the business activities of the preset person can be randomly checked.
  • the electronic device intercepts the picture where the preset keyword group appears in the second video, specifically: the electronic device intercepts pictures in the second video in which both the preset keyword group and the preset face image appear.
  • the preset face image may be set by default by the system according to the business activity scene, or manually selected by the user, which is not limited in this application.
  • the preset face image may be salesperson 1.
  • When the electronic device is playing the above-mentioned second video with subtitles and a frame showing both the preset keyword group "risk" and the preset person "salesman 1" appears, for example at 31:21, the electronic device starts taking screenshots, capturing each frame with subtitles, and the capture lasts for a period of time to obtain multiple pictures with subtitles.
  • the duration of the interception is 1 minute, and the duration of the interception may be set by default by the system, or it may be the duration of the interception input by the user when inputting a keyword group or presetting a character image.
  • this implementation is described using the example of taking screenshots for a period of time after the preset keyword group and the preset face first appear in the second video. If the subtitles and the face appear more than once in the second video, the electronic device can also take screenshots for a period of time each time the preset keyword group and the preset face appear. For example, during playback of the second video, a frame showing "risk" and "salesman 1" appears at 31:21; the electronic device starts taking screenshots, capturing each frame with subtitles for a period of time, for example 1 minute, and obtains 10 pictures.
  • When "risk" and "salesman 1" appear again later in the second video, the electronic device starts taking screenshots again, capturing each frame with subtitles for a period of time, for example 1 minute, and obtains another 10 pictures, for 20 pictures in total.
  • the electronic device may also capture only the one or more video frames in the second video where the preset keyword group and the preset face appear, without taking screenshots of other frames.
  • For example, when "risk" and the face image of "salesman 1" appear at 31:51, the electronic device takes a screenshot of the frame at 31:51; when playback reaches 55:51 and "risk" and the face image of "salesman 1" appear again, the electronic device captures the frame at 55:51, finally obtaining 2 pictures.
  • the electronic device may only retain one of them, for example, the clearest captured picture may be retained.
  • the electronic device can intercept pictures containing preset keyword groups and preset persons from a video containing subtitles, so that supervisors can quickly check, based on the intercepted pictures, whether the preset person informed customers of important key information during business activities.
  • Supervisors do not need to play the entire video for random inspections, which saves time and improves the efficiency of random inspections of business activities.
  • the electronic device intercepts the picture where the preset keyword group appears in the second video, specifically:
  • the electronic device uses optical character recognition (Optical Character Recognition, OCR) technology to recognize the text information in the second video, and intercepts pictures in the second video where the preset keyword group appears.
  • OCR technology refers to the process of analyzing and recognizing images containing text to obtain text. Using OCR technology, the text in the image can be recognized and returned in the form of text.
  • the electronic device intercepts a picture in the second video where the preset keyword group appears and a preset face image appears, specifically:
  • the electronic device uses OCR technology to recognize the text information in the second video, uses face recognition technology to recognize the face information in the second video, and captures pictures in the second video in which both the preset keyword group and the preset face image appear.
  • face recognition technology refers to a series of related technologies that detect and track human faces in images and then recognize the detected faces; it is usually called face recognition or facial recognition.
  • Face recognition technology is based on the facial features of a person. It first judges whether there is a face in the image or video stream. If there is a face, it further gives the location, size and location information of each major facial organ. And based on this information, further extract the identity features contained in each face, and compare it with the face image included in the interception instruction to identify the identity of the face.
  • the face image information includes face, iris, retina and other information.
  • N is an integer greater than 2
  • the pictures are multiple pictures
  • the method further includes:
  • the electronic device performs picture splicing on the multiple pictures to obtain one picture. That is, in order to facilitate the user to directly use the captured multiple pictures, the electronic device can automatically help the user to stitch the multiple pictures into one picture after capturing the multiple pictures, so that the user can directly use the one picture for random inspection of business activities.
  • the electronic device can automatically splice the multiple extracted pictures into a single picture, so that the user can use the spliced picture directly for random inspection of business activities without manually splicing the pictures in a picture-splicing app.
  • This saves user operation time, reduces operation complexity, and improves the efficiency of random inspection of business activities.
  • the electronic device performs picture splicing on the multiple pictures to obtain one picture, including:
  • the electronic device extracts subtitles from pictures other than the picture with the earliest playing time among the pictures;
  • the electronic device splices the subtitles of the other pictures, from top to bottom in order of playing time from earliest to latest, below the subtitles of the picture with the earliest playing time to obtain one picture.
  • the pictures captured by the electronic device include N pictures.
  • the pictures except the one with the earliest playing time are N-1 pictures.
  • the electronic device can use OCR technology to extract text information from the above N-1 pictures.
  • For example, the electronic device can use OCR to identify the area containing text in each picture.
  • the electronic device can intercept the area containing the text from each picture, such as the area below the picture, to obtain N-1 text pictures.
  • After the electronic device extracts text information from the above N-1 pictures, it can add a time stamp to the text information of each picture.
  • For example, the electronic device captures 10 pictures, and the playback times of the first through tenth pictures are 31:21, 31:28, 31:34, 31:41, 31:48, 31:55, 32:02, 32:09, 32:15, and 32:21, respectively.
  • the electronic device obtains the text information of the remaining nine pictures, and the time stamps added to the text information of these pictures are 31:28, 31:34, 31:41, 31:48, 31:55, 32:02, 32:09, 32:15, and 32:21, respectively.
  • the electronic device arranges the text information of the remaining nine pictures (for example, picture areas containing text) below the first picture, from top to bottom, in order of time from earliest to latest.
  • picture splicing can be performed on the basis of the first of the multiple pictures (that is, the picture with the earliest playing time). The first picture is kept intact, while only the text part of each remaining picture is cropped, and the text parts are spliced below the first picture in the above order to generate a single picture. Users can then use the spliced picture directly for random inspections of business activities without manually splicing the pictures in a picture-splicing app, which saves user operation time, reduces operation complexity, and improves the efficiency of random inspections of business activities.
  • FIG. 3 shows a schematic structural diagram of a picture interception device.
  • the picture interception device 300 includes: a recording unit 301, an adding unit 302 and an intercepting unit 303.
  • the recording unit 301 is configured to perform video recording on the business activity process to obtain the first video, and perform audio recording on the business activity process to obtain the first audio, and the business activity process includes a salesperson negotiating business with a customer;
  • the adding unit 302 is configured to add subtitles to the first video according to the first audio to obtain a second video containing subtitles;
  • the interception unit 303 is configured to intercept pictures in the second video where the preset keyword group appears in the process of playing the second video.
  • the pictures are multiple pictures
  • the picture interception device 300 further includes: a first splicing unit, configured to perform picture splicing on the multiple pictures to obtain one picture after the interception unit 303 intercepts the pictures in the second video where the preset keyword group appears.
  • the first splicing unit includes:
  • An extraction unit configured to extract subtitles from pictures other than the picture with the earliest play time among the plurality of pictures
  • the second splicing unit is used for splicing the subtitles of the other pictures, from top to bottom in order of playing time from earliest to latest, below the subtitles of the picture with the earliest playing time to obtain one picture.
  • the interception unit 303 is specifically configured to intercept a picture in the second video where the preset keyword group appears and a preset face image appears.
  • the interception unit 303 is specifically configured to: use optical character recognition (OCR) technology to recognize subtitles in the second video, and intercept pictures in the second video where the preset keyword group appears.
  • the first adding unit includes:
  • a conversion unit configured to convert the first audio into voice content using an audio conversion tool
  • the second adding unit is configured to sequentially add the voice content to the image frames of the first video in chronological order to obtain the second video containing subtitles.
  • for each unit in the picture intercepting device 300, refer to the relevant description in the method embodiment shown in FIG. 2, which will not be repeated here.
  • a computer-readable storage medium stores a computer program, the computer program includes program instructions, and the program instructions, when executed by a processor, implement the above-mentioned picture capture method.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center.
  • the readable storage medium may be any available medium that can be accessed by the computer or a data storage device such as a server or data center integrated with one or more available media.
  • the available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), a semiconductor medium (for example, a solid state disk (SSD)), etc.
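
The capture-window behaviour described in the method embodiment (start capturing when a preset keyword appears and keep capturing subtitled frames for a fixed duration, 1 minute in the example) can be illustrated with a minimal Python sketch. The function names, the "mm:ss" time format, and the choice to treat the window end as inclusive are assumptions made for illustration; the application does not specify them.

```python
def to_seconds(stamp):
    """Convert an "mm:ss" playback time into seconds (assumed format)."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def frames_in_capture_windows(frame_times, hit_times, window_seconds=60):
    """Return the subtitled-frame times that fall inside a capture window.

    A window opens at each keyword hit and stays open for window_seconds.
    Both window boundaries are treated as inclusive (an assumption).
    """
    selected = []
    for t in sorted(frame_times):
        if any(hit <= t <= hit + window_seconds for hit in hit_times):
            selected.append(t)
    return selected


# Hypothetical subtitled-frame times around the example in the text.
stamps = ["31:10", "31:21", "31:28", "31:34", "31:41", "31:48", "31:55",
          "32:02", "32:09", "32:15", "32:21", "32:30"]
frame_times = [to_seconds(s) for s in stamps]

# The keyword "risk" is assumed to appear at 31:21.
captured = frames_in_capture_windows(frame_times, [to_seconds("31:21")])
print(len(captured))
```

With these inputs the frames at 31:10 and 32:30 fall outside the window, so ten frames are selected, matching the ten pictures in the example.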

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

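A picture capture method, device, and computer storage medium, wherein the method includes: an electronic device performs video recording on a business activity process to obtain a first video, and performs audio recording on the business activity process to obtain a first audio, the business activity process including a salesperson negotiating business with a customer (S201); the electronic device adds subtitles to the first video according to the first audio to obtain a second video containing subtitles (S202); while the electronic device is playing the second video, the electronic device captures pictures in the second video in which a preset keyword group appears (S203). The method can improve the efficiency of random inspection of the business content involved in business activities.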

Description

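A Picture Capturing Method, Apparatus, and Computer Storage Medium
This application claims priority to Chinese patent application No. 2019107065936, filed with the China Patent Office on July 30, 2019 and entitled "A Picture Capturing Method, Apparatus, and Computer Storage Medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of computer technology, and in particular to a picture capturing method, an apparatus, and a computer storage medium.
Background
During a business activity (for example, a salesperson selling an insurance product to a customer), making audio and video recordings of the sales process on site allows regulators to carry out supervisory spot checks at any time and makes the process traceable when a dispute arises, which protects the legitimate rights and interests of consumers. In the prior art, however, the sales process is merely recorded from start to finish. If a regulator needs to check whether the salesperson fully informed the customer of information such as the returns of the insurance product and the risks of the policy, a supervisor usually has to play the recorded audio and video in full and check whether the salesperson's explanation covered all of that information. This spot-check process is complicated, tedious, and time-consuming. How to spot-check the business content involved in a business activity efficiently is therefore a technical problem in urgent need of a solution.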
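Summary
Embodiments of this application provide a picture capturing method, an apparatus, and a computer storage medium that can improve the efficiency of spot-checking the business content involved in a business activity.
An embodiment of this application provides a picture capturing method, the method including:
an electronic device video-records a business activity process to obtain a first video and audio-records the business activity process to obtain a first audio, where the business activity process includes a salesperson negotiating business with a customer;
the electronic device adds subtitles to the first video according to the first audio to obtain a second video containing subtitles;
while the electronic device plays the second video, the electronic device captures the pictures of the second video in which a preset keyword group appears.
An embodiment of this application further provides a picture capturing apparatus, including:
a recording unit, configured to video-record a business activity process to obtain a first video and to audio-record the business activity process to obtain a first audio, where the business activity process includes a salesperson negotiating business with a customer;
a first adding unit, configured to add subtitles to the first video according to the first audio to obtain a second video containing subtitles;
a capturing unit, configured to capture, while the second video is played, the pictures of the second video in which a preset keyword group appears.
An embodiment of this application further provides an electronic device, including a processor, an input device, an output device, and a memory, which are connected to one another. A communication interface of the electronic device is used to communicate with other electronic devices, the memory is used to store the implementation code of the above picture capturing method, and the processor is used to execute the program code stored in the memory, that is, to execute the above picture capturing method.
An embodiment of this application further provides a non-volatile computer-readable storage medium storing instructions that, when run on a processor, cause the processor to execute the above picture capturing method.
An embodiment of this application further provides a computer program product containing instructions that, when run on a processor, cause the processor to execute the above picture capturing method.
By implementing the embodiments of this application, the electronic device can make audio and video recordings of a business activity process, add subtitles to the recorded video according to the recorded audio, and extract the pictures containing the preset keyword group from the subtitled video. The extracted pictures allow a user (for example, a supervisor or a salesperson) to review the business content involved in the business activity, and because the electronic device extracts them automatically, the spot check of the business process can be completed efficiently. The electronic device can also automatically stitch the extracted pictures into a single picture. It can use the first picture (that is, the picture with the earliest playback time) as the base, keep that picture intact, crop only the text portion from each of the remaining pictures, and stitch those text portions below the first picture in order of playback time. The user can then use the stitched picture directly for the spot check without manually stitching the pictures in a picture stitching app, which saves operating time, reduces operating complexity, and improves the efficiency of spot-checking the business content. In addition, the electronic device can convert the audio recorded at the same time into text and add it to the video to obtain a subtitled video, which makes it easier to later capture the pictures of the subtitled video that contain the preset keyword group and thereby improves spot-check efficiency.
Additional aspects and advantages of this application will be set forth in part in the following description, and in part will become apparent from that description or be learned through practice of this application.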
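Brief Description of the Drawings
The above and/or additional aspects and advantages of this application will become apparent and easy to understand from the following description of the embodiments in conjunction with the drawings, in which:
FIG. 1 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of this application;
FIG. 2 is a schematic flowchart of a picture capturing method provided by an embodiment of this application;
FIG. 3 is a schematic structural diagram of a picture capturing apparatus provided by an embodiment of this application.
Detailed Description
To make the purpose, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the drawings. The described embodiments are clearly only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application without creative effort fall within the protection scope of this application.
The terms "first", "second", and the like in the specification, claims, and drawings of this application are used to distinguish different objects, not to describe a particular order. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product, or device.
Reference to an "embodiment" herein means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of this application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, both explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.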
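The electronic device involved in the embodiments of this application may include various handheld devices, in-vehicle devices, wearable devices, and computing devices with wireless communication functions, or other processing devices connected to a wireless modem, as well as various forms of user equipment (User Equipment, UE), mobile stations (Mobile Station, MS), terminal devices, and so on. For example, it may be a mobile terminal such as a smartphone or tablet computer, or another kind of terminal, which is not limited here. For ease of description, the devices mentioned above are collectively referred to as electronic devices. The embodiments of this application are described below with reference to the drawings.
Refer to FIG. 1, which is a schematic structural diagram of an electronic device provided by an embodiment of this application. As shown in FIG. 1, the electronic device 100 includes at least one processor 101, at least one input device 102, at least one output device 103, a memory 104, and at least one bus 105. The bus 105 is used to implement connection and communication between these components.
In the embodiments of this application, the processor 101 may be a central processing unit (Central Processing Unit, CPU) or a graphics processing unit (Graphics Processing Unit, GPU). In some implementations it may also be called an application processor (Application Processor, AP) to distinguish it from a baseband processor. The processor 101 may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
The input device 102 may include a touchpad, a fingerprint sensor (for collecting the user's fingerprint information and fingerprint orientation information), a camera, a microphone, and the like. The output device 103 may include a display (such as an LCD), a speaker, and the like.
The memory 104 may include a read-only memory and a random access memory, and provides instructions and data to the processor 101. The processor 101 can be used to read and execute computer-readable instructions. Specifically, the processor 101 can be used to call the data stored in the memory 104. A part of the memory 104 may also include a non-volatile random access memory.
In a specific implementation, the processor 101, the input device 102, and the output device 103 described in the embodiments of this application can execute part or all of the flow of the picture capturing method shown in FIG. 2 below.
Optionally, the electronic device 100 may further include a communication interface. The communication interface may be a transceiver, a transceiver circuit, or the like. "Communication interface" is a collective term and may cover one or more interfaces, for example the interface between the electronic device and a server. The communication interface may include wired and wireless interfaces, such as a standard interface, an Ethernet interface, or a multi-machine synchronization interface. Optionally, when the processor 101 receives any message or data, it does so by driving or controlling the communication interface. The processor 101 can therefore be regarded as the control center for sending and receiving, and the communication interface as the component that actually performs the sending and receiving operations.
In the embodiments of this application, the electronic device 100 may be a terminal, server, computer, video recording device, video playback device, or the like that has computing or processing capability.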
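Based on the structure of the electronic device shown in FIG. 1, FIG. 2 provides a picture capturing method according to an embodiment of this application. The picture capturing method includes, but is not limited to, the following steps S201 to S203.
S201: The electronic device video-records a business activity process to obtain a first video and audio-records the business activity process to obtain a first audio, where the business activity process includes a salesperson negotiating business with a customer;
S202: The electronic device adds subtitles to the first video according to the first audio to obtain a second video containing subtitles;
S203: While the electronic device plays the second video, the electronic device captures the pictures of the second video in which a preset keyword group appears.
The business activity includes the process of a salesperson negotiating business with a customer, for example the process of selling a product, where the product may be an insurance product, an electronic product, or the like. While the business activity is in progress, the electronic device can perform dual recording, that is, audio recording and video recording. Dual recording leaves a record of the process in which a customer handles business, especially the risk disclosure process. It standardizes the sales behavior of the enterprise and provides evidence if a dispute arises later. For the operating institution, dual recording constrains the behavior of sales staff, standardizes the business handling process, and prevents sales staff from downplaying or concealing risks or exaggerating product returns, which benefits internal management. For the customer, dual recording makes it possible to understand the product information, the risk level, and the customer's own rights and responsibilities in detail, which protects the customer's legitimate rights and interests.
Based on the dual-recorded audio and video, this application makes it possible to spot-check quickly and efficiently whether the salesperson informed the customer of key business information during the business activity. Key business information includes, for example, product risks and product returns.
Specifically, when the salesperson begins selling a product to the customer, the salesperson can start dual recording on the electronic device. For example, the electronic device has a button, and the salesperson taps the button to start dual recording. After the salesperson has finished introducing all the information about the product, the salesperson can trigger the electronic device to stop recording, for example by tapping a button on the electronic device. The dual recording process produces an audio file and a video file whose timelines are aligned. The electronic device can then execute steps S202 and S203. It may execute them automatically after dual recording ends, or the user may trigger it to execute them, which is not limited in this application.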
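Specifically, adding subtitles to the first video according to the first audio to obtain the second video containing subtitles includes: the electronic device uses an audio conversion tool to convert the first audio into speech content (the transcribed text), and then adds the speech content to the image frames of the first video in chronological order to obtain the second video containing subtitles.
Because the audio and video recorded by the electronic device are time-synchronized, once the audio has been converted into text, the text can be added to the video frames in chronological order to obtain the second video containing subtitles.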
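The chronological subtitle assignment described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `(start_s, end_s, text)` segment format, the function name `subtitles_per_frame`, and the fixed frame rate are all assumptions, since the source names no particular speech-to-text tool or video format.

```python
from bisect import bisect_right


def subtitles_per_frame(segments, n_frames, fps):
    """Return the subtitle text for each frame index ("" if nothing is spoken).

    segments: list of (start_s, end_s, text), sorted by start time and
    non-overlapping. This is a hypothetical speech-to-text output format.
    """
    starts = [start for start, _, _ in segments]
    result = []
    for i in range(n_frames):
        t = i / fps                      # presentation time of frame i
        k = bisect_right(starts, t) - 1  # last segment starting at or before t
        if k >= 0 and t < segments[k][1]:
            result.append(segments[k][2])
        else:
            result.append("")
    return result
```

Because the audio and video share one timeline, a frame's timestamp alone is enough to look up the text spoken at that moment; the binary search simply keeps the lookup cheap for long recordings.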
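After obtaining the second video containing subtitles, the electronic device automatically captures from it the pictures that contain the preset keyword group. The preset keyword group may be set by the system by default according to the business activity scenario or selected manually by the user, which is not limited in this application. Different business activities may have different preset keyword groups. For example, in an insurance product sales scenario the preset keyword group may be "risk", "return", and so on, while in an electronic product sales scenario it may be "performance", "shortcoming", and so on. One or more preset keyword groups may be set for a given business activity scenario.
For example, in an insurance business scenario, while the electronic device plays the second video and reaches a video picture in which the keyword "risk" appears, for instance when "risk" appears in the picture at 31:21 of the second video, the electronic device starts taking screenshots, captures every picture containing subtitles, and continues for a period of time to obtain multiple subtitled pictures. Here the capture duration is 1 minute. It may be set by the system by default or entered by the user together with the keyword. Note that this implementation is described using the case where screenshots continue for a period of time after the preset keyword (or key phrase, key sentence, and so on) first appears in the second video. If the preset keyword appears more than once in the second video, the electronic device may take screenshots for a period of time each time it appears. For example, while the second video is playing, "risk" appears in the picture at 31:21, so the electronic device starts taking screenshots, captures every picture containing subtitles, and continues for a period of time, for example 1 minute, obtaining 10 pictures. When playback reaches 55:51, "risk" appears again, so the electronic device again takes screenshots for a period of time, for example 1 minute, and obtains another 10 pictures, for a total of 20 pictures. Alternatively, the electronic device may capture only the one or more video frames in which the preset keyword appears and take no screenshots of other pictures. For example, while the second video is playing, "risk" appears at 31:51, so the electronic device takes a screenshot of the picture at 31:51. When playback reaches 55:51, "risk" appears again, so the electronic device takes a screenshot of the picture at 55:51, for a total of 2 pictures. Optionally, to reduce the number of unnecessary pictures, if the electronic device captures multiple identical pictures of the same frame, it may keep only one of them, for example the clearest one.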
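The two capture modes described above (capture for a period after each keyword hit, or capture only the frames where the keyword appears) can be sketched in one function. This is an illustrative sketch under assumptions: the `(time_s, subtitle_text)` cue format and the names `to_seconds` and `frames_to_capture` are hypothetical, and letting a repeated keyword extend an already open window is a design choice of this sketch, not something the source specifies.

```python
def to_seconds(mmss):
    """Convert a 'MM:SS' playback position such as '31:21' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def frames_to_capture(cues, keywords, duration_s=60):
    """Return the playback times (in seconds) whose frames should be captured.

    cues: list of (time_s, subtitle_text) sorted by time (assumed format).
    A cue containing any keyword opens a window of duration_s seconds, and
    every cue inside an open window is captured. With duration_s=0 only the
    cues that contain a keyword themselves are captured.
    """
    selected = []
    window_end = -1
    for time_s, text in cues:
        if any(keyword in text for keyword in keywords):
            window_end = max(window_end, time_s + duration_s)
        if time_s <= window_end:
            selected.append(time_s)
    return selected
```

With a 60-second window, a keyword at 31:21 selects every subtitled frame up to 32:21, and a second occurrence at 55:51 opens a new window, matching the two-window example in the text.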
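By implementing this embodiment of the application, the electronic device can capture the pictures containing the preset keyword group from the subtitled video, so that a supervisor can use the captured pictures to check quickly whether the salesperson informed the customer of the important key information during the business activity. The supervisor does not need to play the whole video, which saves spot-check time and improves the efficiency of spot-checking the business activity.
In another implementation, the captured pictures must contain not only the preset keyword group but also a preset person, so that the business activity of that person can be spot-checked. In this case, capturing the pictures of the second video in which the preset keyword group appears is specifically: the electronic device captures the pictures of the second video in which both the preset keyword group and a preset face image appear.
The preset face image may be set by the system by default according to the business activity scenario or selected manually by the user, which is not limited in this application. One or more preset face images may be set for a given business activity scenario. For example, in an insurance product sales scenario the preset face image may be that of salesperson 1. While the electronic device plays the subtitled second video and reaches a picture in which both the preset keyword group "risk" and the preset person image "salesperson 1" appear, for instance at 31:21, the electronic device starts taking screenshots, captures every picture containing subtitles, and continues for a period of time to obtain multiple subtitled pictures. Here the capture duration is 1 minute. It may be set by the system by default or entered by the user together with the keyword group or the preset person image. Note that this implementation is described using the case where screenshots continue for a period of time after the preset keyword group and the preset face first appear in the second video. If the preset keyword group and the preset face appear together more than once in the second video, the electronic device may take screenshots for a period of time each time they appear. For example, while the second video is playing, "risk" and "salesperson 1" appear in the picture at 31:21, so the electronic device starts taking screenshots, captures every picture containing subtitles, and continues for a period of time, for example 1 minute, obtaining 10 pictures. When playback reaches 55:51, "risk" and "salesperson 1" appear again, so the electronic device again takes screenshots for a period of time, for example 1 minute, and obtains another 10 pictures, for a total of 20 pictures. Alternatively, the electronic device may capture only the one or more video frames in which the preset keyword group and the preset face appear and take no screenshots of other pictures. For example, while the second video is playing, "risk" and "salesperson 1" appear at 31:51, so the electronic device takes a screenshot of the frame at 31:51. When playback reaches 55:51, "risk" and the face of "salesperson 1" appear again, so the electronic device captures the frame at 55:51, for a total of 2 pictures. Optionally, to reduce the number of unnecessary pictures, if the electronic device captures multiple identical pictures of the same frame, it may keep only one of them, for example the clearest one.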
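The optional final step, keeping only the clearest of several captures of the same frame, might look like the sketch below. The source does not say how clarity is measured, so the gradient-energy score used here is an assumption, as are the function names and the representation of a grayscale image as a list of pixel rows.

```python
def sharpness(gray):
    """Sum of squared differences between horizontally and vertically
    adjacent pixels of a grayscale image given as a list of rows.
    Blurry images have smoother transitions and therefore score lower."""
    total = 0
    for y, row in enumerate(gray):
        for x, value in enumerate(row):
            if x + 1 < len(row):
                total += (row[x + 1] - value) ** 2
            if y + 1 < len(gray):
                total += (gray[y + 1][x] - value) ** 2
    return total


def keep_sharpest(captures):
    """captures: list of (time_s, gray_image). Returns one (time_s, image)
    pair per playback time, choosing the sharpest image, ordered by time."""
    best = {}
    for time_s, image in captures:
        if time_s not in best or sharpness(image) > sharpness(best[time_s]):
            best[time_s] = image
    return sorted(best.items())
```

Captures are grouped by playback time, so two screenshots of the same frame collapse to the one with the higher score while captures of different frames are all kept.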
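By implementing this embodiment of the application, the electronic device can capture from the subtitled video the pictures that contain both the preset keyword group and the preset person, so that a supervisor can use the captured pictures to check quickly whether the preset person informed the customer of the important key information during the business activity. The supervisor does not need to play the whole video, which saves spot-check time and improves the efficiency of spot-checking the business activity.
Optionally, capturing the pictures of the second video in which the preset keyword group appears is specifically:
the electronic device uses optical character recognition (Optical Character Recognition, OCR) technology to recognize the text in the second video and captures the pictures of the second video in which the preset keyword group appears.
OCR technology is the process of analyzing and recognizing an image that contains text in order to obtain the text. With OCR technology, the text in an image can be recognized and returned as text.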
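The keyword check on OCR output can be sketched as below. No OCR engine is named in the source, so the `ocr` parameter is a hypothetical callable that stands in for whichever engine is used, and the function name `contains_keyword` is likewise an assumption. Removing whitespace before matching is a choice suited to Chinese subtitles, where OCR output may contain stray spaces between characters.

```python
def contains_keyword(frame, keywords, ocr):
    """Return True if the text recognized in `frame` contains any keyword.

    ocr: any callable mapping a frame to a string. It is a stand-in for a
    real OCR engine, which this sketch deliberately does not name.
    """
    # OCR output often has stray spaces or line breaks between characters,
    # so compare on whitespace-free text.
    text = "".join(ocr(frame).split())
    return any("".join(keyword.split()) in text for keyword in keywords)
```

Passing the recognizer in as a parameter keeps the matching logic testable without an OCR dependency; in practice the callable would wrap the engine's image-to-text function.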
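Optionally, capturing the pictures of the second video in which both the preset keyword group and the preset face image appear is specifically:
the electronic device uses OCR technology to recognize the text in the second video and face recognition technology to recognize the faces in the second video, and captures the pictures of the second video in which both the preset keyword group and the preset face image appear.
Face recognition technology is a family of related techniques that detect and track faces in an image and then recognize the detected faces. It is also commonly called portrait recognition or facial recognition. Face recognition is based on a person's facial features. It first determines whether a face is present in the image or video stream. If so, it gives the position and size of each face and the positions of the main facial organs. Based on this information it then extracts the identity features contained in each face and compares them with the face image included in the capture instruction to identify the face. Face image information includes information such as the face, iris, and retina.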
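The final comparison step, matching the identity features of a detected face against those of the preset face image, is commonly done by comparing feature vectors. The sketch below assumes cosine similarity with a fixed threshold; the source does not specify the feature extractor, the similarity measure, or any threshold, so all of these (and the function names) are illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def has_preset_face(face_features, preset_features, threshold=0.8):
    """face_features: feature vectors of the faces detected in one frame.
    preset_features: feature vector of the preset face image.
    Both are assumed to come from the same (unspecified) feature extractor."""
    return any(cosine_similarity(features, preset_features) >= threshold
               for features in face_features)
```

A frame would then be captured only when this check and the keyword check both succeed for that frame.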
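Optionally, the captured pictures are multiple pictures (N pictures, where N is an integer greater than 2), and after the electronic device captures the pictures of the second video in which the preset keyword group appears, the method further includes:
the electronic device stitches the multiple pictures to obtain one picture. That is, to make the captured pictures easier to use, after capturing them the electronic device can automatically stitch them into one picture, which the user can then use directly to spot-check the business activity.
By implementing this embodiment of the application, the electronic device can automatically stitch the extracted pictures into a single picture that the user can use directly to spot-check the business activity. The user does not need to stitch the pictures manually in a picture stitching app, which saves operating time, reduces operating complexity, and improves the efficiency of spot-checking the business activity.
Optionally, stitching the multiple pictures to obtain one picture includes:
the electronic device extracts subtitles from the pictures other than the picture with the earliest playback time among the multiple pictures;
the electronic device stitches the subtitles of the other pictures from top to bottom, in order of playback time from earliest to latest, below the subtitle of the picture with the earliest playback time to obtain one picture.
The electronic device captures N pictures, so there are N-1 pictures other than the one with the earliest playback time. The electronic device can use OCR technology to extract the text from these N-1 pictures. For example, it can use OCR to identify the region of each picture that contains text and then crop that region, for example the area at the bottom of the picture, to obtain N-1 text pictures.
After extracting the text from the N-1 pictures, the electronic device can add a time mark to the text of each picture. For example, suppose the electronic device captured 10 pictures with the following playback times: first picture 31:21, second 31:28, third 31:34, fourth 31:41, fifth 31:48, sixth 31:55, seventh 32:02, eighth 32:09, ninth 32:15, and tenth 32:21. After obtaining the text of the last 9 pictures, the electronic device adds the time marks 31:28, 31:34, 31:41, 31:48, 31:55, 32:02, 32:09, 32:15, and 32:21 to them in turn. Finally, the electronic device arranges the text of the last 9 pictures (for example, the picture regions containing the text) below the first picture from top to bottom in order of time from earliest to latest.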
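The stitching order described above can be sketched with images represented as lists of pixel rows. This is a simplified illustration: the function name `stitch` is hypothetical, and the sketch assumes the subtitle always occupies a fixed number of rows at the bottom of each picture, whereas the text locates the text region with OCR.

```python
def stitch(captures, subtitle_rows):
    """Stitch captured pictures into one long picture.

    captures: list of (time_s, image), each image a list of pixel rows of
    equal width. The earliest image is kept whole, and the bottom
    `subtitle_rows` rows (assumed subtitle strip, must be > 0) of every
    later image are appended below it, earliest first.
    """
    ordered = sorted(captures, key=lambda capture: capture[0])
    result = [list(row) for row in ordered[0][1]]
    for _, image in ordered[1:]:
        result.extend(list(row) for row in image[-subtitle_rows:])
    return result
```

Sorting by the time mark before stitching means the captures can arrive in any order, and the first picture's own subtitle stays in place at the bottom of the base image so the later strips continue directly beneath it.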
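By implementing this embodiment of the application, the pictures can be stitched using the first picture (that is, the picture with the earliest playback time) as the base. The first picture is kept intact, only the text portion is cropped from each of the remaining pictures, and the text portions are stitched below the first picture in the above order to produce a single picture. The user can use the stitched picture directly to spot-check the business activity without manually stitching the pictures in a picture stitching app, which saves operating time, reduces operating complexity, and improves the efficiency of spot-checking the business activity.
Refer to FIG. 3, which is a schematic structural diagram of a picture capturing apparatus. As shown in FIG. 3, the picture capturing apparatus 300 includes a recording unit 301, an adding unit 302, and a capturing unit 303.
The recording unit 301 is configured to video-record a business activity process to obtain a first video and to audio-record the business activity process to obtain a first audio, where the business activity process includes a salesperson negotiating business with a customer;
the adding unit 302 is configured to add subtitles to the first video according to the first audio to obtain a second video containing subtitles;
the capturing unit 303 is configured to capture, while the second video is played, the pictures of the second video in which a preset keyword group appears.
In one implementation, the pictures are multiple pictures, and the picture capturing apparatus 300 further includes a first stitching unit, configured to stitch the multiple pictures to obtain one picture after the capturing unit 303 captures the pictures of the second video in which the preset keyword group appears.
In one implementation, the first stitching unit includes:
an extracting unit, configured to extract subtitles from the pictures other than the picture with the earliest playback time among the multiple pictures;
a second stitching unit, configured to stitch the subtitles of the other pictures from top to bottom, in order of playback time from earliest to latest, below the subtitle of the picture with the earliest playback time to obtain one picture.
In one implementation, the capturing unit 303 is specifically configured to capture the pictures of the second video in which both the preset keyword group and a preset face image appear.
In one implementation, the capturing unit 303 is specifically configured to recognize the subtitles in the second video using optical character recognition (OCR) technology and to capture from the second video the pictures in which the preset keyword group appears.
In one implementation, the first adding unit (that is, the adding unit 302) includes:
a converting unit, configured to convert the first audio into speech content using an audio conversion tool;
a second adding unit, configured to add the speech content to the image frames of the first video in chronological order to obtain the second video containing subtitles.
Note that for the function and implementation of each unit in the picture capturing apparatus 300, refer to the relevant description in the method embodiment shown in FIG. 2, which is not repeated here.
Another embodiment of this application provides a computer-readable storage medium. The computer-readable storage medium stores a computer program, the computer program includes program instructions, and the program instructions, when executed by a processor, implement the picture capturing method described above.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination of them. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium that the computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
The specific implementations described above further explain the purpose, technical solutions, and beneficial effects of the embodiments of this application in detail. It should be understood that the above are only specific implementations of the embodiments of this application and are not intended to limit their protection scope. Any modification, equivalent replacement, or improvement made on the basis of the technical solutions of the embodiments of this application shall fall within the protection scope of the embodiments of this application.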

Claims (20)

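  1. A picture capturing method, comprising:
    video-recording, by an electronic device, a business activity process to obtain a first video, and audio-recording the business activity process to obtain a first audio, wherein the business activity process comprises a salesperson negotiating business with a customer;
    adding, by the electronic device, subtitles to the first video according to the first audio to obtain a second video containing subtitles;
    capturing, by the electronic device while the electronic device plays the second video, pictures of the second video in which a preset keyword group appears.
  2. The method according to claim 1, wherein the pictures are multiple pictures, and after the electronic device captures the pictures of the second video in which the preset keyword group appears, the method further comprises:
    stitching, by the electronic device, the multiple pictures to obtain one picture.
  3. The method according to claim 2, wherein stitching, by the electronic device, the multiple pictures to obtain one picture comprises:
    extracting, by the electronic device, subtitles from the pictures other than the picture with the earliest playback time among the multiple pictures;
    stitching, by the electronic device, the subtitles of the other pictures from top to bottom, in order of playback time from earliest to latest, below the subtitle of the picture with the earliest playback time to obtain one picture.
  4. The method according to any one of claims 1 to 3, wherein capturing, by the electronic device, the pictures of the second video in which the preset keyword group appears comprises:
    capturing, by the electronic device, pictures of the second video in which both the preset keyword group and a preset face image appear.
  5. The method according to claim 4, wherein capturing, by the electronic device, the pictures of the second video in which both the preset keyword group and the preset face image appear comprises:
    recognizing, by the electronic device, the subtitles in the second video using OCR technology and the face information in the second video using face recognition technology, and extracting from the second video the pictures containing the preset keyword group and the preset face image.
  6. The method according to any one of claims 1 to 3, wherein capturing, by the electronic device, the pictures of the second video in which the preset keyword group appears comprises:
    recognizing, by the electronic device, the subtitles in the second video using optical character recognition (OCR) technology, and capturing from the second video the pictures in which the preset keyword group appears.
  7. The method according to any one of claims 1 to 6, wherein adding, by the electronic device, subtitles to the first video according to the first audio to obtain the second video containing subtitles comprises:
    converting, by the electronic device, the first audio into speech content using an audio conversion tool;
    adding, by the electronic device, the speech content to the image frames of the first video in chronological order to obtain the second video containing subtitles.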
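  8. A picture capturing apparatus, comprising:
    a recording unit, configured to video-record a business activity process to obtain a first video and to audio-record the business activity process to obtain a first audio, wherein the business activity process comprises a salesperson negotiating business with a customer;
    an adding unit, configured to add subtitles to the first video according to the first audio to obtain a second video containing subtitles;
    a capturing unit, configured to capture, while the second video is played, pictures of the second video in which a preset keyword group appears.
  9. The apparatus according to claim 8, wherein the pictures are multiple pictures, and the apparatus further comprises:
    a stitching unit, configured to stitch the multiple pictures to obtain one picture after the capturing unit captures the pictures of the second video in which the preset keyword group appears.
  10. The apparatus according to claim 9, wherein the first stitching unit comprises:
    an extracting unit, configured to extract subtitles from the pictures other than the picture with the earliest playback time among the multiple pictures;
    a second stitching unit, configured to stitch the subtitles of the other pictures from top to bottom, in order of playback time from earliest to latest, below the subtitle of the picture with the earliest playback time to obtain one picture.
  11. The apparatus according to any one of claims 8 to 10, wherein the capturing unit is specifically configured to capture pictures of the second video in which both the preset keyword group and a preset face image appear.
  12. The apparatus according to claim 11, wherein the capturing unit is specifically configured to recognize the subtitles in the second video using OCR technology and the face information in the second video using face recognition technology, and to extract from the second video the pictures containing the preset keyword group and the preset face image.
  13. The apparatus according to any one of claims 8 to 10, wherein the capturing unit is specifically configured to:
    recognize the subtitles in the second video using optical character recognition (OCR) technology, and capture from the second video the pictures in which the preset keyword group appears.
  14. The apparatus according to any one of claims 8 to 13, wherein the first adding unit comprises:
    a converting unit, configured to convert the first audio into speech content using an audio conversion tool;
    a second adding unit, configured to add the speech content to the image frames of the first video in chronological order to obtain the second video containing subtitles.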
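  15. An electronic device, comprising:
    one or more processors;
    a memory;
    one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and the one or more application programs are configured to perform the following steps:
    video-recording a business activity process to obtain a first video, and audio-recording the business activity process to obtain a first audio, wherein the business activity process comprises a salesperson negotiating business with a customer;
    adding subtitles to the first video according to the first audio to obtain a second video containing subtitles;
    capturing, while the second video is played, pictures of the second video in which a preset keyword group appears.
  16. The electronic device according to claim 15, wherein the pictures are multiple pictures, and after the pictures of the second video in which the preset keyword group appears are captured, the one or more application programs are configured to perform the following step:
    stitching the multiple pictures to obtain one picture.
  17. The electronic device according to claim 16, wherein when the multiple pictures are stitched to obtain one picture, the one or more application programs are configured to perform the following steps:
    extracting subtitles from the pictures other than the picture with the earliest playback time among the multiple pictures;
    stitching the subtitles of the other pictures from top to bottom, in order of playback time from earliest to latest, below the subtitle of the picture with the earliest playback time to obtain one picture.
  18. The electronic device according to any one of claims 15 to 17, wherein when the pictures of the second video in which the preset keyword group appears are captured, the one or more application programs are further configured to perform the following step:
    capturing pictures of the second video in which both the preset keyword group and a preset face image appear.
  19. The electronic device according to claim 18, wherein when the pictures of the second video in which both the preset keyword group and the preset face image appear are captured, the one or more application programs are configured to perform the following step:
    recognizing the subtitles in the second video using OCR technology and the face information in the second video using face recognition technology, and extracting from the second video the pictures containing the preset keyword group and the preset face image.
  20. A non-volatile computer-readable storage medium, wherein a computer program is stored on the non-volatile computer-readable storage medium, and the program, when executed by a processor, implements the picture capturing method according to any one of claims 1 to 7.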
PCT/CN2019/117170 2019-07-30 2019-11-11 Picture capturing method, apparatus, and computer storage medium WO2021017277A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910706593.6 2019-07-30
CN201910706593.6A CN110490101A (zh) 2019-07-30 2019-07-30 Picture capturing method, apparatus, and computer storage medium

Publications (1)

Publication Number Publication Date
WO2021017277A1 true WO2021017277A1 (zh) 2021-02-04

Family

ID=68548973

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117170 WO2021017277A1 (zh) 2019-07-30 2019-11-11 Picture capturing method, apparatus, and computer storage medium

Country Status (2)

Country Link
CN (1) CN110490101A (zh)
WO (1) WO2021017277A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
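CN113298819A (zh) * 2020-06-09 2021-08-24 阿里巴巴集团控股有限公司 Video processing method and apparatus, and electronic device
CN113766149A (zh) * 2020-08-28 2021-12-07 北京沃东天骏信息技术有限公司 Method and apparatus for stitching subtitle-stitched pictures, electronic device, and storage medium
CN114359920A (zh) * 2020-09-30 2022-04-15 北京小米移动软件有限公司 Image processing method, apparatus, device, and storage medium
CN112380922B (zh) * 2020-10-23 2024-03-22 岭东核电有限公司 Method and apparatus for determining review video frames, computer device, and storage medium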

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103260082A (zh) * 2013-05-21 2013-08-21 王强 Video processing method and apparatus
CN106851401A (zh) * 2017-03-20 2017-06-13 惠州Tcl移动通信有限公司 Method and system for automatically adding subtitles
US20170280200A1 (en) * 2016-03-24 2017-09-28 Echostar Technologies L.L.C. Direct capture and sharing of screenshots from video programming
CN107729522A (zh) * 2017-10-27 2018-02-23 优酷网络技术(北京)有限公司 Method and apparatus for extracting multimedia resource segments
CN107835388A (zh) * 2017-11-22 2018-03-23 成都欧远信电子科技有限公司 Intelligent and automatic analysis system for surveillance recordings
CN109146789A (zh) * 2018-08-23 2019-01-04 北京优酷科技有限公司 Picture stitching method and apparatus

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103634605B (zh) * 2013-12-04 2017-02-15 百度在线网络技术(北京)有限公司 Method and apparatus for processing video pictures
CN105871681A (zh) * 2015-12-14 2016-08-17 乐视网信息技术(北京)股份有限公司 Subtitle adding method and apparatus
US20170366661A1 (en) * 2016-06-21 2017-12-21 Angel Macklin Add text and audio to a selfie
CN109783338B (zh) * 2019-01-02 2022-11-15 深圳壹账通智能科技有限公司 Recording processing method and apparatus based on business information, and computer device

Also Published As

Publication number Publication date
CN110490101A (zh) 2019-11-22

Similar Documents

Publication Publication Date Title
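WO2021017277A1 (zh) Picture capturing method, apparatus, and computer storage medium
US10425679B2 (en) Method and device for displaying information on video image
JP6293269B2 (ja) Content viewing confirmation device and method
WO2021175019A1 (zh) Audio and video recording guidance method and apparatus, computer device, and storage medium
US11151364B2 (en) Video image overlay of an event performance
US11004163B2 (en) Terminal-implemented method, server-implemented method and terminal for acquiring certification document
ES2767105T3 (es) A recording system for generating a transcript of a dialogue
US10244209B1 (en) Remote agent capture and monitoring
WO2021159644A1 (zh) Screenshot management method and apparatus, and mobile terminal
US20230350944A1 (en) Digital media authentication
JP2017184120A (ja) Image processing system
JP2024010239A (ja) Insurance consultation system, insurance consultation terminal, recording method, and recording program
WO2023124944A1 (zh) Video call method and apparatus, electronic device, and storage medium
WO2016188079A1 (zh) Data storage method for terminal device, and terminal device
CN114265759A (zh) Method and system for tracing the source after a data information leak, and electronic device
US20220253551A1 (en) System and method for preventing sensitive information from being recorded
CN112565512A (zh) Method and system for obtaining data between Android applications through screenshots and audio recording
WO2020253118A1 (zh) Business development method and apparatus
CN112135187B (zh) Multimedia data generation method, capturing method, apparatus, device, and storage medium
US12028319B1 (en) Image-based firewall for synthetic media prevention
US11784975B1 (en) Image-based firewall system
US10536729B2 (en) Methods, systems, and media for transforming fingerprints to detect unauthorized media content items
WO2022068024A1 (zh) Image retrieval method and apparatus, electronic device, and storage medium
US11966500B2 (en) Systems and methods for isolating private information in streamed data
WO2019206090A1 (zh) Media file annotation method and apparatus, mobile terminal, and storage medium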

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19939989

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19939989

Country of ref document: EP

Kind code of ref document: A1
