CN115761266A - Picture processing method and device, storage medium and electronic equipment - Google Patents

Picture processing method and device, storage medium and electronic equipment


Publication number
CN115761266A
CN115761266A
Authority
CN
China
Prior art keywords
picture
contour
determining
voice
outline
Prior art date
Legal status
Pending
Application number
CN202211490584.6A
Other languages
Chinese (zh)
Inventor
霍飞龙
杭云
郭宁
施唯佳
Current Assignee
Tianyi Digital Life Technology Co Ltd
Original Assignee
Tianyi Digital Life Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Tianyi Digital Life Technology Co Ltd filed Critical Tianyi Digital Life Technology Co Ltd
Priority to CN202211490584.6A
Publication of CN115761266A
Legal status: Pending

Abstract

The invention provides a picture processing method and apparatus, a storage medium, and an electronic device. The method comprises: acquiring a picture shared in a current online conference; identifying each object in the picture and the object contour of each object; displaying each object contour on the picture and marking each contour with a corresponding contour number to obtain a marked picture; displaying the marked picture on each participant terminal of the online conference; extracting key information from the object specified voice corresponding to the marked picture; and determining a target object contour among the object contours based on the key information, then highlighting the target object contour and its contour number on each participant terminal. The invention can highlight the contour of the object to be designated, making it convenient for conference participants to determine the designated object quickly and accurately, which reduces the probability of misrecognizing the object and ensures that the participants receive the information correctly.

Description

Picture processing method and device, storage medium and electronic equipment
Technical Field
The present invention relates to the field of video networking technologies, and in particular, to a method and an apparatus for processing pictures, a storage medium, and an electronic device.
Background
With the development of internet technology, remote communication has brought great convenience to people's lives. For example, people can hold meetings online, and teachers can teach students remotely, free of the constraints of time and place.
When holding an online meeting or teaching online, the speaker usually puts the content to be explained on a shared screen and then explains it remotely to the participants. When the speaker needs to designate a certain object on the screen, a complex description of the object is required so that the participants can understand which object is meant. Sometimes the participants misunderstand the description and mistake the designated object, and as a result they cannot correctly receive the information the speaker conveys.
Disclosure of Invention
In view of this, the present invention provides a picture processing method and apparatus, a storage medium, and an electronic device, which can highlight a specified object in a picture shared in a conference, thereby preventing a participant from mistakenly recognizing the specified object, and further ensuring that the participant can accurately receive information.
In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:
a picture processing method comprises the following steps:
acquiring a picture shared in a current online conference;
identifying individual objects in the picture and an object contour for each of the objects;
displaying each object outline on the picture, and marking a corresponding outline number for each object outline on the picture to obtain a marked picture;
determining each participant terminal of the online conference, and displaying the marked picture at each participant terminal;
acquiring object specified voice corresponding to the marked picture, and extracting key information from the object specified voice;
and determining a target object contour in each object contour based on the key information, and highlighting the target object contour and a contour number of the target object contour on each participant terminal.
The above method, optionally, the identifying the objects in the picture and the object contour of each object, includes:
acquiring an object identification parameter;
determining a picture identification algorithm corresponding to the object identification parameters;
and processing the picture by using the picture recognition algorithm, and determining each object in the picture and the object outline of each object.
Optionally, the above method, wherein the obtaining of the object specifying voice corresponding to the tag picture includes:
receiving a voice acquisition instruction;
determining a voice input terminal in each participant terminal based on the voice acquisition instruction;
and determining the voice input by the voice input terminal based on the marked picture as the object specified voice.
The above method, optionally, the extracting key information from the object-specific speech includes:
converting the object-specific speech into text information;
performing word segmentation processing on the text information to obtain each description word;
and screening the description words, and determining the screened description words as key information.
In the above method, optionally, the highlighting the target object contour and the contour number of the target object contour on each of the participant terminals includes:
highlighting the target object contour and its contour number on the marked picture, and hiding the remaining object contours and contour numbers to obtain a processed picture;
and displaying the processed pictures on each participating terminal.
A picture processing apparatus comprising:
the first acquisition unit is used for acquiring a picture shared in a current online conference;
the identification unit is used for identifying each object in the picture and the object outline of each object;
the marking unit is used for displaying each object outline on the picture and marking a corresponding outline number for each object outline on the picture to obtain a marked picture;
the determining unit is used for determining each participating terminal of the online conference and displaying the marked picture on each participating terminal;
a second acquisition unit, configured to acquire the object specified voice corresponding to the marked picture, and extract key information from the object specified voice;
and the display unit is used for determining a target object contour in each object contour based on the key information and highlighting the target object contour and the contour number of the target object contour on each participant terminal.
The above apparatus, optionally, the identification unit includes:
an acquisition subunit, configured to acquire an object identification parameter;
the first determining subunit is used for determining a picture identification algorithm corresponding to the object identification parameters;
and the second determining subunit is used for processing the picture by using the picture recognition algorithm and determining each object in the picture and the object outline of each object.
The above apparatus, optionally, the second obtaining unit includes:
the receiving subunit is used for receiving the voice acquisition instruction;
the third determining subunit is used for determining a voice input terminal in each participating terminal based on the voice acquisition instruction;
and the fourth determining subunit is used for determining the voice input by the voice input terminal based on the marked picture as the object specified voice.
The above apparatus, optionally, the second obtaining unit includes:
a conversion subunit, configured to convert the object specifying voice into text information;
the word segmentation processing subunit is used for carrying out word segmentation processing on the text information to obtain each description word;
and the screening subunit is used for screening the description words and determining the screened description words as key information.
The above apparatus, optionally, the display unit, includes:
the processing subunit is configured to highlight the target object contour and its contour number on the marked picture, and to hide the remaining object contours and contour numbers to obtain a processed picture;
and the display subunit is used for displaying the processed pictures on each participating terminal.
A storage medium comprising stored instructions, wherein the instructions, when executed, control a device on which the storage medium resides to perform the picture processing method as described above.
An electronic device comprising a memory, and one or more instructions, wherein the one or more instructions are stored in the memory and configured to be executed by one or more processors to perform a picture processing method as described above.
Compared with the prior art, the invention has the following advantages:
the invention provides a picture processing method and device, a storage medium and an electronic device, wherein the method comprises the following steps: acquiring a picture shared in a current online conference; identifying each object in the picture and an object outline of each object; displaying each object outline on the picture, and marking a corresponding outline number for each object outline on the picture to obtain a marked picture; determining each participant terminal of the online conference, and displaying the marked picture at each participant terminal; acquiring object specified voice corresponding to the marked picture, and extracting key information from the object specified voice; and determining the target object contour in each object contour based on the key information, and highlighting the target object contour and the contour number of the target object contour on each participant terminal. The invention can highlight the outline of the object to be appointed, is convenient for the personnel participating in the conference to quickly and accurately determine the object to be appointed, reduces the probability of mistakenly recognizing the object, and further ensures that the participating personnel can accurately receive information.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only embodiments of the present invention, and that those skilled in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a flowchart of a method of processing an image according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for identifying objects in a picture and determining an object contour for each object according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for obtaining object-specific speech according to an embodiment of the present invention;
FIG. 4 is a flowchart of a method for extracting key information from object-specific speech according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a scenario of a method according to an embodiment of the present invention;
fig. 6 is an exemplary diagram of a currently displayed picture in an online conference in the method provided by the embodiment of the present invention;
fig. 7 is an exemplary diagram of a picture obtained by marking a currently displayed picture of an online conference in the method provided by the embodiment of the present invention;
FIG. 8 is an exemplary diagram of a picture after a designated object is highlighted according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a picture processing apparatus according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
In this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
With today's scientific and technological development, various electronic devices have supported and enriched people's lives and greatly shortened the distance between people, making conversation possible at any time even across thousands of miles; online meetings and online classes have also gradually become indispensable.
However, when people hold calls, meetings, and online classes by watching a screen, a common situation arises: when the speaker wants to designate an object on the screen, a complicated description is usually necessary for others to know which object is meant. For example, when a picture containing several people is displayed and introduced in a video conference, the people are generally described in terms of clothing, appearance, position, and so on, prompting the participants to judge which person in the picture is currently being described and gradually focusing everyone's attention on that person. If there are many target objects in the picture, or their features are not obvious, or the description is not clear enough, it is generally difficult for others to understand which object is being described, which easily causes misunderstanding, so that the participants cannot correctly receive the information conveyed by the speaker.
In view of the above problems, the present invention provides a picture processing method and apparatus, a storage medium, and an electronic device, which can identify each object in a picture shared by an online conference, determine an object specified by a speaker according to the voice of the speaker, and then highlight the object at each participant terminal participating in the conference, so that a participant can accurately and quickly determine the object specified by the speaker, thereby reducing the probability of misrecognition, and ensuring that the participant can correctly receive information transmitted by the speaker.
The invention is operational with numerous general purpose or special purpose computing device environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multi-processor apparatus, distributed computing environments that include any of the above devices or equipment, and the like. Preferably, the present invention can be applied to a video system constructed using a video networking technology, which can implement a real-time video call function such as an online conference, an online lecture, and the like, and the video system can be constructed using a computer or a multiprocessor device.
Referring to fig. 1, a flowchart of a method of processing an image according to an embodiment of the present invention is specifically described as follows:
s101, obtaining the picture shared in the current online meeting.
While the online conference is in progress, the picture shared in the conference is acquired. Preferably, sharing in the online conference is performed by calling a video call module in the video system; the video call module supports terminal devices in conducting remote video calls, video conferences, and the like, and a user can share the picture to be shown to other conference participants through the GIA video call module.
The online conference may be a conference held by enterprise staff over real-time video, or a real-time video session in which a teacher teaches students online.
Preferably, when the online conference is a video conference held by an enterprise, the main control terminal in the conference is the speaker's terminal and the shared picture is the picture shared by the speaker; when the online conference is an online class given by a teacher, the main control terminal is the teacher's terminal and the shared picture is the picture shared by the teacher.
When acquiring the picture shared in the online conference, the picture currently displayed by the video call module can be obtained using the image acquisition module. Preferably, the image acquisition module can obtain the picture from the video call module in real time, at regular intervals, or when it is detected that the picture shared by the main control terminal has changed; it can also be activated to obtain the currently displayed picture when a corresponding event trigger instruction is received.
Preferably, the image acquisition module extracts a still of the picture currently displayed by the video call module, which avoids the recognition results and contour overlays being updated too frequently as the video changes in real time and thereby disturbing the user.
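The change-triggered acquisition described above can be sketched by hashing each shared frame and only forwarding a frame to the recognition pipeline when the hash differs from the previous one. A minimal Python sketch under that assumption (the class name and frame bytes are illustrative, not from the patent):

```python
import hashlib

class PictureAcquisitionModule:
    """Grabs the currently shared picture only when it has changed."""

    def __init__(self):
        self._last_digest = None  # digest of the most recently acquired frame

    def maybe_acquire(self, frame_bytes: bytes):
        """Return the frame if it differs from the last acquired one, else None."""
        digest = hashlib.sha256(frame_bytes).hexdigest()
        if digest == self._last_digest:
            return None  # unchanged: skip re-running recognition and redrawing overlays
        self._last_digest = digest
        return frame_bytes

acq = PictureAcquisitionModule()
first = acq.maybe_acquire(b"slide-1")   # new picture: acquired
repeat = acq.maybe_acquire(b"slide-1")  # same picture: skipped
second = acq.maybe_acquire(b"slide-2")  # changed picture: acquired
```

Comparing digests rather than raw frames keeps the check cheap even for large pictures.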
S102, identifying each object in the picture and the object outline of each object.
Preferably, the image recognition module can be used to recognize the individual objects in the picture and the object outline of each object.
Referring to fig. 2, a flowchart of a method for identifying each object in a picture and determining an object contour of each object according to an embodiment of the present invention is specifically described as follows:
s201, obtaining object identification parameters.
Preferably, the object identification parameters include, but are not limited to, the type of the object to be identified, the type of the contour of the object, and the like.
S202, determining an image recognition algorithm corresponding to the object recognition parameters.
The image recognition algorithm includes, but is not limited to, an object contour recognition algorithm, an edge algorithm, a face recognition algorithm, a vehicle recognition algorithm, and the like.
Preferably, the corresponding picture recognition algorithm is determined based on the object type in the object recognition parameters, and the parameters in the picture recognition algorithm are adjusted according to the information such as the object contour type, so as to obtain the final picture recognition algorithm.
Exemplarily, when the recognized object type is human and the contours of facial organs such as the eyes, nose, mouth, and ears need to be recognized, parameters for the facial-organ contours can be added to the face recognition algorithm once it is determined, yielding the final image recognition algorithm. Conversely, when the facial organs do not need to be recognized and only the face itself does, parameters for recognizing the face contour can be added to the face recognition algorithm to obtain the final image recognition algorithm.
For example, when the identified object is an animal and only the body contour of the animal needs to be identified, an algorithm for identifying the animal can be determined, parameters for identifying the body contour of the animal are added to the algorithm, and the final algorithm is determined as a picture identification algorithm.
S203, processing the picture by using a picture recognition algorithm, and determining each object in the picture and the object outline of each object.
Preferably, each object may have one or more object contours. For example, when only the face needs to be recognized, the recognized object has only a face contour; when the facial organs also need to be recognized, the recognized object contours include the contours of the organs on the face in addition to the face contour.
S103, displaying each object contour on the picture, and marking a corresponding contour number for each object contour on the picture to obtain a marked picture.
After the object outlines of the objects are identified, each object outline can be drawn in the picture by using lines so as to show each object outline on the picture, and further, a unique outline number can be marked for each object outline on the picture, so that a marked picture is obtained.
Further, the color of the lines of each object outline may be different.
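The numbering of step S103, together with the differently colored contour lines mentioned above, can be sketched as a simple labeling pass. The contour values and the color palette below are illustrative assumptions, not data from the patent:

```python
# A fixed palette cycled over the contours so neighbouring contours differ in color.
PALETTE = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0), (255, 0, 255)]

def label_contours(contours):
    """Return (contour number, line color, contour) triples; numbers start at 1."""
    labeled = []
    for index, contour in enumerate(contours):
        number = index + 1                     # unique contour number for the overlay
        color = PALETTE[index % len(PALETTE)]  # distinct line color per contour
        labeled.append((number, color, contour))
    return labeled

faces = ["face-a", "face-b", "face-c"]  # stand-ins for recognized contour point sets
labeled = label_contours(faces)
```

The (number, color, contour) triples would then drive the actual line drawing and number placement on the picture.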
And S104, determining each participating terminal of the online conference, and displaying the marked picture at each participating terminal.
Each participant terminal participating in the online conference, including the main control terminal, is determined, and the marked picture is transmitted remotely to each participant terminal so that it is displayed on every participant terminal.
And S105, acquiring object specified voice corresponding to the marked picture, and extracting key information from the object specified voice.
Referring to fig. 3, a flowchart of a method for acquiring an object-specific voice according to an embodiment of the present invention is specifically described as follows:
s301, receiving a voice acquisition instruction.
Preferably, the voice acquisition instruction may be an instruction generated by the main control terminal, and the instruction is used to open a voice input authority to a terminal that needs to designate an object in the marked picture, and is also used to prompt that the system has the terminal that needs to designate the object in the marked picture.
S302, determining a voice input terminal in each participant terminal based on the voice acquisition instruction.
And analyzing the voice acquisition instruction, acquiring a terminal identification code in the voice acquisition instruction, comparing the terminal identification code with the identity identification code of each participant terminal, and determining the participant terminal to which the identity identification code consistent with the terminal identification code belongs as a voice input terminal.
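The matching of the terminal identification code against each participant terminal's identity code can be sketched as follows; the instruction and terminal dictionaries are hypothetical data shapes, not the patent's actual message format:

```python
def find_voice_input_terminal(instruction, participant_terminals):
    """Return the participant terminal whose identity code matches the terminal
    identification code carried in the voice acquisition instruction, else None."""
    target_code = instruction["terminal_id"]
    for terminal in participant_terminals:
        if terminal["identity_code"] == target_code:
            return terminal
    return None  # no participant matches; the instruction cannot be honoured

terminals = [{"identity_code": "T-01", "name": "host"},
             {"identity_code": "T-02", "name": "guest"}]
speaker = find_voice_input_terminal({"terminal_id": "T-02"}, terminals)
```

Only the matched terminal would then be granted the voice input authority.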
And S303, determining the voice input by the voice input terminal based on the marked picture as the object designated voice.
Preferably, after the voice input terminal is determined, that is, the authority of voice input is opened for the voice input terminal, the voice acquisition module is called to receive the voice input by the voice input terminal based on the marked picture, so as to acquire the object specified voice. For example, the voice input terminal may capture voice that the user describes based on the tagged picture and then transmit the voice to the system.
Determining the voice input terminal through the voice acquisition instruction and opening the voice input authority only for that terminal can effectively maintain order in the conference, avoid the noise caused by several people speaking at once, and improve the user experience.
Referring to fig. 4, a flowchart of a method for extracting key information from object-specific speech according to an embodiment of the present invention is specifically described as follows:
s401, converting the object designated voice into text information.
The object specified voice may be processed using a speech recognition algorithm with an acoustic model, for example an acoustic model implemented with a hidden Markov model (HMM) or a CTC-based acoustic model, including but not limited to these examples, to obtain the text information corresponding to the object specified voice.
S402, performing word segmentation processing on the text information to obtain each description word.
Word segmentation of the text information can be performed by character-string matching or by using a machine learning model, so as to obtain the description words.
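The character-string-matching approach to word segmentation mentioned above can be sketched with forward maximum matching against a small dictionary; the dictionary entries and the space-free example input are illustrative assumptions:

```python
def forward_max_match(text, dictionary, max_len=6):
    """Greedy forward maximum matching: at each position take the longest
    dictionary word that matches, falling back to a single character."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words

# On space-free input (as with Chinese text) the matcher recovers known words.
tokens = forward_max_match("numbersix", {"number", "six"})
```

Real systems would use a far larger dictionary or a trained segmentation model, but the greedy matching principle is the same.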
And S403, screening the description words, and determining the screened description words as the key information.
Each description word is screened to filter out invalid words, thereby obtaining the key information.
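The screening step can be sketched as filtering the segmented words against a stopword list; the stopword set and example phrase below are illustrative assumptions:

```python
# Words that carry no designation information and should be screened out.
STOPWORDS = {"please", "look", "at", "the", "a", "an", "is"}

def extract_key_info(words):
    """Keep only the words that survive screening; these form the key information."""
    return [w for w in words if w not in STOPWORDS]

key_info = extract_key_info(["please", "look", "at", "number", "six"])
```

The surviving words ("number", "six") are exactly the key information later matched against the contour numbers.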
According to the invention, after the voice is converted into text information, the text is segmented into words, each word is screened, invalid words are filtered out, and valid words are retained as the key information, which effectively improves the probability of correctly determining the object the user wants to specify.
S106, determining the target object contour in each object contour of the marked picture based on the key information, and highlighting the target object contour and the contour number of the target object contour on each participant terminal.
Preferably, the key information includes an input number, and the input number is a number of an outline of the object to be specified, which is input by a user of the voice input terminal.
The contour number of each object contour is compared with the input number, and the object contour whose contour number is consistent with the input number is determined as the target object contour.
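The number-matching step can be sketched as a lookup of the spoken number among the contour numbers; the contour data is a hypothetical stand-in for the recognized contours:

```python
def find_target_contour(contours, input_number):
    """contours maps contour number -> contour data; return the (number, contour)
    pair whose number equals the number extracted from the voice, else None."""
    for number, contour in contours.items():
        if number == input_number:
            return number, contour
    return None

contours = {1: "contour-1", 2: "contour-2", 6: "contour-6"}
target = find_target_contour(contours, 6)
```

In practice the input number comes from the key information extracted in S105; a dictionary lookup (`contours.get(input_number)`) would be equivalent.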
Highlighting the contour of the target object and the contour number of the contour of the target object on the marked picture, and hiding the remaining contour of the object and the contour number to obtain a processed picture; and displaying the processed pictures on each participating terminal.
Preferably, when highlighting the target object contour and its contour number, their lines may be highlighted, thickened, or rendered in a more striking color so that they stand out on the picture. Furthermore, the color values at the contour coordinate points can be modified to increase the visual contrast.
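Modifying color values at the contour coordinate points, as described above, can be sketched on a toy image represented as a coordinate-to-color mapping (a deliberate simplification of a real pixel buffer):

```python
def highlight_contour(image, contour_points, highlight=(255, 0, 0), dim=(3, 3, 3)):
    """image maps (x, y) -> (r, g, b). Recolor the target contour's coordinate
    points with a striking color and fade every other pixel to hide the rest."""
    for point in image:
        if point in contour_points:
            image[point] = highlight  # striking color along the target contour
        else:
            image[point] = dim        # faded color hides the remaining content
    return image

img = {(0, 0): (10, 10, 10), (1, 0): (10, 10, 10), (2, 0): (10, 10, 10)}
processed = highlight_contour(img, {(1, 0)})
```

A production implementation would operate on an image array and also thicken the contour lines, but the per-coordinate recoloring is the core of the contrast increase.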
It should be noted that the video system applied in the embodiment of the present invention is further provided with a speech recognition module, which is configured to extract key information from the object specified voice, determine the target object contour among the object contours of the marked picture based on the key information, and highlight the target object contour and its contour number on each participant terminal.
Besides the video call module, the image acquisition module, the image recognition module, and the voice recognition module, the video system is provided with an event trigger module. Preferably, in order to avoid disturbing the user and to reduce resource consumption, by default only the video call module keeps running, supporting the display of the shared picture at each participant terminal, while the other modules remain in a silent state. When content in the shared picture needs to be designated, an activation instruction is sent to the event trigger module, which activates the image acquisition module, the image recognition module, and the voice recognition module so that they are in a running state; the corresponding modules are then called to carry out the method provided by the embodiment of the invention.
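The default-silent behaviour of the modules and their activation by the event trigger module can be sketched as a small state holder; the module names mirror the description, but the class shape is an assumption:

```python
class EventTriggerModule:
    """Keeps non-essential modules silent until an activation instruction arrives."""

    ACTIVATABLE = ("image_acquisition", "image_recognition", "voice_recognition")

    def __init__(self, modules):
        # Only the video call module runs by default; all others start silent.
        self.state = {m: (m == "video_call") for m in modules}

    def activate(self):
        """Handle an activation instruction by waking the recognition pipeline."""
        for module in self.ACTIVATABLE:
            if module in self.state:
                self.state[module] = True

mods = ["video_call", "image_acquisition", "image_recognition", "voice_recognition"]
trigger = EventTriggerModule(mods)
trigger.activate()
```

Keeping the recognition modules off until explicitly triggered matches the stated goal of avoiding user disturbance and saving resources.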
In the method provided by the embodiment of the invention, the picture shared in the current online meeting is obtained; identifying each object in the picture and an object outline of each object; displaying each object contour on a picture, and marking a corresponding contour number for each object contour on the picture to obtain a marked picture; determining each participant terminal of the online conference, and displaying the marked picture at each participant terminal; acquiring object specified voice corresponding to the marked picture, and extracting key information from the object specified voice; and determining the target object contour in each object contour based on the key information, and highlighting the target object contour and the contour number of the target object contour on each participant terminal. The invention can highlight the outline of the object to be appointed, is convenient for the personnel participating in the conference to quickly and accurately determine the object to be appointed, reduces the probability of mistakenly recognizing the object, and further ensures that the participating personnel can accurately receive information.
Referring to fig. 5, a diagram illustrating a scenario of the method according to an embodiment of the present invention is described as follows: the figure includes a video system and users 1 to 4. Preferably, in practical applications the number of users is not limited to 4, and different users may use different intelligent devices, such as computers, tablets, and mobile phones, all of which can be connected to the video system.
Preferably, the video system is provided with a video call module, an event trigger module, an image recognition module, an image acquisition module and a voice recognition module. Assume user 1 is the master terminal of the conference and users 2 to 4 are the other participant terminals. Referring to fig. 6, the master terminal displays a picture through the video call module; the picture contains 6 faces in total. When the master terminal needs to specify one of the faces, it sends an activation instruction to the event trigger module in the video system, which activates the image recognition module, the image acquisition module and the voice recognition module. The video system acquires the picture through the image acquisition module, identifies each face in the picture using the image recognition module, and assigns a number to each face so that the numbers are displayed on the picture, as shown in fig. 7. The picture is displayed on each terminal, and the master terminal can then input the voice specifying a portrait according to the picture. For example, if the master terminal inputs "number 6", the voice recognition module is called to process the voice; once the number 6 is identified, the portrait numbered 6 is highlighted and the portraits numbered 1 to 5 are hidden. Specifically, the lines of the portrait numbered 6 are thickened, the lines of the remaining portraits are lightened, and their numbers are hidden, as shown in fig. 8. The picture shown in fig. 8 is displayed to users 1 to 4, so the personnel participating in the conference can clearly see that the face specified by the master terminal is the face numbered 6.
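The numbering step — taking the faces found by the image recognition module and assigning each one a unique contour number — might look like the following sketch. All names are hypothetical; a real system would obtain the bounding boxes from a face detector rather than from hard-coded values:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """Bounding box of one recognized face in the shared picture."""
    x: int
    y: int
    w: int
    h: int

def number_contours(detections):
    """Assign a unique contour number to each detection.

    Ordering top-to-bottom, then left-to-right, keeps the numbering stable
    and predictable, so every participant sees the same number on the same
    face.
    """
    ordered = sorted(detections, key=lambda d: (d.y, d.x))
    return {i + 1: d for i, d in enumerate(ordered)}

# Three hypothetical face detections from the shared picture.
faces = [Detection(300, 40, 80, 80), Detection(20, 50, 80, 80), Detection(160, 45, 80, 80)]
numbered = number_contours(faces)
```

The drawing step would then label each bounding box with its key from `numbered`, producing the marked picture of fig. 7.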
Preferably, the method provided by the present invention can also be used by a teacher to give lessons online. As shown in fig. 5, user 1 is the teacher and users 2 to 4 are students. Through the video system, the students can see the picture shared by the teacher and hear the teacher's voice; the shared picture is shown in fig. 6. When the teacher introduces a character shown in the picture, a traditional verbal description may cause students to identify the wrong character, because the picture contains several characters. The method provided by the invention can therefore be used, specifically as follows: the teacher sends an activation instruction to the event trigger module, which activates the image recognition module, the image acquisition module and the voice recognition module; the image acquisition module acquires the currently displayed picture, i.e. fig. 6; the image recognition module processes fig. 6 to obtain the picture shown in fig. 7, which is then displayed on the teacher's terminal and the students' terminals; the voice input by the teacher through the terminal is acquired and processed by the voice recognition module to determine the portrait specified by the teacher; and when the specified portrait is the one numbered 6, the picture is processed to obtain fig. 8, which is displayed on the teacher's terminal and the students' terminals. In this way, the specified object is made more conspicuous and misidentification is reduced.
Preferably, the teacher may also ask a student to describe a person in the picture, in which case the teacher grants voice-input permission to the terminal used by the specified student. For example, if the teacher asks the student at user 3's terminal to describe any person in the picture, the teacher grants that terminal voice-input permission; the system then receives the voice input by the student, processes it, identifies the person the student intends to describe, and then processes the picture and displays it on each terminal.
The scheme provided by the invention can be applied to remote video calls and conferences, and ensures that the visual focus of all users is consistent. The picture the users are viewing is acquired, the contours of all objects in the picture are obtained through image recognition, and a unique number is automatically generated for each object; the objects are made more conspicuous by modifying the color of the contour coordinate points. When the voice recognition module recognizes that a user has spoken the keyword "number N", the color of the contour corresponding to number N is automatically made more conspicuous and the other numbers are hidden or dimmed, so that the attention of all users is focused on the same number and the possibility of misjudgment is reduced or eliminated.
In real-time video call scenarios, the method can describe the target object in the picture uniformly, accurately and effectively, so that the attention of all participants is focused on the same target, which makes the method highly practical. It reduces the situations in which the visual focus of participants cannot be effectively synchronized because of differing subjective impressions or unclear descriptions; it avoids the information confusion caused by the fact that color, size, shape and the like cannot be quantified in language when people describe a picture; and it also avoids the problem that participants with color blindness or color weakness cannot distinguish objects accurately. Conference participants can thus accurately identify the object specified by the speaker, avoiding misunderstanding.
The embodiment of the invention provides a picture processing device, which can be configured in a video system and is used for supporting the specific implementation of the method shown in fig. 1.
Referring to fig. 9, a schematic structural diagram of an image processing apparatus according to an embodiment of the present invention is specifically described as follows:
a first obtaining unit 501, configured to obtain a picture shared in a current online conference;
an identifying unit 502 for identifying the respective objects in the picture and an object contour of each of the objects;
a marking unit 503, configured to display each object contour on the picture, and mark a corresponding contour number for each object contour on the picture to obtain a marked picture;
a determining unit 504, configured to determine each participant terminal of the online conference, and display the tag picture at each participant terminal;
a second obtaining unit 505, configured to acquire an object-specifying voice corresponding to the marked picture, and extract key information from the object-specifying voice;
a presentation unit 506, configured to determine a target object contour among the object contours based on the key information, and highlight the target object contour and a contour number of the target object contour on each of the participant terminals.
In the device provided by the embodiment of the invention, the picture shared in the current online conference is acquired; each object in the picture and the object contour of each object are identified; each object contour is displayed on the picture, and a corresponding contour number is marked for each object contour on the picture to obtain a marked picture; each participant terminal of the online conference is determined, and the marked picture is displayed at each participant terminal; the object-specifying voice corresponding to the marked picture is acquired, and key information is extracted from it; and a target object contour is determined among the object contours based on the key information, and the target object contour and its contour number are highlighted on each participant terminal. The invention can highlight the contour of the object to be specified, enabling conference participants to determine the specified object quickly and accurately, reducing the probability of misidentifying the object, and thereby ensuring that participants receive information accurately.
In another apparatus provided in the embodiment of the present invention, the identifying unit 502 includes:
an acquisition subunit, configured to acquire an object identification parameter;
the first determining subunit is used for determining a picture identification algorithm corresponding to the object identification parameters;
and the second determining subunit is used for processing the picture by using the picture recognition algorithm and determining each object in the picture and the object outline of each object.
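The identifying unit's two determining steps — mapping an object identification parameter to a picture recognition algorithm, then running that algorithm — can be sketched as a dispatch table. This is an illustrative sketch only; the function names, parameter values and return format are hypothetical, and a real implementation would plug in actual detectors:

```python
def detect_faces(picture):
    # Placeholder: a real implementation might run a face detector
    # (e.g. a cascade classifier) over the acquired picture and return
    # (label, bounding_box) pairs for each detected face.
    return [("face", (0, 0, 10, 10))]

def detect_generic_objects(picture):
    # Placeholder for a general object detector.
    return [("object", (0, 0, 10, 10))]

# The object identification parameter selects which algorithm to run.
ALGORITHMS = {
    "face": detect_faces,
    "object": detect_generic_objects,
}

def recognize(picture, object_identification_parameter):
    """Determine the picture recognition algorithm for the given parameter
    and use it to find each object and its contour in the picture."""
    try:
        algorithm = ALGORITHMS[object_identification_parameter]
    except KeyError:
        raise ValueError(
            f"no picture recognition algorithm for {object_identification_parameter!r}"
        )
    return algorithm(picture)
```

Keeping the parameter-to-algorithm mapping in one table makes it easy to add recognition modes without touching the calling code.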
In another apparatus provided in the embodiment of the present invention, the second obtaining unit 505 includes:
the receiving subunit is used for receiving the voice acquisition instruction;
the third determining subunit is used for determining a voice input terminal in each participating terminal based on the voice acquisition instruction;
and the fourth determining subunit is used for determining the voice input by the voice input terminal based on the marked picture as the object specified voice.
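The subunits above describe selecting, from all participant terminals, the one terminal whose voice counts as the object-specifying voice. A minimal sketch of that permission step follows; the instruction format and all names are hypothetical:

```python
def select_voice_input_terminal(voice_acquisition_instruction, participant_terminals):
    """Grant voice-input permission to the terminal named in the
    voice acquisition instruction; all other terminals are denied, so only
    the selected terminal's voice is treated as the object-specifying voice.
    """
    selected = voice_acquisition_instruction["terminal_id"]
    return {terminal: terminal == selected for terminal in participant_terminals}

permissions = select_voice_input_terminal(
    {"terminal_id": "user3"},  # e.g. the teacher asks user 3's student to speak
    ["user1", "user2", "user3", "user4"],
)
```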
In another apparatus provided in the embodiment of the present invention, the second obtaining unit 505 includes:
a conversion subunit, configured to convert the object specifying voice into text information;
the word segmentation processing subunit is used for carrying out word segmentation processing on the text information to obtain each description word;
and the screening subunit is used for screening the description participles and determining the description participles obtained by screening as key information.
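The conversion, word-segmentation and screening steps above reduce the object-specifying voice to the key information — in the examples given, a contour number. A minimal sketch of the screening step, assuming the speech has already been converted to English text (the keyword pattern is an assumption for illustration, not the patent's actual grammar):

```python
import re

def extract_key_information(transcribed_text):
    """Screen the transcribed object-specifying speech for the contour
    number, e.g. 'please look at number 6' -> 6.

    Returns None when no contour number is mentioned, so the caller can
    fall back to other descriptive keywords.
    """
    match = re.search(r"number\s+(\d+)", transcribed_text.lower())
    return int(match.group(1)) if match else None
```

A production system would combine this with full word segmentation so that descriptive key information (color, position, and so on) can also be screened out, but the number keyword is the primary signal described in the embodiments.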
In another apparatus provided by the embodiment of the present invention, the display unit 506 includes:
the processing subunit is configured to highlight the target object contour and its contour number on the marked picture, and to hide the remaining object contours and contour numbers, obtaining a processed picture;
and the display subunit is used for displaying the processed pictures on each participating terminal.
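The highlight-and-hide processing can be sketched as computing a drawing style per contour: the target contour gets thick, fully opaque lines with its number visible, while the rest are lightened and their numbers hidden (matching the fig. 8 description, where number 6 is thickened and numbers 1 to 5 are dimmed). The style fields and values here are illustrative assumptions, not values from the patent:

```python
def render_styles(contour_numbers, target_number):
    """Return per-contour drawing styles for the processed picture.

    The target contour is emphasized; the remaining contours are lightened
    and their contour numbers hidden, so every participant's attention
    lands on the same object.
    """
    styles = {}
    for number in contour_numbers:
        if number == target_number:
            styles[number] = {"line_width": 4, "opacity": 1.0, "show_number": True}
        else:
            styles[number] = {"line_width": 1, "opacity": 0.3, "show_number": False}
    return styles

# Six numbered face contours, with number 6 specified by the voice input.
styles = render_styles(range(1, 7), target_number=6)
```

The display subunit would then redraw the marked picture with these styles and push the processed picture to every participant terminal.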
The embodiment of the invention also provides a storage medium comprising stored instructions, wherein when the instructions run, the device where the storage medium is located is controlled to execute the picture processing method described above.
An embodiment of the present invention further provides an electronic device, shown in fig. 10, which specifically includes a memory 601 and one or more instructions 602, where the one or more instructions 602 are stored in the memory 601 and configured to be executed by one or more processors 603 to perform the following operations:
the specific implementation procedures and derivatives thereof of the above embodiments are within the scope of the present invention.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, the system or system embodiments, which are substantially similar to the method embodiments, are described in a relatively simple manner, and reference may be made to some descriptions of the method embodiments for relevant points. The above-described system and system embodiments are only illustrative, wherein the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An image processing method, comprising:
acquiring a picture shared in a current online conference;
identifying objects in the picture and an object contour for each of the objects;
displaying each object outline on the picture, and marking a corresponding outline number for each object outline on the picture to obtain a marked picture;
determining each participant terminal of the online conference, and displaying the marked picture at each participant terminal;
acquiring object specified voice corresponding to the marked picture, and extracting key information from the object specified voice;
and determining a target object contour in each object contour based on the key information, and highlighting the target object contour and the contour number of the target object contour on each participant terminal.
2. The method of claim 1, wherein said identifying objects in said picture and an object contour for each of said objects comprises:
acquiring object identification parameters;
determining a picture identification algorithm corresponding to the object identification parameters;
and processing the picture by using the picture recognition algorithm, and determining each object in the picture and the object outline of each object.
3. The method according to claim 1, wherein the acquiring of the object specifying voice corresponding to the marked picture comprises:
receiving a voice acquisition instruction;
determining a voice input terminal in each participant terminal based on the voice acquisition instruction;
and determining the voice input by the voice input terminal based on the marked picture as the object specified voice.
4. The method according to claim 1, wherein the extracting key information from the object-specific speech includes:
converting the object-specific speech into text information;
performing word segmentation processing on the text information to obtain each description word;
and screening each description participle, and determining the screened description participle as key information.
5. The method of claim 1, wherein said highlighting the target object contour and the contour number of the target object contour on each of the participant terminals comprises:
highlighting the target object contour and the contour number of the target object contour on the marked picture, and hiding the rest object contour and the contour number to obtain a processed picture;
and displaying the processed pictures on each participating terminal.
6. A picture processing apparatus, comprising:
the first acquisition unit is used for acquiring a picture shared in a current online conference;
the identification unit is used for identifying each object in the picture and the object outline of each object;
the marking unit is used for displaying each object outline on the picture and marking a corresponding outline number for each object outline on the picture to obtain a marked picture;
the determining unit is used for determining each participating terminal of the online conference and displaying the marked picture on each participating terminal;
a second acquisition unit, configured to acquire an object specifying voice corresponding to the marked picture, and extract key information from the object specifying voice;
and the display unit is used for determining a target object contour in each object contour based on the key information and highlighting the target object contour and the contour number of the target object contour on each participant terminal.
7. The apparatus of claim 6, wherein the identification unit comprises:
an acquisition subunit, configured to acquire an object identification parameter;
the first determining subunit is used for determining a picture identification algorithm corresponding to the object identification parameters;
and the second determining subunit is used for processing the picture by using the picture recognition algorithm and determining each object in the picture and the object outline of each object.
8. The apparatus of claim 6, wherein the second obtaining unit comprises:
the receiving subunit is used for receiving the voice acquisition instruction;
the third determining subunit is used for determining a voice input terminal in each participating terminal based on the voice acquisition instruction;
and the fourth determining subunit is used for determining the voice input by the voice input terminal based on the marked picture as the object specified voice.
9. A storage medium comprising stored instructions, wherein the instructions, when executed, control a device on which the storage medium resides to perform the picture processing method according to any one of claims 1 to 5.
10. An electronic device comprising a memory, and one or more instructions, wherein the one or more instructions are stored in the memory and configured to be executed by one or more processors to perform the picture processing method according to any one of claims 1-5.
CN202211490584.6A 2022-11-25 2022-11-25 Picture processing method and device, storage medium and electronic equipment Pending CN115761266A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211490584.6A CN115761266A (en) 2022-11-25 2022-11-25 Picture processing method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN115761266A true CN115761266A (en) 2023-03-07

Family

ID=85338013

Country Status (1)

Country Link
CN (1) CN115761266A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination