Disclosure of Invention
The present application has been developed in response to the above-discussed problems, in order to provide a video authoring method, system, computing device, computer storage medium, and computer program product that overcome, or at least partially solve, those problems.
According to an aspect of an embodiment of the present application, there is provided a video authoring method, the method being applied to a client, the method including:
acquiring a material search request sent by a request object;
forwarding the material search request to a server, so that the server searches video materials according to the material search request to obtain a video material search result and, if the copyright information of a first video material corresponding to the video material search result is first copyright information, returns visual data corresponding to the video material search result to the client; and
searching local video materials according to the visual data returned by the server to obtain a second video material, and displaying the second video material to the request object, so that the request object performs video creation according to the second video material.
Further, if the server determines that the copyright information of the first video material corresponding to the video material search result is the second copyright information, the server returns the first video material to the client; and
the method further includes receiving the first video material returned by the server and displaying the first video material to the request object, so that the request object performs video creation according to the first video material.
Further, the method includes: if the video material contains video, extracting video frames from the video material and performing vectorization processing on the extracted video frames by using a pre-trained multi-modal vector model to obtain image feature vectors; and/or
if the video material contains images, performing vectorization processing on the images by using the pre-trained multi-modal vector model to obtain image feature vectors, wherein the multi-modal vector model is obtained based on image-text pair training;
the server is configured to store, in an associated manner, all or part of the following information: the video materials, the video material identifiers, the extracted video frames, the image feature vectors, the appearance timestamps in the video of the video frames corresponding to the image feature vectors, and the copyright information; the client is configured to store locally, in an associated manner, all or part of the following information: the video materials, the video material identifiers, the extracted video frames, and the image feature vectors.
Further, the material search request carries a search keyword;
the searching, by the server, of the video materials according to the material search request to obtain the video material search result further comprises:
performing, by the server, vectorization processing on the search keyword by using the pre-trained multi-modal vector model to obtain a feature vector of the search keyword; and
searching the video materials according to the feature vector of the search keyword to obtain a first image feature vector, a video material identifier, and an appearance timestamp in the video of the video frame corresponding to the image feature vector;
the returning of the visual data corresponding to the video material search result to the client further comprises:
returning the first image feature vector to the client, or returning a first image corresponding to the first image feature vector to the client.
Further, the searching of local video materials according to the visual data returned by the server to obtain the second video material further includes:
calculating a similarity between the first image feature vector and the image feature vectors locally stored by the client, and screening out a second image feature vector matching the first image feature vector according to the similarity; and
determining the second video material corresponding to the second image feature vector.
Further, the searching of local video materials according to the visual data returned by the server to obtain the second video material further includes:
performing vectorization processing on the first image returned by the server by using the pre-trained multi-modal vector model to obtain the first image feature vector;
calculating a similarity between the first image feature vector and the image feature vectors locally stored by the client, and screening out a second image feature vector matching the first image feature vector according to the similarity; and
determining the second video material corresponding to the second image feature vector.
According to another aspect of the embodiment of the application, a video authoring system is provided, the system comprises a client and a server, wherein the client comprises an acquisition module, a forwarding module, a searching module and a display module;
the acquisition module is adapted to acquire a material search request sent by a request object;
the forwarding module is adapted to forward the material search request to the server;
the server is adapted to search video materials according to the material search request to obtain a video material search result, and, if the copyright information of the first video material corresponding to the video material search result is the first copyright information, to return visual data corresponding to the video material search result to the client;
the search module is adapted to search local video materials according to the visual data returned by the server to obtain a second video material; and
the display module is adapted to display the second video material to the request object, so that the request object performs video creation according to the second video material.
According to yet another aspect of an embodiment of the present application, there is provided a computing device including a processor, a memory, a communication interface, and a communication bus through which the processor, the memory, and the communication interface communicate with each other;
the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the video authoring method described above.
According to still another aspect of the embodiments of the present application, there is provided a computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform operations corresponding to the video authoring method described above.
According to yet another aspect of embodiments of the present application, there is provided a computer program product comprising at least one executable instruction for causing a processor to perform operations corresponding to the video authoring method described above.
According to the video authoring method and system provided by the embodiments of the present application, vectorizing images improves material search efficiency, and copyright identification is performed on the first video material corresponding to the video material search result. Where the copyright information corresponding to the first video material is the first copyright information, the server returns only the visual data corresponding to the video material search result and does not transmit the video file, which helps the request object avoid copyright risks and avoids legal disputes caused by using unauthorized materials. Where the copyright information corresponding to the first video material is the second copyright information, the corresponding video material is returned to the client. Finally, the client can perform a local search according to the visual data returned by the server, obtain the video material closest to the search intention of the request object, and display that video material to the request object, so that the request object can author a video using the found material. This helps the request object create high-quality videos, improves user experience, and, by means of end-cloud collaboration, achieves fast video retrieval, effectively improving the speed and accuracy of video material retrieval in the video authoring process.
The foregoing is merely an overview of the technical solutions of the embodiments of the present application, which may be implemented according to the content of the specification. To make the technical means of the embodiments of the present application clearer and more understandable, specific implementations of the embodiments are described below.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
First, terms related to one or more embodiments of the present application will be explained.
End-cloud collaboration refers to collaboration between a client and a cloud server, aiming to provide stronger computing capability and resource support so as to realize various applications and functions. In this mode, the client can communicate with the cloud server to complete part of the computing tasks, data processing, storage, and other work, thereby reducing the burden on the client, saving energy and resources, and enabling more complex tasks to be executed under limited hardware resources.
A material library is a resource collection that provides resources such as images and videos for video production, for users to download or use directly.
Vectorization is the process of converting data or information into vectors for processing and analysis by a computer. Through vectorization, different types of data (such as text and images) can be uniformly represented as numerical vectors, which facilitates mathematical operations and statistical analysis by the computer. In machine learning and data analysis, vectorization is often used to convert raw data into feature vectors for model training or pattern recognition.
Video material searching and matching refers to the process of quickly finding the image or video resources related to the required content through keywords or specific screening conditions.
CLIP (Contrastive Language-Image Pre-training) is a deep learning model that achieves cross-modal information matching through joint learning of images and text. It has strong image understanding and text understanding capabilities, can be used for various tasks such as image classification, text description generation, and image generation, and its contrastive learning method gives it strong potential and broad application prospects in computer vision and natural language processing.
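By way of illustration only, the following sketch shows how a CLIP-style model maps a piece of text and an image into the same vector space so that their cosine similarity is meaningful for cross-modal retrieval. It assumes the open-source Hugging Face transformers library and the public openai/clip-vit-base-patch32 checkpoint; neither is required by the embodiments described herein, and the file path is hypothetical.

```python
# A minimal sketch of CLIP-style joint embedding, assuming the open-source
# `transformers` library and the public CLIP checkpoint; both are
# illustrative choices, not part of the claimed method.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("frame.jpg")  # any extracted video frame (hypothetical path)
inputs = processor(text=["a cat playing in the snow"],
                   images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    # Text and image features live in the same space, so their cosine
    # similarity is meaningful for cross-modal retrieval.
    text_vec = model.get_text_features(input_ids=inputs["input_ids"],
                                       attention_mask=inputs["attention_mask"])
    image_vec = model.get_image_features(pixel_values=inputs["pixel_values"])

text_vec = text_vec / text_vec.norm(dim=-1, keepdim=True)
image_vec = image_vec / image_vec.norm(dim=-1, keepdim=True)
print(f"text-image similarity: {(text_vec @ image_vec.T).item():.4f}")
```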
Fig. 1 shows a signaling interaction diagram of a video authoring method according to one embodiment of the present application, as shown in fig. 1, the method comprising the steps of:
In step S101, the client obtains a material search request sent by the request object, and forwards the material search request to the server.
The client may be an application on a terminal, such as a video authoring APP. The request object may be any user with a video authoring need. When such a user authors a video and needs to search for video materials, the APP can provide a search page containing a search box, and the user can input a corresponding search keyword in the search box, thereby triggering a material search request. The client acquires the material search request sent by the request object and forwards it to the server, namely the cloud server, which first executes the actual search task.
Step S102, the server searches video materials according to the material search request to obtain a video material search result; if the copyright information of the first video material corresponding to the video material search result is the first copyright information, visual data corresponding to the video material search result is returned to the client; and if the copyright information is the second copyright information, the first video material is returned to the client.
After receiving the material search request forwarded by the client, the server searches video materials in its database according to the material search request, and a video material search result can be obtained through the search. The video material search result varies with the information stored in the server database and may be, for example, images or image feature vectors.
In general, each video material has a corresponding copyright, and the copyright information of a video material stored in the server may be first copyright information or second copyright information. The first copyright information indicates that video playing authorization has been obtained from the copyright party for the video material but no adaptation authorization has been obtained, that is, the user cannot perform video authoring based on the video material; for such video material, the server may store the corresponding material but cannot provide it to the user for authoring. The second copyright information indicates that adaptation authorization has been obtained from the copyright party, so the video material can be provided to the user for authoring.
Since the subsequent processing differs with the copyright information of the video material, the server first queries and determines the copyright information of the first video material corresponding to the video material search result. If the copyright information is the first copyright information, the server returns visual data corresponding to the video material search result to the client, where the visual data may be the video material search result itself or an image corresponding to the video material search result; if the copyright information is the second copyright information, the server returns the first video material to the client.
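The following minimal sketch illustrates the shape of the server-side copyright branching described above; the record fields, constant values, and response structure are hypothetical names introduced for this sketch only, not a definitive implementation.

```python
# Illustrative sketch of the copyright branching described above; the
# record fields, constants, and response structure are hypothetical.
FIRST_COPYRIGHT = "play_only"        # playing authorized, adaptation not
SECOND_COPYRIGHT = "play_and_adapt"  # playing and adaptation authorized

def respond_to_search(material_record: dict) -> dict:
    if material_record["copyright"] == FIRST_COPYRIGHT:
        # Do not transmit the video file itself; return only visual data
        # (the image feature vector, or the image it corresponds to).
        return {"type": "visual_data",
                "payload": material_record["image_feature_vector"]}
    # Adaptation is authorized, so the material itself may be returned.
    return {"type": "video_material",
            "payload": material_record["video_material"]}
```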
Step S103, the client receives the first video material returned by the server and displays the first video material to the request object so that the request object performs video creation according to the first video material, or performs local video material search according to the visual data returned by the server to obtain a second video material and displays the second video material to the request object so that the request object performs video creation according to the second video material.
The client performs different processing on the different contents returned by the server. If the client receives the first video material from the server, the client can directly display the first video material to the request object; for example, the client can generate a display page in which the first video material is displayed, so that the request object performs video creation according to the first video material.
If the client receives the visual data from the server, the client further needs to perform a video material search. Specifically, the client performs a local video material search according to the visual data returned by the server, where the local search is performed locally on the client over the video materials that the request object has imported into the client. A second video material can be obtained through the search, and the client can then display the second video material to the request object; for example, the client can generate a display page and display the second video material in it, so that the request object performs video creation according to the second video material. In this embodiment, the video material may be video and/or images.
According to the video authoring method provided by this embodiment of the present application, after receiving a material search request from the request object, the client forwards the request to the server, and the server searches according to the request. Where the copyright information of the first video material corresponding to the video material search result is the first copyright information, the server returns only the visual data corresponding to the search result and does not transmit the video file, which helps the request object avoid copyright risks and avoids legal disputes caused by using unauthorized materials; where the copyright information of the first video material is the second copyright information, the corresponding video material is returned to the client. Finally, the client can perform a local search according to the visual data returned by the server, obtain the video material closest to the search intention of the request object, and display that video material (either the material obtained by the local search or the material returned by the server) to the request object, so that the request object can author a video using the found material. This helps the request object create high-quality videos, improves user experience, and realizes fast and accurate collaborative retrieval of video materials through end-cloud collaboration.
Fig. 2 shows a signaling interaction diagram of a video authoring method according to another embodiment of the present application, as shown in fig. 2, the method comprising the steps of:
In step S201, the server extracts video frames from videos in a server material library and performs vectorization processing on the extracted video frames by using a pre-trained multi-modal vector model to obtain image feature vectors, and/or performs vectorization processing on images in the material library by using the pre-trained multi-modal vector model to obtain the image feature vectors, wherein the multi-modal vector model is obtained by training on image-text pairs, and the material library is used for storing video materials, video material identifiers, and copyright information in an associated manner.
Specifically, a large number of video materials, such as videos and/or images, are imported into the server material library in batches. To facilitate subsequent video material searching, vectorization processing needs to be performed on the video materials in the server material library. Since a video contains many video frames, to improve search efficiency, only some video frames may be extracted from the video, for example, key frames, one video frame every preset time period, or one video frame every preset number of frames. The extracted video frames are then input into the pre-trained multi-modal vector model deployed on the server, which performs vectorization processing on them to obtain image feature vectors. The pre-trained multi-modal vector model is obtained based on image-text pair training and may be, for example, an open-source CLIP model, or a model obtained by fine-tuning the open-source CLIP model with task-related image-text pairs.
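The following sketch illustrates one possible implementation of sampling one video frame per preset time period and vectorizing it for the video case, assuming OpenCV (cv2) for decoding and the CLIP model and processor from the earlier sketch; the one-second interval is an arbitrary example of the preset time period.

```python
# Sketch of sampling one frame per preset interval and vectorizing it,
# assuming OpenCV and the CLIP model/processor from the earlier sketch;
# the 1-second interval is an illustrative choice.
import cv2
import numpy as np
import torch
from PIL import Image

def extract_and_embed(video_path, model, processor, interval_s=1.0):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    step = max(1, int(round(fps * interval_s)))
    vectors, timestamps = [], []
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % step == 0:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # OpenCV is BGR
            inputs = processor(images=Image.fromarray(rgb), return_tensors="pt")
            with torch.no_grad():
                vec = model.get_image_features(pixel_values=inputs["pixel_values"])
            vec = vec / vec.norm(dim=-1, keepdim=True)
            vectors.append(vec.numpy()[0])
            timestamps.append(frame_idx / fps)  # appearance timestamp (seconds)
        frame_idx += 1
    cap.release()
    return np.array(vectors, dtype="float32"), timestamps
```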
For the case where the video material contains images, the server inputs the images into the multi-modal vector model, and the pre-trained multi-modal vector model performs vectorization processing on the images in the material library to obtain image feature vectors. To facilitate subsequent identification of whether a video material is copyrighted, the video material identifier and the copyright information can be stored in the material library in an associated manner. The copyright information reflects the copyright status of the video material and may be the first copyright information or the second copyright information.
In step S202, the server stores the image feature vector, the video material identifier, and the appearance timestamp in the video of the video frame corresponding to the image feature vector in an associated manner in a cloud vector database.
After the vectorization processing, the server stores the image feature vector, the video material identifier, and the appearance timestamp in the video of the video frame corresponding to the image feature vector in the cloud vector database; the server material library and the cloud vector database are associated through the video material identifier. The appearance timestamp refers to the point in time at which the video frame appears in the video. Where the video material is an image, no appearance timestamp needs to be stored; where the video material is a video, the appearance timestamp needs to be stored. Storing the appearance timestamp facilitates determining the image corresponding to the image feature vector: where the server material library does not store the image, the corresponding image can be conveniently extracted from the corresponding video according to the appearance timestamp.
In this embodiment, the server material library and the cloud vector database may be combined into one database; for example, the image feature vector, the video material identifier, the appearance timestamp in the video of the video frame corresponding to the image feature vector, the video material, and the copyright information may all be stored in the server material library. That is, the server material library stores the image feature vector, the appearance timestamp, the video material, and the copyright information in an associated manner.
In addition, the server material library may also store the extracted video frames; in this case, the appearance timestamps in the video of the video frames corresponding to the image feature vectors no longer need to be stored.
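As a sketch of the association described in steps S201-S202, the following code stores image feature vectors together with a video material identifier and appearance timestamps, using the open-source FAISS library as an illustrative stand-in for the cloud vector database; the 512-dimensional feature size matches the CLIP ViT-B/32 checkpoint assumed earlier, and the function and variable names are hypothetical.

```python
# Sketch of associating image feature vectors with a material identifier
# and appearance timestamps, using FAISS as an illustrative vector index.
import faiss
import numpy as np

DIM = 512                            # feature size of CLIP ViT-B/32
index = faiss.IndexFlatIP(DIM)       # inner product == cosine on unit vectors
metadata = []                        # row i of the index <-> metadata[i]

def store_material(vectors: np.ndarray, material_id: str, timestamps: list):
    """vectors: float32 array of shape (n, DIM), unit-normalized."""
    index.add(vectors)
    for ts in timestamps:
        metadata.append({"material_id": material_id, "timestamp": ts})
```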
In step S203, the client extracts video frames from videos in a local material library and performs vectorization processing on the extracted video frames by using a pre-trained multi-modal vector model to obtain image feature vectors, and/or the client performs vectorization processing on images in the local material library by using the pre-trained multi-modal vector model to obtain image feature vectors, wherein the local material library is used for storing video materials and video material identifiers in an associated manner.
In this embodiment, the client also deploys a pre-trained multi-modal vector model locally in advance and indexes its local material library. The specific implementation of this step is similar to that of step S201 and is not repeated here. In this step, the user may first import video materials into the local material library in batches.
In step S204, the client stores the image feature vector and the video material identifier in an associated manner in a local vector database.
In this embodiment, the client uses a local material library and a local vector database to store information. The local material library stores video materials and video material identifiers in an associated manner, the local vector database stores image feature vectors and video material identifiers in an associated manner, and the two are associated and indexed through the video material identifiers.
In step S205, the client obtains the material search request sent by the request object, and forwards the material search request to the server.
The client may be an application on a terminal, such as a video authoring APP. The request object may be any user with a video authoring need. When such a user authors a video and needs to search for video materials, the APP can provide a search page containing a search box, and the user can input a corresponding search keyword in the search box, thereby triggering a material search request. The client acquires the material search request sent by the request object and forwards it to the server, namely the cloud server, which first executes the actual search task.
In step S206, the server uses the pre-trained multi-modal vector model to vectorize the search keywords, and obtains the feature vectors of the search keywords.
Specifically, the material search request forwarded by the client carries the search keyword, so the server can extract the search keyword from the material search request and input it into the pre-trained multi-modal vector model, which performs vectorization processing on the keyword to obtain its feature vector. Because the multi-modal vector model is trained on image-text pairs, it learns the intrinsic relationship between images and text well, and image features and text features can be aligned in the same space, so image feature vectors can be searched based on the search keyword feature vector.
Step S207, the server queries the cloud vector database according to the search keyword feature vector to obtain a first image feature vector matching the search keyword feature vector, a video material identifier, and an appearance timestamp in the video of the video frame corresponding to the image feature vector; queries the server material library according to the video material identifier to determine the copyright information of the first video material corresponding to the video material identifier; returns the first video material to the client if the copyright information is the second copyright information; and, if the copyright information is the first copyright information, extracts the video frame from the video corresponding to the video material identifier according to the appearance timestamp, determines the extracted video frame as a first image (or determines the image corresponding to the first image feature vector as the first image), and returns the first image to the client.
After the server performs vectorization processing on the search keyword, the cloud vector database can be queried according to the feature vector of the search keyword, matching it against the image feature vectors in the cloud vector database. For example, the similarity between the search keyword feature vector and the image feature vectors stored in the cloud vector database can be calculated, and a first image feature vector matching the keyword feature vector is screened out according to the similarity: for example, image feature vectors whose similarity is greater than or equal to a first preset similarity threshold are screened out and used as first image feature vectors, or the N image feature vectors with the highest similarity are selected as first image feature vectors.
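The following sketch illustrates the query in steps S206-S207, reusing the FAISS index, metadata list, and CLIP model from the earlier sketches; the 0.25 threshold and top-5 cut are arbitrary example values for the first preset similarity threshold and N.

```python
# Sketch of the keyword query, reusing `index`, `metadata`, and the CLIP
# model/processor from the earlier sketches; threshold and N are examples.
import torch

def search_by_keyword(keyword, model, processor, top_n=5, threshold=0.25):
    inputs = processor(text=[keyword], return_tensors="pt", padding=True)
    with torch.no_grad():
        q = model.get_text_features(input_ids=inputs["input_ids"],
                                    attention_mask=inputs["attention_mask"])
    q = (q / q.norm(dim=-1, keepdim=True)).numpy().astype("float32")
    scores, rows = index.search(q, top_n)    # inner product == cosine here
    hits = []
    for score, row in zip(scores[0], rows[0]):
        if row != -1 and score >= threshold:
            # Each hit carries the material identifier and the appearance
            # timestamp needed by the subsequent steps.
            hits.append({"score": float(score), **metadata[row]})
    return hits
```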
In general, each video material has a corresponding copyright, and the copyright information of a video material stored in the server may be first copyright information or second copyright information. The first copyright information indicates that video playing authorization has been obtained from the copyright party for the video material but no adaptation authorization has been obtained, that is, the user cannot perform video authoring based on the video material; for such video material, the server may store the corresponding material but cannot provide it to the user for authoring. The second copyright information indicates that adaptation authorization has been obtained from the copyright party, so the video material can be provided to the user for authoring.
Since the subsequent processing differs with the copyright information of the video material, after the matching first image feature vector is determined, the associated video material identifier and the appearance timestamp in the video of the corresponding video frame can be determined by querying the cloud vector database. The server material library is then queried according to the video material identifier to determine whether the copyright information associated with the video material identifier is the second copyright information or the first copyright information; if it is the second copyright information, the first video material is returned to the client.
If the copyright information is the first copyright information and the video material corresponding to the first image feature vector is a video, the video frame is extracted from the video corresponding to the video material identifier according to the appearance timestamp in the video of the video frame corresponding to the image feature vector, the extracted video frame is determined as the first image, and the first image is returned to the client. If the video material corresponding to the first image feature vector is an image, the image corresponding to the first image feature vector can be directly determined as the first image and returned to the client. Where the video material is an image, the first image feature vector is preferably returned to the client, so as to avoid directly returning the image corresponding to the first image feature vector to the client.
In an alternative embodiment, where the server material library stores the extracted video frames corresponding to the image feature vectors, the extracted video frame corresponding to the first image feature vector may be directly determined as the first image and returned to the client.
Where the server material library stores the extracted video frames corresponding to the image feature vectors, the image corresponding to the first image feature vector that matches the search keyword feature vector may be directly used as the search result.
In an optional implementation, the server may instead return the first image feature vector to the client, so that the client can directly perform the local material search according to the first image feature vector returned by the server, thereby improving material search efficiency.
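For the case where the copyright information is the first copyright information and the material is a video, the following sketch extracts the first image from the video at the stored appearance timestamp, assuming OpenCV; encoding the frame as JPEG before returning it to the client is an illustrative choice, and the function name is hypothetical.

```python
# Sketch of extracting the first image at a stored appearance timestamp,
# assuming OpenCV; returning a JPEG byte string is an illustrative choice.
import cv2

def frame_at_timestamp(video_path, timestamp_s):
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_MSEC, timestamp_s * 1000.0)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        return None
    ok, jpeg = cv2.imencode(".jpg", frame)
    return jpeg.tobytes() if ok else None
```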
Step S208, the client receives the first video material returned by the server and displays it to the request object so that the request object performs video creation according to the first video material; or the client performs vectorization processing on the first image returned by the server by using the pre-trained multi-modal vector model to obtain the first image feature vector, calculates the similarity between the first image feature vector and the image feature vectors locally stored by the client, screens out a second image feature vector matching the first image feature vector according to the similarity, determines a second video material corresponding to the second image feature vector, and displays the second video material to the request object so that the request object performs video creation according to the second video material.
The client performs different processing on the different contents returned by the server. If the client receives the first video material from the server, the client can directly display the first video material to the request object; for example, the client can generate a display page in which the first video material is displayed, so that the request object performs video creation according to the first video material.
If the client receives the first image from the server, the client further needs to perform a video material search. Specifically, the client inputs the first image into the locally deployed pre-trained multi-modal vector model, which performs vectorization processing on the first image to obtain the first image feature vector; the first image feature vector is then used for the local material search, which is performed locally on the client over the video materials that the request object has imported into the client. For example, the similarity between the first image feature vector and the image feature vectors locally stored by the client can be calculated, and a second image feature vector matching the first image feature vector is screened out according to the similarity: for example, image feature vectors whose similarity is greater than or equal to a second preset similarity threshold are screened out, or the image feature vector with the highest similarity is used as the second image feature vector, where the second preset similarity threshold may be the same as or different from the first preset similarity threshold. The second video material corresponding to the second image feature vector is then determined; for example, the local material library can be queried with the video material identifier corresponding to the second image feature vector to obtain the second video material associated with that identifier. The client can then display the second video material to the request object, for example by generating a display page and displaying the second video material in it, so that the request object performs video creation according to the second video material.
In an alternative embodiment, if the client receives the first image feature vector from the server, the first image feature vector may be used directly for the local material search: for example, the similarity between the first image feature vector and the image feature vectors locally stored by the client is calculated, a second image feature vector matching the first image feature vector is screened out according to the similarity, and the second video material corresponding to the second image feature vector is determined.
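The following sketch illustrates the client-side local match of step S208 under the assumption that the locally stored image feature vectors are unit-normalized; local_vectors and local_material_ids are hypothetical names for the contents of the client's local vector database, and 0.3 is an example second preset similarity threshold.

```python
# Sketch of the client-side local match; `local_vectors` (n, dim) and
# `local_material_ids` (length n) are hypothetical stand-ins for the
# client's local vector database, all vectors unit-normalized.
import numpy as np

def local_match(first_vec, local_vectors, local_material_ids, threshold=0.3):
    sims = local_vectors @ first_vec      # cosine similarity per local vector
    best = int(np.argmax(sims))
    if sims[best] < threshold:
        return None                       # no matching second video material
    # The identifier is then used to look up the second video material in
    # the local material library.
    return local_material_ids[best]
```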
According to the video authoring method provided by this embodiment of the present application, vectorizing images improves material search efficiency, and copyright identification is performed on the first video material corresponding to the video material search result. Where the copyright information corresponding to the first video material is the first copyright information, the server returns only the visual data corresponding to the video material search result and does not transmit the video file, which helps the request object avoid copyright risks and avoids legal disputes caused by using unauthorized materials; where the copyright information is the second copyright information, the corresponding video material is returned to the client. Finally, the client can perform a local search according to the visual data returned by the server, obtain the video material closest to the search intention of the request object, and display it to the request object, thereby helping the request object create high-quality videos using the found video material.
FIG. 3 shows a block diagram of a video authoring system in accordance with one embodiment of the present application, as shown in FIG. 3, the system includes a client 310 and a server 320, wherein the client includes an acquisition module 311, a forwarding module 312, a search module 313, a presentation module 314;
the acquisition module 311 is adapted to acquire a material search request sent by a request object;
the forwarding module 312 is adapted to forward the material search request to the server;
the server 320 is adapted to search video materials according to the material search request to obtain a video material search result, and, if the copyright information of the first video material corresponding to the video material search result is the first copyright information, to return visual data corresponding to the video material search result to the client;
the search module 313 is adapted to search local video materials according to the visual data returned by the server to obtain a second video material; and
the display module 314 is adapted to display the second video material to the request object, so that the request object performs video authoring according to the second video material.
Optionally, the server is further adapted to return the first video material to the client if it determines that the copyright information of the first video material corresponding to the video material search result is the second copyright information; and
the display module is further adapted to receive the first video material returned by the server and display the first video material to the request object, so that the request object performs video authoring according to the first video material.
Optionally, the client further comprises a vectorization processing module, the server or the vectorization processing module is further adapted to extract video frames from the video material if the video material contains video, perform vectorization processing on the extracted video frames by using a pre-trained multi-modal vector model to obtain image feature vectors, and/or
If the video material contains images, vectorizing the images by utilizing a pre-trained multi-modal vector model to obtain image feature vectors, wherein the multi-modal vector model is obtained based on image-text pair training;
the server is configured to store, in an associated manner, all or part of the following information: the video materials, the video material identifiers, the extracted video frames, the image feature vectors, the appearance timestamps in the video of the video frames corresponding to the image feature vectors, and the copyright information; the client is configured to store locally, in an associated manner, all or part of the following information: the video materials, the video material identifiers, the extracted video frames, and the image feature vectors.
Optionally, the material search request carries a search keyword;
the server is further adapted to perform vectorization processing on the search keyword by using the pre-trained multi-modal vector model to obtain a feature vector of the search keyword;
search the video materials according to the feature vector of the search keyword to obtain a first image feature vector, a video material identifier, and an appearance timestamp in the video of the video frame corresponding to the image feature vector; and
return the first image feature vector to the client, or return a first image corresponding to the first image feature vector to the client.
Optionally, the search module is further adapted to calculate a similarity between the first image feature vector and the image feature vectors locally stored by the client, and to screen out a second image feature vector matching the first image feature vector according to the similarity; and
to determine the second video material corresponding to the second image feature vector.
Optionally, the search module is further adapted to perform vectorization processing on the first image returned by the server by using the pre-trained multi-modal vector model to obtain the first image feature vector;
calculate the similarity between the first image feature vector and the image feature vectors locally stored by the client, and screen out a second image feature vector matching the first image feature vector according to the similarity; and
determine the second video material corresponding to the second image feature vector.
The above descriptions of the modules refer to the corresponding descriptions in the method embodiments, and are not repeated herein.
According to the video authoring system provided by this embodiment of the present application, vectorizing images improves material search efficiency, and copyright identification is performed on the first video material corresponding to the video material search result. Where the copyright information of the first video material is the first copyright information, the server returns only the visual data corresponding to the video material search result and does not transmit the video file, which helps the request object avoid copyright risks and avoids legal disputes caused by using unauthorized materials; where the copyright information is the second copyright information, the corresponding video material is returned to the client. Finally, the client can perform a local search according to the visual data returned by the server, obtain the video material closest to the search intention of the request object, and display it to the request object for video authoring, thereby helping the request object create high-quality videos and improving user experience.
The embodiment of the application provides a non-volatile computer storage medium, which stores at least one executable instruction or a computer program, and the executable instruction or the computer program can cause a processor to execute operations corresponding to the video authoring method in any of the above method embodiments.
Embodiments of the present application provide a computer program product comprising at least one executable instruction or a computer program, where the executable instruction or the computer program may cause a processor to perform operations corresponding to the video authoring method in any of the above-described method embodiments.
FIG. 4 illustrates a schematic diagram of an embodiment of a computing device of the present application, and the embodiments of the present application are not limited to a particular implementation of the computing device.
As shown in FIG. 4, the computing device may include a processor 402, a communication interface (Communications Interface) 404, a memory 406, and a communication bus 408.
Wherein processor 402, communication interface 404, and memory 406 communicate with each other via communication bus 408. A communication interface 404 for communicating with network elements of other devices, such as clients or other servers. Processor 402 is configured to execute program 410 and may specifically perform the relevant steps described above in the video authoring method embodiment for a computing device.
In particular, program 410 may include program code including computer-operating instructions.
The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application. The computing device may include one or more processors of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 406 is configured to store a program 410. The memory 406 may include a high-speed RAM memory, and may further include a non-volatile memory, such as at least one disk memory.
The program 410 may be specifically operative to cause the processor 402 to perform the video authoring method in any of the method embodiments described above. For the specific implementation of each step in the program 410, reference may be made to the corresponding descriptions of the corresponding steps and units in the above video authoring method embodiments, which are not repeated here. Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the devices and modules described above, reference may be made to the corresponding process descriptions in the foregoing method embodiments, which are not repeated here.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein, and the structure required to construct such a system is apparent from the description above. In addition, the embodiments of the present application are not directed to any particular programming language. It will be appreciated that the teachings of the embodiments of the present application described herein may be implemented in a variety of programming languages, and the above description of specific languages is provided to disclose the enablement and best mode of the embodiments of the present application.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the above description of exemplary embodiments of the application, various features of the embodiments of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed embodiments of the application require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component and, furthermore, they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of embodiments of the application and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functions of some or all of the components in accordance with embodiments of the present application may be implemented in practice using a microprocessor or Digital Signal Processor (DSP). Embodiments of the present application may also be implemented as a device or apparatus program (e.g., a computer program and a computer program product) for performing a portion or all of the methods described herein. Such a program embodying the embodiments of the present application may be stored on a computer readable medium, or may have the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. Embodiments of the application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. do not denote any order. These words may be interpreted as names.