
CN117112184B - Task scheduling service method and system based on container technology - Google Patents


Info

Publication number
CN117112184B
Authority
CN
China
Prior art keywords
task
information
container
environment
virtual environment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311369652.8A
Other languages
Chinese (zh)
Other versions
CN117112184A (en
Inventor
许靖
柴磊
韦晓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Magic Digital Intelligent Artificial Intelligence Co ltd
Original Assignee
Shenzhen Magic Digital Intelligent Artificial Intelligence Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Magic Digital Intelligent Artificial Intelligence Co ltd
Priority to CN202311369652.8A
Publication of CN117112184A
Application granted
Publication of CN117112184B
Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5077 Logical partitioning of resources; Management or configuration of virtualized resources
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

The invention provides a task scheduling service method and system based on container technology. The method includes: obtaining data source connection information, communicating by using the connection information, and invoking the corresponding code and monitoring its status; configuring the data source to obtain a managed template of the code; creating a new container from a base image, configuring a Python package index source, and creating and managing a dedicated virtual environment; and, when a task is invoked, pushing an execution request to the task scheduling service, calculating the system resources and file resources required according to the parameter information of the pushed task, adding the task to a task queue, obtaining the virtual environment corresponding to the task, and saving the scheduling parameters. The system includes an information processing module, an environment creation module and a task processing module. The containerized deployment of the invention supports rapid deployment and distributed deployment; when the task execution service is deployed as a cluster, execution efficiency is greatly improved.

Description

Task scheduling service method and system based on container technology
Technical Field
The invention relates to the technical field of data processing, in particular to a task scheduling service method and system based on a container technology.
Background
As indicator systems are sorted out across different industries, the technologies used differ from one indicator system to another and are chosen according to the target database in which the data is stored, performance requirements and other factors. When an indicator system must support different computing engines and databases, a single system cannot offer that diversity or provide sufficient resources. The prior art has the following problems.
Data processing code stored by users cannot be published directly into a service for timed scheduled execution, and an external platform is needed. The existing code includes sql, python, spark code and the like, so the system must be compatible with the execution characteristics of each kind of code, support the scheduling logic of the various kinds of code, and support timed scheduling and monitoring.
Existing systems generally do not separate the execution environment. Code easily intrudes into the running environment of the system when it is invoked and becomes interleaved with the system environment; non-standard code may affect the running of the original system during execution, pollute the host environment and cause crashes, so system security requirements generally cannot be met.
User-level environment configuration cannot be provided for the environments required by different code, which greatly affects the user experience. When stored data processing code such as python is invoked, dependency package versions may be incompatible, or inconsistent package versions may produce different execution results. The execution environment is tightly coupled with the system environment, and it is very difficult to put code into production without modification. At present, some in the industry either integrate the code manually or have operations staff rebuild and adapt the execution environment, which wastes time and labor and does not meet the need for fast adaptation to the market.
A task scheduling service that has not been separated out independently cannot be connected quickly to other platforms and is often limited to a specific platform. Its fixed parameters and fixed calling modes often determine the upper limit of the scheduling execution service; the lower the degree of openness, the less favorable it is to componentization and microservices, and the openness of the whole platform may be affected.
There is also a resource problem. The number of tasks the current platform can run depends on the resources of the local system, so when several people execute tasks at the same time, resource requests easily block, which can affect the overall operation of the platform and hinders normal use.
Prior art 1, application number CN201310342752.1, discloses a task scheduling service system and method, comprising: a task calling end module for initiating a task scheduling request; a service interface component module for creating tasks according to task scheduling requests and creating corresponding scheduling task records in the database; a task scanning component module for scanning the scheduling task records, calculating task priorities according to a task scheduling priority algorithm, and placing the tasks into the corresponding priority queues; a task queue component module for selecting the task to be executed currently from the priority queue according to a priority queue element dequeuing algorithm; and a task execution module for executing that task. This system and method prioritize task scheduling, improve the execution efficiency of the task scheduling service, and give users a differentiated experience across task scheduling services of different priorities.
Prior art 2, application number CN201710287419.3, discloses a task scheduling server and task scheduling method, comprising a judging module, an acquiring module and a processing module. The judging module judges whether the current subtask is executed solely by its corresponding subsystem. The acquiring module acquires the task script corresponding to the subtask when the current subtask is executed solely by its corresponding subsystem. The processing module executes the task script, sends the execution request of the subtask to the corresponding subsystem, monitors the execution result of the subtask, and determines from the execution result whether the subtask has finished. By calling the task script of a subtask, it can be determined whether the current subtask has executed successfully, so that all subtasks can be executed in order.
Prior art 3, application number CN201710209051.9, discloses a Quartz-based timed task scheduling service framework and method, including: a configuration file containing configuration information; a task scheduler comprising a trigger and a job interface, which instantiates the trigger and the job interface from the configuration information to provide the corresponding task scheduling service; and a service task end configured with a service task program that inherits the job interface and, after the task scheduling service is started, receives the trigger signal sent by the trigger so that the service task program can complete the corresponding job. This framework can meet the requirement for a timed task scheduling service simply and quickly and improves the development efficiency and quality of the business system.
None of prior art 1, 2 or 3 allows stored code to be published directly into a service for timed scheduled execution or provides user-level environment configuration. In each of them the execution environment is tightly coupled with the system environment, the task scheduling service is not separated out independently, and blocking easily occurs when several people execute tasks at the same time.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a task scheduling service method based on container technology, which comprises the following steps:
acquiring data source connection information, communicating by using the connection information, and invoking the corresponding code and monitoring its status; configuring the data source to obtain a managed template of the code;
creating a new container from a base image, and configuring a Python package index source; creating and managing a dedicated virtual environment;
after a task is invoked, pushing an execution request to the task scheduling service, calculating the system resources and file resources required according to the parameter information of the pushed task, adding the task to a task queue, obtaining the virtual environment corresponding to the task, and saving the scheduling parameters.
Optionally, the process of obtaining the managed template of the code includes the following steps:
configuring the type and connection format of the data source, and judging whether the data source is configured correctly; if it is, initializing the data source, reading the data source connection information, importing the data source management interface and checking the connection permissions; if it is not, notifying the operator that the data source is configured incorrectly;
after configuration is finished, establishing a data processing engine template, and configuring the code, input parameters, output parameters and basic information of the data processing engine; performing online coding of the managed code according to the code to obtain the coding characteristics of the managed code, and selecting, based on the coding characteristics, a coding style function corresponding to the target coding mode from a preset coding function library;
compiling the coding style function to obtain a managed coding function, and uniformly preprocessing the code to generate function call information corresponding to the managed coding function; performing managed compilation with the managed coding function based on the function call information to obtain a running information compilation file of the data processing engine; and generating the application software of the data processing engine from the running information compilation file.
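The configuration check in the first step above (validate the type and connection format, then either initialize the source or report the error to the operator) can be sketched as follows. This is a minimal illustration, not the patented implementation: the class `DataSourceConfig`, the functions `validate` and `configure`, the field names and the set of supported types are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical set of supported source types; the description mentions spark, hive and mysql.
SUPPORTED_TYPES = {"mysql", "hive", "spark"}


@dataclass
class DataSourceConfig:
    """Illustrative connection settings for one data source (assumed fields)."""
    source_type: str
    host: str
    port: int


def validate(config: DataSourceConfig) -> List[str]:
    """Return the configuration errors found; an empty list means the config is usable."""
    errors = []
    if config.source_type.lower() not in SUPPORTED_TYPES:
        errors.append(f"unsupported type: {config.source_type}")
    if not config.host.strip():
        errors.append("host is empty")
    if not 1 <= config.port <= 65535:
        errors.append(f"port out of range: {config.port}")
    return errors


def configure(config: DataSourceConfig) -> dict:
    """Initialize the source if it is valid, otherwise report the errors to the operator."""
    errors = validate(config)
    if errors:
        return {"status": "error", "errors": errors}
    # A real service would open a connection and check permissions at this point.
    url = f"{config.source_type.lower()}://{config.host}:{config.port}"
    return {"status": "ready", "url": url}
```

Returning the full error list, rather than stopping at the first problem, lets the operator correct every field in one pass.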
Optionally, the process of creating a new container from the base image comprises the following steps:
acquiring environment information of the base image, and creating and starting a new container from the base image, wherein the environment information comprises a basic environment, a language, computing packages and connection packages;
building the base image with an image and container management tool, storing the base image in an image repository, and creating a container group to obtain the new container; scheduling to a specific container management service according to the created container group, the container management service calling a server;
configuring, by the container management service, a Python package index source that supports downloading and installing new Python packages; and adding and creating a personal virtual environment through the page of the image and container management tool, wherein the personal virtual environment is created in real time in the new container and is a separate Python environment containing the Python interpreter and the packages required by the project.
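The last step, creating a separate Python environment and pointing it at a package index source, can be approximated with the standard-library `venv` module and a per-environment pip configuration file. The sketch below is an assumption about how such a step could look and says nothing about the container tooling the patent uses; the function names and the index URL `https://pypi.example.internal/simple` are hypothetical.

```python
import tempfile
import venv
from pathlib import Path


def create_user_environment(env_dir: str, index_url: str) -> Path:
    """Create an isolated Python environment and pin its package index source."""
    env_path = Path(env_dir)
    # with_pip=False keeps creation fast here; a real service would likely enable pip.
    venv.create(env_path, with_pip=False)
    # pip reads a per-environment config file from the environment root
    # (pip.conf on Unix-like systems, pip.ini on Windows).
    config_file = env_path / "pip.conf"
    config_file.write_text(f"[global]\nindex-url = {index_url}\n", encoding="utf-8")
    return env_path


def demo() -> bool:
    """Create a throwaway environment and confirm both marker files exist."""
    with tempfile.TemporaryDirectory() as tmp:
        env = create_user_environment(
            f"{tmp}/user_a", "https://pypi.example.internal/simple"  # hypothetical URL
        )
        return (env / "pyvenv.cfg").exists() and (env / "pip.conf").exists()
```

Keeping the index setting inside the environment directory means each user's environment can use a different index without touching the host configuration.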
Optionally, the process of creating the virtual environment includes the following steps:
receiving a request for creating a virtual environment, selecting a base image according to the environment information of the virtual environment, and creating a new container from the base image;
parsing the request to obtain the number of virtual environments to create, dividing the new container into a first base container and a second base container, and building the basic environment with the second base container to obtain a Python interpreter and dependency packages;
building the virtual environment and the basic environment with the first base container, configuring the Python interpreter in the virtual environment, and creating the virtual environment according to the configured environment information and the dependency packages.
Optionally, the process of adding the task to a task queue, obtaining the virtual environment corresponding to the task and saving the scheduling parameters includes the following steps:
receiving a task invocation request, and pushing the task information into the message queue RabbitMQ;
obtaining a configuration instruction of the message queue RabbitMQ, building a RabbitMQ management center according to the configuration instruction, obtaining the path configuration information to which the task information of the queue is pushed, and listening to the queue according to the path configuration information, the task scheduling service obtaining the listening result; after the task information is acquired, calculating the required system resources and file resources, including the number of CPU cores, memory consumption and hard disk resources, according to the parameters, and adding the task to the task queue;
the task queue follows the first-in first-out principle, and tasks wait in order to be executed; when a task reaches its turn, the service obtains the task type, the scheduling type, the required virtual environment and the saved scheduling parameters from the parameters of the task, and calls different execution classes to execute the task according to those parameters.
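The queueing behaviour described above (estimate resources from the task parameters, enqueue, then serve tasks first-in first-out) can be illustrated with an in-process queue. This sketch deliberately uses the standard-library `queue.Queue` as a stand-in for RabbitMQ so that it runs without a broker; the `Task` fields, the `TaskScheduler` class and the resource heuristic in `estimate_resources` are invented for illustration and are not taken from the patent.

```python
import queue
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    """Illustrative task record; the field names are assumptions."""
    name: str
    task_type: str   # e.g. "sql", "python", "spark"
    venv_name: str   # virtual environment the task should run in
    params: dict = field(default_factory=dict)
    resources: dict = field(default_factory=dict)


def estimate_resources(params: dict) -> dict:
    """Derive CPU, memory and disk needs from task parameters (made-up heuristic)."""
    rows = params.get("rows", 0)
    return {
        "cpu_cores": max(1, rows // 1_000_000),
        "memory_mb": 256 + rows // 1_000,
        "disk_mb": params.get("output_mb", 0),
    }


class TaskScheduler:
    """Holds waiting tasks in arrival order."""

    def __init__(self) -> None:
        self._queue: "queue.Queue[Task]" = queue.Queue()  # FIFO by construction

    def submit(self, task: Task) -> None:
        """Attach the resource estimate and append the task to the queue."""
        task.resources = estimate_resources(task.params)
        self._queue.put(task)

    def next_task(self) -> Optional[Task]:
        """Return the oldest waiting task, or None when the queue is empty."""
        try:
            return self._queue.get_nowait()
        except queue.Empty:
            return None
```

For example, submitting task "a" and then task "b" means `next_task()` yields "a" first, and each returned task already carries the virtual environment name and resource estimate that an executor would need.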
Optionally, the process of calling different execution classes to execute tasks includes the following steps:
activating a virtual environment, wherein the virtual environment is created remotely through a virtual environment management interface, is preconfigured, and is used for executing specific tasks or programs; dynamically loading the execution code: reading the execution code according to a dynamic loading instruction, dynamically loading or updating the execution code through a configuration port, reconfiguring the programmable resources in the execution code, and instantiating the required execution code as an executable class;
calling a specific method of the executable class with the preconfigured parameters to execute the task;
after the return result of the task execution is obtained, processing it as required: returning the result to the caller over HTTP, or storing the data in a temporary table through the built-in table persistence function and notifying the caller of the storage information over HTTP.
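Dynamic loading of execution code followed by a call with preconfigured parameters can be sketched with the standard-library `importlib` machinery. This is only one possible reading of the step: the module name `managed_code`, the helper names and the sample `Job` class are hypothetical, and the code runs inside the current interpreter rather than in an activated virtual environment (a service could instead launch that environment's own interpreter as a subprocess).

```python
import importlib.util
import tempfile
from pathlib import Path


def load_executable_class(source_path: str, class_name: str):
    """Load a Python file at runtime and return the named class from it."""
    spec = importlib.util.spec_from_file_location("managed_code", source_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, class_name)


def run_task(source_path: str, class_name: str, method: str, params: dict):
    """Instantiate the loaded class and call one method with preconfigured parameters."""
    executable = load_executable_class(source_path, class_name)()
    return getattr(executable, method)(**params)


def demo() -> int:
    """Write a tiny managed script to disk, load it, and execute it."""
    code = "class Job:\n    def run(self, x, y):\n        return x + y\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "job.py"
        path.write_text(code, encoding="utf-8")
        return run_task(str(path), "Job", "run", {"x": 2, "y": 3})
```

Because the file is read at call time, replacing the script on disk changes what the next invocation executes, which is the property the "dynamically loading or updating" wording relies on.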
The invention also provides a task scheduling service system based on container technology, which comprises:
an information processing module, responsible for acquiring data source connection information, communicating by using the connection information, and invoking the corresponding code and monitoring its status; and for configuring the data source to obtain a managed template of the code;
an environment creation module, responsible for creating a new container from a base image and configuring a Python package index source; and for creating and managing a dedicated virtual environment;
and a task processing module, responsible for pushing an execution request to the task scheduling service after a task is invoked, calculating the system resources and file resources required according to the parameter information of the pushed task, adding the task to a task queue, obtaining the virtual environment corresponding to the task, and saving the scheduling parameters.
Optionally, the environment creation module includes:
a container creation sub-module, responsible for acquiring environment information of the base image, and creating and starting a new container from the base image, wherein the environment information comprises a basic environment, a language, computing packages and connection packages;
a container scheduling sub-module, responsible for building the base image with an image and container management tool, storing the base image in an image repository, and creating a container group to obtain the new container; and for scheduling to a specific container management service according to the created container group, the container management service calling a server;
a service configuration sub-module, responsible for having the container management service configure a Python package index source that supports downloading and installing new Python packages; and for adding and creating a personal virtual environment through the page of the image and container management tool, wherein the personal virtual environment is created in real time in the new container and is a separate Python environment containing the Python interpreter and the packages required by the project.
Optionally, the task processing module includes:
an information pushing sub-module, responsible for receiving a task invocation request and pushing the task information into the message queue RabbitMQ;
an instruction processing sub-module, responsible for obtaining a configuration instruction of the message queue RabbitMQ, building a RabbitMQ management center according to the configuration instruction, obtaining the path configuration information to which the task information of the queue is pushed, and listening to the queue according to the path configuration information, the task scheduling service obtaining the listening result; and, after the task information is acquired, for calculating the required system resources and file resources, including the number of CPU cores, memory consumption and hard disk resources, according to the parameters, and adding the task to the task queue;
a task execution sub-module, responsible for queueing tasks so that they wait in order to be executed, the task queue following the first-in first-out principle; when a task reaches its turn, the service obtains the task type, the scheduling type, the required virtual environment and the saved scheduling parameters from the parameters of the task, and calls different execution classes to execute the task according to those parameters.
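Selecting an execution class from the task type, as the task execution sub-module does, is commonly implemented as a registry lookup. The sketch below shows that pattern under stated assumptions: the `EXECUTORS` registry, the `register` decorator, the two executor classes and their placeholder return values are illustrative and do not come from the patent.

```python
from typing import Callable, Dict

# Registry mapping a task type to the class that executes it (hypothetical).
EXECUTORS: Dict[str, type] = {}


def register(task_type: str) -> Callable[[type], type]:
    """Class decorator that records an executor under the given task type."""
    def decorator(cls: type) -> type:
        EXECUTORS[task_type] = cls
        return cls
    return decorator


@register("sql")
class SqlExecutor:
    def execute(self, code: str, params: dict) -> str:
        # Placeholder: a real executor would send the statement to a database.
        return f"sql:{code.format(**params)}"


@register("python")
class PythonExecutor:
    def execute(self, code: str, params: dict) -> str:
        # Placeholder: a real executor would run the script in its virtual environment.
        return f"python:{code} args={sorted(params)}"


def dispatch(task_type: str, code: str, params: dict) -> str:
    """Pick the execution class for the task type and run the task."""
    try:
        executor_cls = EXECUTORS[task_type]
    except KeyError:
        raise ValueError(f"no executor registered for task type {task_type!r}") from None
    return executor_cls().execute(code, params)
```

With this layout, supporting a further kind of code means adding one decorated class, without changing `dispatch`.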
First, data source connection information is acquired, communication is carried out using the connection information, and the corresponding code is invoked and its status monitored; the data source is configured to obtain a managed template of the code. Second, a new container is created from a base image and a Python package index source is configured; a dedicated virtual environment is created and managed. Finally, after a task is invoked, an execution request is pushed to the task scheduling service, the required system resources, file resources and the like are calculated according to the parameter information of the pushed task, the task is added to a task queue, the virtual environment corresponding to the task is obtained, and the scheduling parameters are saved.
The invention adopts a task scheduling execution service and applies the convenience of container technology to the field of data processing, integrating multiple virtual environments and providing higher execution efficiency. Multi-environment extension gives different users of the same platform more choices: each user can define their own virtual environment independently to execute different code, so that sql, python and spark code can be integrated quickly in the same system. This functional diversity greatly broadens the range of supported code, and further kinds of executable code can be supported quickly later, since installing different packages to create an independent virtual environment is enough for the code to be invoked as a task.
As for the effectiveness of environment isolation, because the code execution layer is completely separated out and handed over entirely to the task scheduling service center for execution, the coupling with the original system is removed, and the stability and safety of the system are greatly improved. System problems caused while executing task code, such as os operations in python code, are avoided, as are system crashes caused by incorrect operations. Compared with current systems whose execution component has not been turned into a service, the invention is superior in execution efficiency, safety, diversity and other respects.
The containerized deployment supports rapid deployment and distributed deployment; when the task execution service is deployed as a cluster, execution efficiency is greatly improved. Compared with the prior-art practice of manually reconfiguring and redeploying a new environment for each environment a user requires, the system is simple and fast. Even code developers without a background in containers can quickly deploy the environment they need, which minimizes both the learning cost for developers and the time spent on deployment.
The system is provided with adaptation components for various data sources, supports spark, hive, mysql and the like, and can add, delete and modify data sources without depending on other components. In prior-art solutions, by contrast, the database is coupled into the system and, depending on the particular circumstances, the user's existing database cannot be used directly as a data source. The invention integrates data source adaptation with code scheduling adaptation, so that legacy indicator-output code written by earlier developers can be invoked and scheduled without modification, saving modification cost and time.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
FIG. 1 is a flow chart of a task scheduling service method based on container technology in embodiment 1 of the present invention;
FIG. 2 is a process diagram of a managed template of the code obtained in embodiment 2 of the present invention;
FIG. 3 is a diagram of a process for creating a new container by a base image in embodiment 3 of the present invention;
FIG. 4 is a diagram illustrating a process of creating a virtual environment according to embodiment 4 of the present invention;
FIG. 5 is a schematic diagram of a virtual environment creation process in embodiment 4 of the present invention;
FIG. 6 is a process diagram of adding a task queue to obtain a virtual environment corresponding to a task and storing scheduled parameters in embodiment 5 of the present invention;
FIG. 7 is a schematic diagram of a process of adding a task queue to obtain a virtual environment corresponding to a task and storing scheduled parameters in embodiment 5 of the present invention;
FIG. 8 is a process diagram of invoking different execution classes for task execution in embodiment 6 of the present invention;
FIG. 9 is a block diagram of a task scheduling service system based on container technology in embodiment 7 of the present invention;
FIG. 10 is a block diagram of an information processing module in embodiment 8 of the present invention;
FIG. 11 is a block diagram of an environment creation module in embodiment 9 of the present invention;
FIG. 12 is a block diagram of a task processing module in embodiment 10 of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
The terminology used in the embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the embodiments of the application. As used in the examples and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims. The specific meaning of the terms in this application will be understood by those of ordinary skill in the art as the case may be.
Embodiment 1: as shown in FIG. 1, an embodiment of the present invention provides a task scheduling service method based on container technology, which includes the following steps:
S100: acquiring data source connection information, communicating by using the connection information, and invoking the corresponding code and monitoring its status; configuring the data source to obtain a managed template of the code;
S200: creating a new container from a base image, and configuring a Python package index source; creating and managing a dedicated virtual environment;
S300: after a task is invoked, pushing an execution request to the task scheduling service, calculating the required system resources, file resources and the like according to the parameter information of the pushed task, adding the task to a task queue, obtaining the virtual environment corresponding to the task, and saving the scheduling parameters.
The working principle and beneficial effects of the above technical solution are as follows. First, data source connection information is acquired, communication is established using the connection information, and the corresponding code is invoked and its state monitored; the data source is configured to obtain a managed template of the code. Second, a new container is created from the base image and a Python package index source is configured; a dedicated virtual environment is created and managed. Finally, after a task is invoked, an execution request is pushed to the task scheduling service, the required system resources, file resources and the like are calculated according to the parameter information of the pushed task, the task is added to a task queue, and the virtual environment corresponding to the task and the saved scheduling parameters are obtained.
This solution adopts a task scheduling and execution service and applies the convenience of container technology to the field of data processing, so that multiple virtual environments are integrated and execution efficiency is improved. Multi-environment expansion: different users of the same platform have more choices, and each user can define an independent virtual environment to execute different code, so that SQL, Python and Spark code can be integrated rapidly in the same system. Functional diversity: the variety of supported code is greatly increased, and further execution code classes can be supported quickly later, since it is only necessary to install different packages, create an independent virtual environment and invoke it as a task. Effective environment isolation: because the code execution layer is completely separated out and handed over to the task scheduling service center for execution, it is fully decoupled from the original system. This greatly improves the stability and security of the system and avoids system problems caused while executing task code, such as a system crash caused by an incorrect os operation in Python code. Compared with existing systems whose execution components are not provided as a service, this embodiment improves execution efficiency, security and diversity.
The containerized deployment mode supports rapid deployment and distributed deployment. When the task execution service is deployed as a cluster, execution efficiency is greatly improved, and compared with the prior-art practice of manually reconfiguring and redeploying a new environment for each environment a user requires, the system is simpler and faster. Even developers with no background in container technology can rapidly deploy the required environment, which minimizes the learning cost for developers and the time spent on deployment. The system provides adaptation components for multiple data sources, supports Spark, Hive, MySQL and the like, and can add, delete and modify data sources without depending on other components. In prior-art solutions, by contrast, the database is coupled into the system and depends on a specific environment, so a user's existing database cannot be used directly as a data source. The data source adaptation and code scheduling adaptation integrated in this embodiment allow legacy indicator output code written by earlier developers to be invoked and scheduled without modification, saving modification cost and time.
Example 2: as shown in Fig. 2, on the basis of Example 1, the process of obtaining the managed template of the code provided by the embodiment of the present invention includes the following steps:
S104: configuring the type and connection format of the data source and judging whether the configuration is correct; if it is correct, initializing the data source, reading the data source connection information, importing it through the data source management interface and verifying the connection permission; if it is incorrect, notifying the operator that the data source is configured incorrectly;
S105: after configuration is finished, establishing a data processing engine template and configuring the code, input parameters, output parameters and basic information of the data processing engine; performing online coding of the managed code according to the code, obtaining the coding characteristics of the managed code, and selecting, from a preset coding function library and based on the coding characteristics, the coding style function corresponding to the target coding mode;
S106: compiling the coding style function to obtain a managed coding function, and uniformly preprocessing the code to generate function call information corresponding to the managed coding function; performing managed compilation with the managed coding function based on the function call information to obtain a running information compilation file of the data processing engine; and generating the application software of the data processing engine from the running information compilation file.
The working principle and beneficial effects of the above technical solution are as follows. First, the type and connection format of the data source are configured and the configuration is checked; if it is correct, the data source is initialized, the data source connection information is read and imported through the data source management interface, and the connection permission is verified; if it is incorrect, the operator is notified that the data source is configured incorrectly. After configuration is finished, a data processing engine template is established and the code, input parameters, output parameters and basic information of the data processing engine are configured; online coding of the managed code is performed according to the code, the coding characteristics of the managed code are obtained, and the coding style function corresponding to the target coding mode is selected from a preset coding function library based on the coding characteristics. Finally, the coding style function is compiled to obtain a managed coding function, and the code is uniformly preprocessed to generate function call information corresponding to the managed coding function; managed compilation is performed with the managed coding function based on the function call information to obtain the running information compilation file of the data processing engine, from which the application software of the data processing engine is generated.
This solution implements configuration and connection management of data sources together with template configuration and code generation for the data processing engine. Configuring and checking the data source ensures that it is correct, and permission verification ensures the reliability and security of the data. Establishing a data processing engine template, coding online and generating the corresponding coding style function as required allows flexible data processing operations. The generated data processing engine application software improves the efficiency and accuracy of data processing and provides better support for data analysis and decision making. This embodiment thus provides a complete data processing flow and tool set that helps users manage and use their data resources.
In this embodiment, the Spark connection information is imported through the platform's data source management page and the connection permission is verified. After the data source is configured, a new Spark template is created from the platform menu, and the template's Spark code, input parameters, output content and basic information are configured, which completes the hosting of the code on the platform. Once the template exists, it is imported through the expert model on the page and tasks are arranged by drag and drop, in preparation for the subsequent task scheduling.
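The configuration check in step S104 can be illustrated with a minimal Python sketch. It is not the implementation described in this application: the class `DataSourceConfig`, the functions `validate` and `configure`, the field names and the set of supported types are all hypothetical and chosen only for illustration, and the sketch validates the configuration without opening a real connection.

```python
from dataclasses import dataclass

# Hypothetical list of supported types; the description names Spark, Hive and MySQL.
SUPPORTED_TYPES = {"spark", "hive", "mysql"}


@dataclass
class DataSourceConfig:
    """Hypothetical container for the type and connection format of a data source."""
    type: str
    host: str
    port: int
    username: str
    password: str


def validate(cfg):
    """Return a list of problems; an empty list means the configuration looks correct."""
    problems = []
    if cfg.type.lower() not in SUPPORTED_TYPES:
        problems.append(f"unsupported data source type: {cfg.type}")
    if not cfg.host:
        problems.append("host is empty")
    if not 0 < cfg.port < 65536:
        problems.append(f"port out of range: {cfg.port}")
    if not cfg.username:
        problems.append("username is empty")
    return problems


def configure(cfg):
    """Initialize the data source if the configuration is correct, else report the errors."""
    problems = validate(cfg)
    if problems:
        # Corresponds to notifying the operator that the configuration is incorrect.
        return {"status": "error", "notify_operator": problems}
    # The password is deliberately left out of the connection information returned here.
    connection = {
        "type": cfg.type.lower(),
        "host": cfg.host,
        "port": cfg.port,
        "username": cfg.username,
    }
    return {"status": "initialized", "connection": connection}
```

A real system would follow a successful check with an actual connection attempt and permission verification against the data source; that part is omitted here because it depends on the specific driver.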
Example 3: as shown in Fig. 3, on the basis of Example 1, the process of creating a new container from the base image provided by the embodiment of the present invention includes the following steps:
S201: acquiring the environment information of the base image, and creating and starting a new container from the base image, wherein the environment information includes the base environment, the language, computing packages, connection packages and the like;
S202: building the base image with an image and container management tool, storing the base image in an image repository, and creating a container group to obtain the new container; scheduling to a specific container management service according to the created container group, the container management service calling the server;
S203: configuring, by the container management service, a Python package index source that supports downloading and installing new Python packages; adding a personal virtual environment through the page of the image and container management tool, wherein the personal virtual environment is generated in the new container in real time as an independent Python environment containing a specific version of the Python interpreter and the packages required by a project.
The working principle and beneficial effects of the above technical solution are as follows. First, the environment information of the base image is acquired and a new container is created and started from the base image; the environment information includes the base environment, the language, computing packages, connection packages and the like. Second, the base image is built with an image and container management tool and stored in an image repository, and a container group is created to obtain the new container; scheduling goes to a specific container management service according to the created container group, and the container management service calls the server. Finally, the container management service configures a Python package index source that supports downloading and installing new Python packages; a personal virtual environment is added through the page of the image and container management tool and is generated in the new container in real time as an independent Python environment containing a specific version of the Python interpreter and the packages required by a project.
By creating a personal virtual environment, this solution isolates the user's development environment from the system environment, so that the user's operations do not affect the system environment and changes to the system environment do not interfere with the user's development work. The personal virtual environment lets a user install a specific version of the Python interpreter and the packages a project requires, which makes dependencies easy to manage, avoids dependency conflicts between projects and ensures that each project runs normally. A user can run several projects on the same machine, each with its own environment, and can switch quickly between the development environments of different projects, which improves development efficiency and flexibility. The personal virtual environment can also be packaged into an image and stored in the image repository; when a project is deployed, the environment it requires can be built quickly simply by deploying the image to the target environment, which simplifies deployment. This embodiment therefore provides an independent, isolated development environment that makes dependency management easier, improves flexibility and simplifies project deployment.
In this embodiment, for image and container management, a Docker base image must be prepared before the service is deployed. The base image contains a Python base environment with Python 3 built in (the service is implemented with the Python Flask framework) and includes common computing and connection packages such as pandas, numpy and pyspark. On first deployment, a new container is started from the base image and all environment information is acquired. Through the image and container management function of the system's front-end page, an internal pip source is configured that supports downloading and installing new packages directly, and a dedicated personal virtual environment is created by adding it on the page. This environment is completely independent of the system environment, belongs only to the user, and is generated inside the container in real time.
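The creation of a personal virtual environment inside a running container can be sketched as two commands: one that creates the environment with Python's standard `venv` module, and one that installs packages with pip from the internal index. The Python sketch below only builds the command lines and does not execute them. The function name `venv_commands`, the container name, the `/opt/venvs` directory and the index URL are assumptions made for illustration and do not come from this application.

```python
import shlex


def venv_commands(container, env_name, packages, index_url, python="python3"):
    """Build the commands that create a virtual environment in a container.

    Returns two argument lists: one that creates the environment and one that
    installs the requested packages from the given package index.
    """
    # Hypothetical location for per-user environments inside the container.
    env_path = f"/opt/venvs/{env_name}"
    create = ["docker", "exec", container, python, "-m", "venv", env_path]
    install = [
        "docker", "exec", container,
        f"{env_path}/bin/pip", "install",
        "--index-url", index_url,
        *packages,
    ]
    return [create, install]


if __name__ == "__main__":
    # All values below are placeholders.
    commands = venv_commands(
        container="task-service",
        env_name="alice_py310",
        packages=["pandas", "pyspark"],
        index_url="https://pypi.example.internal/simple",
    )
    for argv in commands:
        print(shlex.join(argv))
```

Returning argument lists rather than shell strings keeps the sketch safe to pass to `subprocess.run` without shell quoting issues, should a caller choose to execute the commands.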
Example 4: as shown in Fig. 4, on the basis of Example 3, the process of creating the virtual environment provided by the embodiment of the present invention includes the following steps:
S2031: receiving a request for creating a virtual environment, selecting a base image according to the environment information of the virtual environment, and creating a new container from the base image;
S2032: parsing the request to obtain the number of virtual environments to create, dividing the new container into a first base container and a second base container, and building the base environment with the second base container to obtain the Python interpreter and the dependency packages;
S2033: building the virtual environment on the base environment with the first base container, configuring the Python interpreter in the virtual environment, and creating the virtual environment according to the configured environment information and the dependency packages.
The working principle and beneficial effects of the above technical solution are as follows. First, a request for creating a virtual environment is received, a base image is selected according to the environment information of the virtual environment, and a new container is created from the base image. Second, the request is parsed to obtain the number of virtual environments to create, the new container is divided into a first base container and a second base container, and the base environment is built with the second base container to obtain the Python interpreter and the dependency packages. Finally, the virtual environment is built on the base environment with the first base container, the Python interpreter is configured in the virtual environment, and the virtual environment is created according to the configured environment information and the dependency packages (the principle is shown in Fig. 5).
Isolation: creating independent virtual environments avoids dependency conflicts between projects; each virtual environment has its own Python interpreter and dependency packages, and the environments do not affect one another. Independence: each virtual environment can use a different version of the Python interpreter and of the dependency packages to meet the needs of different projects, and the required interpreter and packages are obtained by dividing the base container and building the base environment. Flexibility: as many virtual environments are created as the request specifies, each managed and maintained independently, and project-specific needs are met by configuring the environment information and dependency packages of each virtual environment. Simplified management: automating the creation of virtual environments reduces manual operation and configuration work, and sharing the base container and base environment makes the versions and updates of dependency packages easier to manage and maintain. This embodiment provides an isolated, independent and flexible development environment in which developers can better manage project dependencies, improving development efficiency and project maintainability.
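The relationship between the shared base environment and the individual virtual environments can be sketched in Python as a planning step: the request is parsed, and one specification is produced per requested environment, each starting from the interpreter and packages of the base environment. This is only one possible reading of steps S2031 to S2033. The request layout and the names `BaseEnvironment`, `VirtualEnvSpec` and `plan_virtual_envs` are hypothetical, and no container is actually created.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseEnvironment:
    """Interpreter and dependency packages provided by the shared base environment."""
    interpreter: str
    packages: tuple


@dataclass
class VirtualEnvSpec:
    """Plan for a single virtual environment built on top of the base environment."""
    name: str
    interpreter: str
    packages: list


def plan_virtual_envs(request, base):
    """Parse a creation request and return one specification per requested environment."""
    specs = []
    for item in request["environments"]:
        # Start from the shared base packages, then add the project-specific ones.
        merged = list(base.packages)
        for pkg in item.get("packages", []):
            if pkg not in merged:
                merged.append(pkg)
        specs.append(
            VirtualEnvSpec(
                name=item["name"],
                # Fall back to the base interpreter when none is requested.
                interpreter=item.get("interpreter", base.interpreter),
                packages=merged,
            )
        )
    return specs
```

The number of environments to create is simply the length of the returned list; each specification could then be handed to whatever mechanism actually builds the environment in the container.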
Example 5: as shown in Fig. 6, on the basis of Example 1, the process of adding the task to the task queue and obtaining the virtual environment corresponding to the task and the saved scheduling parameters provided by the embodiment of the present invention includes the following steps:
S301: receiving a task invocation request and pushing the task information into the RabbitMQ message queue;
S302: obtaining a configuration instruction for the RabbitMQ message queue, building a RabbitMQ management center according to the configuration instruction, obtaining the path configuration information of the queue to which task information is pushed, and listening on the queue according to the path configuration information, the task scheduling service obtaining the listening result; after the task information is obtained, calculating the required system resources, file resources and the like according to the parameters, including the number of CPU cores, memory consumption, hard disk resources and the like, and adding the task to the task queue;
S303: the task queue follows the first-in, first-out principle, and tasks wait in order to be executed; when a task reaches the front of the queue, the service obtains the task type and scheduling type, the required virtual environment and the saved scheduling parameters according to the parameters of the task, and calls different execution classes according to these parameters to execute the task.
The working principle and beneficial effects of the above technical solution are as follows. First, a task invocation request is received and the task information is pushed into the RabbitMQ message queue. Second, a configuration instruction for the RabbitMQ message queue is obtained, a RabbitMQ management center is built according to the configuration instruction, the path configuration information of the queue to which task information is pushed is obtained, the queue is listened on according to the path configuration information, and the task scheduling service obtains the listening result; after the task information is obtained, the required system resources, file resources and the like are calculated according to the parameters, including the number of CPU cores, memory consumption, hard disk resources and the like, and the task is added to the task queue. Finally, the task queue follows the first-in, first-out principle and tasks wait in order to be executed; when a task reaches the front of the queue, the service obtains the task type and scheduling type, the required virtual environment and the saved scheduling parameters according to the parameters of the task, and calls different execution classes according to these parameters to execute the task (the principle is shown in Fig. 7).
Improved task scheduling efficiency: task information is pushed to the message queue and obtained by listening on the queue, so that tasks are processed asynchronously and executed concurrently. Flexible configuration of task scheduling: the execution mode and path of a task are configured according to specific requirements through the configuration instruction and the path configuration information, which makes task scheduling more flexible and controllable. Resource management and scheduling: calculating the required system resources, file resources and the like, including the number of CPU cores, memory consumption and hard disk resources, allows system resources to be managed and scheduled reasonably, avoids resource overload or waste and improves the overall performance of the system. Task queue management: the first-in, first-out principle of the task queue ensures that tasks are executed in order, so that tasks are not lost or reordered and task execution is accurate and complete. Support for multiple task types and scheduling types: different execution classes are called according to the parameters of the task and the scheduling requirements, which supports flexible use of various task types and scheduling types. This embodiment provides an efficient, flexible and controllable task scheduling service that manages and schedules tasks effectively and improves the overall performance of the system and the accuracy of task execution.
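The queueing behaviour described in steps S302 and S303 can be sketched in Python. To keep the example self-contained, the standard-library `queue.Queue` stands in for the task queue and no RabbitMQ client is used, so the listening step is not shown. The `Task` fields, the `Scheduler` class, its capacity check and the executor registry are all assumptions made for illustration.

```python
import queue
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task record carrying the parameters named in the description."""
    task_id: str
    task_type: str   # for example "sql", "python" or "spark"
    cpu_cores: int
    memory_mb: int
    venv: str        # name of the virtual environment the task should run in


class Scheduler:
    """In-process stand-in for the task scheduling service."""

    def __init__(self, total_cores, total_memory_mb):
        self.total_cores = total_cores
        self.total_memory_mb = total_memory_mb
        self.pending = queue.Queue()  # queue.Queue is first-in, first-out
        self.executors = {}

    def register(self, task_type, executor):
        """Associate a task type with the callable that executes it."""
        self.executors[task_type] = executor

    def submit(self, task):
        """Check the resources the task needs, then add it to the task queue."""
        if task.cpu_cores > self.total_cores or task.memory_mb > self.total_memory_mb:
            raise ValueError(f"task {task.task_id} needs more resources than available")
        self.pending.put(task)

    def run_all(self):
        """Execute queued tasks in arrival order, dispatching on the task type."""
        results = []
        while not self.pending.empty():
            task = self.pending.get()
            executor = self.executors[task.task_type]
            results.append((task.task_id, executor(task)))
        return results
```

In a deployment along the lines of this embodiment, `submit` would be driven by messages received from the message queue rather than by direct calls, and the executors would activate the task's virtual environment before running the code.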
Example 6: as shown in Fig. 8, on the basis of Example 5, the process of calling different execution classes to execute tasks provided by the embodiment of the present invention includes the following steps:
S3031: activating the virtual environment, wherein the virtual environment is created remotely through a virtual environment management interface and is preconfigured for executing specific tasks or programs; dynamically loading the execution code, wherein the execution code is read according to a dynamic loading instruction, dynamically loaded or updated through a configuration port, programmable resources in the execution code are reconfigured, and the required execution code is instantiated as an executable class;
S3032: calling a specific method of the executable class with the preconfigured parameters to execute the task;
S3033: after the return result of the task execution is obtained, processing it as required: returning the result to the caller over the Hypertext Transfer Protocol (HTTP); or, if the data volume is large, saving the data to a temporary table through a built-in function for persisting data to a table and notifying the caller of the storage information over HTTP.
The working principle and beneficial effects of the above technical solution are as follows. First, the virtual environment is activated; it is created remotely through a virtual environment management interface and is preconfigured for executing specific tasks or programs. The execution code is dynamically loaded: it is read according to a dynamic loading instruction, dynamically loaded or updated through a configuration port, programmable resources in it are reconfigured, and the required execution code is instantiated as an executable class. Second, a specific method of the executable class is called with the preconfigured parameters to execute the task. Finally, after the return result of the task execution is obtained, it is processed as required: the result is returned to the caller over the Hypertext Transfer Protocol (HTTP), or, if the data volume is large, the data is saved to a temporary table through a built-in function for persisting data to a table and the caller is notified of the storage information over HTTP.
Flexibility and reusability: packaging the execution code into executable classes makes the code easy to organize and reuse and improves its flexibility and maintainability. Dynamic loading and configuration: the execution code is dynamically loaded or updated through the configuration port and can be configured and reconfigured as needed to suit different task requirements. Extensibility and customizability: a specific method of the executable class is called with the preconfigured parameters, so that various specific tasks can be executed and the returned result processed as required; saving large volumes of data to a temporary table and notifying the caller of the storage information over HTTP further improves the extensibility and customizability of the system. This embodiment processes and returns results as required and meets the needs of different application scenarios.
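Steps S3031 to S3033 can be sketched in Python with standard-library tools only. The sketch loads source code into a fresh module object, instantiates the named class, calls one of its methods with the supplied parameters, and then either returns the rows inline as JSON or writes them to a temporary SQLite table and returns only the storage information. The function names, the row threshold, the table name `task_result` and the use of SQLite are assumptions for illustration; the HTTP transport itself is not shown, and the JSON string stands for the body of the response.

```python
import json
import sqlite3
import types


def load_executable_class(source, class_name):
    """Load source code into a new module object and return the named class."""
    module = types.ModuleType("task_code")
    exec(compile(source, "<task_code>", "exec"), module.__dict__)
    return getattr(module, class_name)


def run_task(source, class_name, method, params, conn, max_inline_rows=100):
    """Execute a task and return a JSON string describing the outcome.

    Small results are returned inline. Larger results are written to a
    temporary table, and only the storage information is returned.
    """
    cls = load_executable_class(source, class_name)
    rows = getattr(cls(), method)(**params)
    if len(rows) <= max_inline_rows:
        return json.dumps({"status": "ok", "rows": rows})
    # Hypothetical temporary table; each result row is stored as one JSON payload.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS task_result (payload TEXT)")
    conn.executemany(
        "INSERT INTO task_result (payload) VALUES (?)",
        [(json.dumps(row),) for row in rows],
    )
    conn.commit()
    return json.dumps(
        {"status": "saved", "table": "task_result", "row_count": len(rows)}
    )
```

Because `exec` runs arbitrary code, a sketch like this is only reasonable when the code runs inside an isolated environment, which is the role the container and virtual environment play in this embodiment.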
Example 7: as shown in Fig. 9, on the basis of Examples 1 to 6, the task scheduling service system based on container technology provided by the embodiment of the present invention includes:
the information processing module, responsible for acquiring data source connection information, communicating using the connection information, and invoking the corresponding code and monitoring its state; and for configuring the data source to obtain a managed template of the code;
the environment creation module, responsible for creating a new container from the base image and configuring a Python package index source; and for creating and managing a dedicated virtual environment;
the task processing module, responsible for pushing an execution request to the task scheduling service after a task is invoked, calculating the required system resources, file resources and the like according to the parameter information of the pushed task, adding the task to a task queue, and obtaining the virtual environment corresponding to the task and the saved scheduling parameters.
The working principle and beneficial effects of the above technical solution are as follows. The information processing module of this embodiment acquires data source connection information, communicates using the connection information, and invokes the corresponding code and monitors its state; it configures the data source to obtain a managed template of the code. The environment creation module creates a new container from the base image and configures a Python package index source; it creates and manages a dedicated virtual environment. After a task is invoked, the task processing module pushes an execution request to the task scheduling service, calculates the required system resources, file resources and the like according to the parameter information of the pushed task, adds the task to a task queue, and obtains the virtual environment corresponding to the task and the saved scheduling parameters.
This solution adopts a task scheduling and execution service and applies the convenience of container technology to the field of data processing, so that multiple virtual environments are integrated and execution efficiency is improved. Multi-environment expansion: different users of the same platform have more choices, and each user can define an independent virtual environment to execute different code, so that SQL, Python and Spark code can be integrated rapidly in the same system. Functional diversity: the variety of supported code is greatly increased, and further execution code classes can be supported quickly later, since it is only necessary to install different packages, create an independent virtual environment and invoke it as a task. Effective environment isolation: because the code execution layer is completely separated out and handed over to the task scheduling service center for execution, it is fully decoupled from the original system. This greatly improves the stability and security of the system and avoids system problems caused while executing task code, such as a system crash caused by an incorrect os operation in Python code. Compared with existing systems whose execution components are not provided as a service, this embodiment improves execution efficiency, security and diversity.
The containerized deployment mode supports rapid deployment and distributed deployment. When the task execution service is deployed as a cluster, execution efficiency is greatly improved, and compared with the prior-art practice of manually reconfiguring and redeploying a new environment for each environment a user requires, the system is simpler and faster. Even developers with no background in container technology can rapidly deploy the required environment, which minimizes the learning cost for developers and the time spent on deployment. The system provides adaptation components for multiple data sources, supports Spark, Hive, MySQL and the like, and can add, delete and modify data sources without depending on other components. In prior-art solutions, by contrast, the database is coupled into the system and depends on a specific environment, so a user's existing database cannot be used directly as a data source. The data source adaptation and code scheduling adaptation integrated in this embodiment allow legacy indicator output code written by earlier developers to be invoked and scheduled without modification, saving modification cost and time.
Example 8: as shown in Fig. 10, on the basis of Example 7, the information processing module provided by the embodiment of the present invention includes:
the configuration processing sub-module, responsible for configuring the type and connection format of the data source and judging whether the configuration is correct; if it is correct, initializing the data source, reading the data source connection information, importing it through the data source management interface and verifying the connection permission; if it is incorrect, notifying the operator that the data source is configured incorrectly;
the code hosting sub-module, responsible for establishing a data processing engine template after configuration is finished, configuring the code, input parameters, output parameters and basic information of the data processing engine, performing online coding of the managed code according to the code, obtaining the coding characteristics of the managed code, and selecting, from a preset coding function library and based on the coding characteristics, the coding style function corresponding to the target coding mode;
the information compiling sub-module, responsible for compiling the coding style function to obtain a managed coding function, and uniformly preprocessing the code to generate function call information corresponding to the managed coding function; performing managed compilation with the managed coding function based on the function call information to obtain a running information compilation file of the data processing engine; and generating the application software of the data processing engine from the running information compilation file.
The working principle and beneficial effects of the technical scheme are as follows: the configuration processing sub-module of this embodiment configures the type and connection format of the data source and judges whether the configuration is correct; if it is correct, it initializes the data source, reads the data source connection information, imports it into the data source management interface, and verifies the connection permission; if it is incorrect, it notifies the operator that the data source configuration is incorrect. After configuration is completed, the code hosting sub-module establishes a data processing engine template, configures the code, input parameters, output parameters, and basic information of the data processing engine, performs online coding of the managed code according to the code, acquires the coding characteristics of the managed code, and, based on those characteristics, selects from a preset coding function library the coding style function corresponding to the target coding mode. The information compiling sub-module compiles the coding style function to obtain a managed coding function, uniformly preprocesses the code to generate the function call information corresponding to the managed coding function, performs managed compilation with the managed coding function based on the function call information to obtain the run information compilation file of the data processing engine, and generates the application software of the data processing engine from that file.
The scheme has the following effects. Acquiring the data source connection information: the information required to connect to an external data source, such as the host name, port number, user name, and password, is obtained by parsing the configuration information; this information is the key to connecting to the external data source, so that the data processing engine can accurately acquire index data. Establishing a communication connection: using the data source connection information, the nodes in the data processing engine cluster establish communication connections, coordinate data transmission and tasks, transmit index data from the data source to each node, and perform distributed data processing and analysis within the cluster. Improving data processing efficiency: once the communication connections are established, data is processed in parallel within the cluster, and each node can process a different data fragment at the same time, which greatly improves the efficiency and speed of data processing and is particularly important for large data sets and complex data processing tasks. Task coordination and allocation: after the communication connections are established, the nodes in the cluster coordinate and distribute tasks, share data and computing resources, and allocate tasks according to the characteristics of each task and the load on each node, which makes data processing more efficient.
This embodiment thus coordinates data transmission and tasks within the data processing engine cluster, improves the efficiency and performance of data processing, and handles and analyzes big data more effectively. It provides configuration and connection management for data sources as well as template configuration and code generation for the data processing engine. Configuring the data source ensures that the data source is correct, and permission verification ensures the reliability and security of the data. Establishing a data processing engine template, coding online, and generating the corresponding coding style function on demand make data processing operations flexible. Finally, the generated data processing engine application software improves the efficiency and accuracy of data processing and provides better data analysis and decision support. The embodiment provides a complete data processing flow and tool set that helps users manage and use their data resources.
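The configuration check performed by the configuration processing sub-module can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the field names (`type`, `host`, `port`, `username`, `password`), the `ConnectionInfo` class, and the `DataSourceConfigError` exception are hypothetical, chosen to mirror the host name, port number, user name, and password mentioned above. Connection permission verification against a real database is omitted.

```python
from dataclasses import dataclass

# Data source types named in the description; the set itself is illustrative.
SUPPORTED_TYPES = {"spark", "hive", "mysql"}
REQUIRED_FIELDS = ("host", "port", "username", "password")


class DataSourceConfigError(ValueError):
    """Raised so the caller can notify the operator of a bad configuration."""


@dataclass(frozen=True)
class ConnectionInfo:
    source_type: str
    host: str
    port: int
    username: str
    password: str


def parse_data_source(config: dict) -> ConnectionInfo:
    """Validate a data source configuration and return its connection info."""
    source_type = str(config.get("type", "")).lower()
    if source_type not in SUPPORTED_TYPES:
        raise DataSourceConfigError(f"unsupported data source type: {source_type!r}")
    # Every connection field must be present and non-empty.
    missing = [name for name in REQUIRED_FIELDS if not config.get(name)]
    if missing:
        raise DataSourceConfigError(f"missing fields: {', '.join(missing)}")
    try:
        port = int(config["port"])
    except (TypeError, ValueError) as exc:
        raise DataSourceConfigError("port must be an integer") from exc
    if not 1 <= port <= 65535:
        raise DataSourceConfigError(f"port out of range: {port}")
    return ConnectionInfo(
        source_type, config["host"], port, config["username"], config["password"]
    )
```

Raising a dedicated exception keeps the "configuration incorrect" branch explicit, so the calling service can turn it into an operator notification.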
Example 9: as shown in Fig. 11, on the basis of embodiment 7, the environment creation module provided in this embodiment of the present invention includes:
the container creation sub-module, which is responsible for acquiring the environment information of the base image and creating and starting a new container from the base image, wherein the environment information includes the basic environment, language, computing packages, connection packages, and the like;
the container scheduling sub-module, which is responsible for building the base image with an image and container management tool, storing it in an image repository, and creating a container group to obtain the new container; the created container group is scheduled to a specific container management service, and the container management service calls the server;
the service configuration sub-module, which is responsible for having the container management service configure a Python package index source that supports downloading and installing new Python packages; a personal virtual environment is added and created through the page of the image and container management tool, the personal virtual environment is generated in the new container in real time, and it forms an independent Python environment containing a specific version of the Python interpreter and the packages required by the project;
The working principle and beneficial effects of the technical scheme are as follows: the container creation sub-module of this embodiment acquires the environment information of the base image and creates and starts a new container from the base image, wherein the environment information includes the basic environment, language, computing packages, connection packages, and the like. The container scheduling sub-module builds the base image with an image and container management tool, stores it in an image repository, and creates a container group to obtain the new container; the created container group is scheduled to a specific container management service, and the container management service calls the server. In the service configuration sub-module, the container management service configures a Python package index source that supports downloading and installing new Python packages; a personal virtual environment is added and created through the page of the image and container management tool, the personal virtual environment is generated in the new container in real time, and it forms an independent Python environment containing a specific version of the Python interpreter and the packages required by the project.
By creating a personal virtual environment, the scheme isolates the user's development environment from the system environment, so that the user's operations do not affect the system environment and changes to the system environment do not interfere with the user's development work. The personal virtual environment lets the user install a specific version of the Python interpreter and the packages required by the project, which makes project dependencies easy to manage, avoids dependency conflicts between projects, and ensures that each project runs normally. With personal virtual environments, a user can run several projects on the same machine, each with its own independent environment, and can switch quickly between the development environments of different projects, which improves development efficiency and flexibility. A personal virtual environment can also be packaged into an image and stored in the image repository; when a project is deployed, deploying that image into the target environment is enough to build the environment the project requires, which simplifies deployment. In summary, this embodiment creates a personal virtual environment that provides an independent, isolated development environment, makes dependencies easier to manage, improves flexibility, and simplifies project deployment.
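The environment creation flow above (base image, package index source, personal virtual environment) can be sketched as a small Python helper that renders a Dockerfile. This is an illustrative sketch under stated assumptions, not the tool described in the patent: the `EnvironmentSpec` class, the `render_dockerfile` function, and the `/opt/venvs/<name>` path are hypothetical. The commands it emits (`pip config set global.index-url`, `python -m venv`) are standard pip and Python commands.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EnvironmentSpec:
    base_image: str   # base image that carries the language runtime
    env_name: str     # name of the personal virtual environment
    index_url: str    # Python package index source
    packages: List[str] = field(default_factory=list)


def render_dockerfile(spec: EnvironmentSpec) -> str:
    """Render a Dockerfile that builds the virtual environment in the image."""
    venv_dir = f"/opt/venvs/{spec.env_name}"  # hypothetical location
    lines = [
        f"FROM {spec.base_image}",
        # Point pip at the configured package index source.
        f"RUN pip config set global.index-url {spec.index_url}",
        # Create an independent Python environment for this user.
        f"RUN python -m venv {venv_dir}",
    ]
    if spec.packages:
        lines.append(f"RUN {venv_dir}/bin/pip install " + " ".join(spec.packages))
    # Put the virtual environment's interpreter first on PATH.
    lines.append(f'ENV PATH="{venv_dir}/bin:$PATH"')
    return "\n".join(lines) + "\n"
```

For example, `render_dockerfile(EnvironmentSpec("python:3.11-slim", "alice", "https://pypi.org/simple", ["pandas"]))` yields a Dockerfile whose resulting image can be stored in an image repository and reused at deployment time.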
Example 10: as shown in Fig. 12, on the basis of embodiment 7, the task processing module provided in this embodiment of the present invention includes:
the information pushing sub-module, which is responsible for receiving a task invocation request and pushing the task information into the message queue RabbitMQ;
the instruction processing sub-module, which is responsible for obtaining the configuration instructions of the message queue RabbitMQ, building a RabbitMQ management center according to the configuration instructions, obtaining the path configuration information for pushing task information to the queue, and monitoring the queue according to the path configuration information, with the task scheduling service obtaining the monitoring result; after the task information is acquired, the required system resources and file resources, including the number of CPU cores, memory usage, and hard disk resources, are calculated from the parameters, and the task is added to the task queue;
the task execution sub-module, which is responsible for executing queued tasks, wherein the task queue follows the first-in first-out principle and tasks wait in order to be executed; when a task reaches the head of the queue, the service obtains the task type, the scheduling type, the required virtual environment, and the saved scheduling parameters from the parameters of the task, and calls a different execution class according to those parameters to execute the task;
The working principle and beneficial effects of the technical scheme are as follows: the information pushing sub-module of this embodiment receives a task invocation request and pushes the task information into the message queue RabbitMQ. The instruction processing sub-module obtains the configuration instructions of the message queue RabbitMQ, builds a RabbitMQ management center according to the configuration instructions, obtains the path configuration information for pushing task information to the queue, and monitors the queue according to the path configuration information, with the task scheduling service obtaining the monitoring result; after the task information is acquired, the required system resources and file resources, including the number of CPU cores, memory usage, and hard disk resources, are calculated from the parameters, and the task is added to the task queue. The task queue of the task execution sub-module follows the first-in first-out principle, and tasks wait in order to be executed; when a task reaches the head of the queue, the service obtains the task type, the scheduling type, the required virtual environment, and the saved scheduling parameters from the parameters of the task, and calls a different execution class according to those parameters to execute the task.
The scheme has the following effects. Improved task scheduling efficiency: pushing task information to the message queue and acquiring it by monitoring the queue allows tasks to be processed asynchronously and executed concurrently. Flexible task scheduling configuration: the configuration instructions and path configuration information allow the execution mode and path of a task to be configured for specific requirements, which makes task scheduling more flexible and controllable. Resource management and scheduling: calculating the required system resources and file resources, including the number of CPU cores, memory usage, and hard disk resources, allows system resources to be managed and scheduled sensibly, which avoids resource overload or waste and improves overall system performance. Task queue management: the first-in first-out principle of the task queue keeps task execution orderly, prevents tasks from being lost or executed out of order, and ensures the accuracy and completeness of task execution. Support for multiple task types and scheduling types: different execution classes are called according to the parameters of the task and the scheduling requirements, so that various task types and scheduling types can be applied flexibly. This embodiment provides an efficient, flexible, and controllable task scheduling service that manages and schedules tasks effectively and improves overall system performance and the accuracy of task execution.
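The queueing and dispatch behaviour described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the standard-library `queue.Queue` stands in for the RabbitMQ-fed task queue, and the names `Task`, `Scheduler`, `PythonExecutor`, `SqlExecutor`, and the default resource figures are hypothetical and do not come from the patent.

```python
import queue
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Task:
    task_id: str
    task_type: str   # selects the execution class
    venv: str        # virtual environment the task requires
    params: Dict[str, Any] = field(default_factory=dict)


def estimate_resources(task: Task) -> Dict[str, int]:
    """Derive resource needs from task parameters (defaults are assumptions)."""
    return {
        "cpu_cores": int(task.params.get("cpu_cores", 1)),
        "memory_mb": int(task.params.get("memory_mb", 256)),
        "disk_mb": int(task.params.get("disk_mb", 100)),
    }


class PythonExecutor:
    def run(self, task: Task) -> str:
        return f"python:{task.task_id}@{task.venv}"


class SqlExecutor:
    def run(self, task: Task) -> str:
        return f"sql:{task.task_id}@{task.venv}"


# Registry that maps a task type to its execution class.
EXECUTORS = {"python": PythonExecutor, "sql": SqlExecutor}


class Scheduler:
    def __init__(self) -> None:
        self._queue: "queue.Queue[Task]" = queue.Queue()  # first-in first-out
        self.resources: Dict[str, Dict[str, int]] = {}

    def submit(self, task: Task) -> None:
        """Estimate resources, then append the task to the FIFO queue."""
        if task.task_type not in EXECUTORS:
            raise ValueError(f"unknown task type: {task.task_type!r}")
        self.resources[task.task_id] = estimate_resources(task)
        self._queue.put(task)

    def run_all(self) -> List[str]:
        """Execute queued tasks in arrival order with the matching class."""
        results = []
        while not self._queue.empty():
            task = self._queue.get()
            results.append(EXECUTORS[task.task_type]().run(task))
            self._queue.task_done()
        return results
```

In a real deployment the `submit` step would be driven by a consumer listening on the RabbitMQ queue, and each execution class would activate the task's virtual environment before running the task code.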
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (7)

1. A task scheduling service method based on container technology, characterized by comprising the following steps:
acquiring data source connection information, communicating by using the connection information, and invoking the corresponding code and the monitoring state; configuring the data source to obtain a hosting template of the code;
creating a new container from a base image and configuring a Python package index source; at the same time creating and managing a dedicated virtual environment;
when a task is invoked, pushing an execution request to the task scheduling service, calculating the corresponding system resources and file resources according to the parameter information of the pushed task, adding the task to a task queue to acquire the virtual environment corresponding to the task, and saving the scheduling parameters;
wherein the process of obtaining the hosting template of the code comprises the following steps:
configuring the type and connection format of the data source and judging whether the configuration of the data source is correct; if the configuration is correct, initializing the data source, reading the data source connection information, importing it into the data source management interface, and verifying the connection permission; if the configuration is incorrect, notifying the operator that the configuration of the data source is incorrect;
after configuration is completed, establishing a data processing engine template, configuring the code, input parameters, output parameters, and basic information of the data processing engine, performing online coding of the managed code according to the code, acquiring the coding characteristics of the managed code, and selecting from a preset coding function library, based on the coding characteristics, the coding style function corresponding to the target coding mode;
compiling the coding style function to obtain a managed coding function, and uniformly preprocessing the code to generate the function call information corresponding to the managed coding function; performing managed compilation with the managed coding function based on the function call information to obtain the run information compilation file of the data processing engine; and generating the application software of the data processing engine from the run information compilation file.

2. The task scheduling service method based on container technology according to claim 1, characterized in that the process of creating a new container from a base image comprises the following steps:
acquiring the environment information of the base image, and creating and starting a new container from the base image, wherein the environment information includes the basic environment, language, computing packages, and connection packages;
building the base image with an image and container management tool, storing it in an image repository, and creating a container group to obtain the new container; scheduling the created container group to a specific container management service, the container management service calling the server;
configuring, by the container management service, a Python package index source that supports downloading and installing new Python packages; adding and creating a personal virtual environment through the page of the image and container management tool, wherein the personal virtual environment is generated in the new container in real time and forms an independent Python environment containing the Python interpreter and the packages required by the project.

3. The task scheduling service method based on container technology according to claim 1, characterized in that the process of creating the virtual environment comprises the following steps:
receiving a request to create a virtual environment, selecting a base image according to the environment information of the virtual environment, and creating a new container from the base image;
parsing the request to obtain the number of virtual environments to be created, dividing the new containers into a first base container and a second base container, and building a basic environment with the second base container to obtain the Python interpreter and the dependency packages;
building the virtual environment and the basic environment with the first base container, configuring the Python interpreter in the virtual environment, and creating the virtual environment according to the configured environment information and the dependency packages.

4. The task scheduling service method based on container technology according to claim 1, characterized in that the process of adding the task to the task queue to acquire the virtual environment corresponding to the task and saving the scheduling parameters comprises the following steps:
receiving a task invocation request and pushing the task information into the message queue RabbitMQ;
obtaining the configuration instructions of the message queue RabbitMQ, building a RabbitMQ management center according to the configuration instructions, obtaining the path configuration information for pushing task information to the queue, and monitoring the queue according to the path configuration information, the task scheduling service obtaining the monitoring result; after the task information is acquired, calculating the required system resources and file resources from the parameters, including the number of CPU cores, memory usage, and hard disk resources, and adding the task to the task queue;
wherein the task queue follows the first-in first-out principle and tasks wait in order to be executed; when a task reaches the head of the queue, the service obtains the task type, the scheduling type, the required virtual environment, and the saved scheduling parameters from the parameters of the task, and calls a different execution class according to the parameters to execute the task.

5. The task scheduling service method based on container technology according to claim 1, characterized in that the process of calling different execution classes to execute the task comprises the following steps:
activating the virtual environment, wherein the virtual environment is created remotely through a virtual environment management interface and is pre-configured for executing a specific task or program; dynamically loading the execution code by reading the execution code according to a dynamic loading instruction, dynamically loading or updating the execution code through a configuration port, reconfiguring its internal programmable resources, and instantiating the required execution code into an executable class;
calling the specific method of the executable class with the pre-configured parameters to execute the task;
after the return result of the task execution is acquired, processing it as required: returning the result to the caller over the Hypertext Transfer Protocol (HTTP), or saving the data to a temporary table through a built-in data table-writing function and informing the caller of the saved information over HTTP.

6. A task scheduling service system based on container technology, characterized by comprising:
an information processing module, responsible for acquiring data source connection information, communicating by using the connection information, and invoking the corresponding code and the monitoring state; and for configuring the data source to obtain a hosting template of the code;
an environment creation module, responsible for creating a new container from a base image and configuring a Python package index source, and at the same time creating and managing a dedicated virtual environment;
a task processing module, responsible for, when a task is invoked, pushing an execution request to the task scheduling service, calculating the corresponding system resources and file resources according to the parameter information of the pushed task, adding the task to a task queue to acquire the virtual environment corresponding to the task, and saving the scheduling parameters;
wherein the environment creation module comprises:
a container creation sub-module, responsible for acquiring the environment information of the base image and creating and starting a new container from the base image, wherein the environment information includes the basic environment, language, computing packages, and connection packages;
a container scheduling sub-module, responsible for building the base image with an image and container management tool, storing it in an image repository, and creating a container group to obtain the new container; the created container group is scheduled to a specific container management service, and the container management service calls the server;
a service configuration sub-module, responsible for having the container management service configure a Python package index source that supports downloading and installing new Python packages; a personal virtual environment is added and created through the page of the image and container management tool, is generated in the new container in real time, and forms an independent Python environment containing the Python interpreter and the packages required by the project.

7. The task scheduling service system based on container technology according to claim 6, characterized in that the task processing module comprises:
an information pushing sub-module, responsible for receiving a task invocation request and pushing the task information into the message queue RabbitMQ;
an instruction processing sub-module, responsible for obtaining the configuration instructions of the message queue RabbitMQ, building a RabbitMQ management center according to the configuration instructions, obtaining the path configuration information for pushing task information to the queue, and monitoring the queue according to the path configuration information, the task scheduling service obtaining the monitoring result; after the task information is acquired, the required system resources and file resources, including the number of CPU cores, memory usage, and hard disk resources, are calculated from the parameters, and the task is added to the task queue;
a task execution sub-module, responsible for executing queued tasks, wherein the task queue follows the first-in first-out principle and tasks wait in order to be executed; when a task reaches the head of the queue, the service obtains the task type, the scheduling type, the required virtual environment, and the saved scheduling parameters from the parameters of the task, and calls a different execution class according to the parameters to execute the task.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311369652.8A CN117112184B (en) 2023-10-23 2023-10-23 Task scheduling service method and system based on container technology

Publications (2)

Publication Number Publication Date
CN117112184A (en) 2023-11-24
CN117112184B (en) 2024-02-02

Family

ID=88809432

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311369652.8A Active CN117112184B (en) 2023-10-23 2023-10-23 Task scheduling service method and system based on container technology

Country Status (1)

Country Link
CN (1) CN117112184B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PP01 Preservation of patent right (effective date of registration: 20250328; granted publication date: 20240202)
PD01 Discharge of preservation of patent (date of cancellation: 20250714; granted publication date: 20240202)