Disclosure of Invention
To solve the above technical problems, the invention provides a task scheduling service method based on container technology, which comprises the following steps:
acquiring data source connection information, communicating by means of the connection information, invoking the corresponding code and monitoring its state; configuring a data source to obtain a managed template of the code;
creating a new container from a base image, and configuring a Python package index source; creating and managing dedicated virtual environments;
after a task is invoked, pushing an execution request to a task scheduling service, calculating the required system resources and file resources according to the parameter information of the pushed task, adding the task to a task queue, obtaining the virtual environment corresponding to the task, and saving the scheduling parameters.
Optionally, the process of obtaining the managed template of the code includes the steps of:
configuring the type and connection format of the data source and determining whether the data source is configured correctly; if it is, initializing the data source, reading the data source connection information, importing it through the data source management interface and verifying the connection permission; if it is not, notifying the operator that the data source is configured incorrectly;
after configuration is finished, creating a data processing engine template and configuring the code, input parameters, output parameters and basic information of the data processing engine; performing online coding of the managed code according to the code, obtaining the coding characteristics of the managed code, and selecting, based on the coding characteristics, a coding style function corresponding to the target coding mode from a preset coding function library;
compiling the coding style function to obtain a managed coding function, and uniformly preprocessing the code to generate function call information corresponding to the managed coding function; performing managed compilation with the managed coding function based on the function call information to obtain a run-information compilation file of the data processing engine; and generating the application software of the data processing engine from the run-information compilation file.
Optionally, the process of creating a new container by the base image comprises the steps of:
acquiring environment information of the base image, and creating and starting a new container from the base image, wherein the environment information comprises a base environment, a language, computing packages and connection packages;
building the base image with an image and container management tool, storing the base image in an image registry, and creating a container group to obtain the new container; scheduling the created container group to a specific container management service, the container management service invoking a server;
the container management service configures a Python package index source, which supports downloading and installing new Python packages; a personal virtual environment is added and created through a page of the image and container management tool, the personal virtual environment is generated in the new container in real time, and the virtual environment provides a separate Python environment containing a Python interpreter and the packages required by the project.
Optionally, the creating process of the virtual environment includes the following steps:
receiving a request for creating a virtual environment, selecting a base image according to the environment information of the virtual environment, and creating a new container from the base image;
parsing the request to obtain the number of virtual environments to be created, dividing the new container into a first base container and a second base container, and building the base environment with the second base container to obtain a Python interpreter and dependency packages;
and constructing the virtual environment and the base environment with the first base container, configuring the Python interpreter in the virtual environment, and creating the virtual environment according to the configured environment information and the dependency packages.
Optionally, a process of adding a task queue to obtain a virtual environment corresponding to a task and storing a scheduled parameter includes the following steps:
receiving a task invocation request, and pushing the task information into a RabbitMQ message queue;
obtaining a configuration instruction for the RabbitMQ message queue, building a RabbitMQ management center according to the configuration instruction, obtaining the path configuration information under which the queue's task information is pushed, and listening to the queue according to the path configuration information, the task scheduling service obtaining the listening result; after the task information is acquired, calculating from its parameters the required system resources and file resources, including the number of CPU cores, memory consumption and hard disk resources, and adding the task to the task queue;
the task queue follows the first-in first-out principle, and tasks wait in order to be executed; when a task's turn comes, the service obtains, from the task's parameters, the task type, the scheduling type, the required virtual environment and the saved scheduling parameters; and calling different execution classes according to these parameters to execute the task.
Optionally, a process of calling different execution classes to execute tasks includes the following steps:
activating the virtual environment, wherein the virtual environment is created remotely through a virtual environment management interface and is preconfigured for executing specific tasks or programs; dynamically loading the execution code: reading the execution code according to a dynamic loading instruction, dynamically loading or updating the execution code through a configuration port, reconfiguring programmable resources in the execution code, and instantiating the required execution code as an executable class;
calling a specific method of the executable class with the preconfigured parameters to execute the task;
after the return result of the task execution is obtained, processing it as required: either returning the result to the caller over HTTP, or storing the data in a temporary table through the built-in table-persistence function and notifying the caller of the storage information over HTTP.
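The load, instantiate and call sequence described above can be illustrated with a minimal Python sketch. It assumes the execution code is available as source text; the names `load_executable_class`, `run_task`, `SumTask` and `execute` are hypothetical and do not come from the invention. Activation of the virtual environment and the HTTP return path are omitted.

```python
import types


def load_executable_class(source: str, class_name: str, module_name: str = "task_code"):
    """Compile source text into a fresh module object and return the named class."""
    module = types.ModuleType(module_name)
    exec(compile(source, module_name, "exec"), module.__dict__)
    return getattr(module, class_name)


def run_task(source: str, class_name: str, method_name: str, params: dict):
    """Instantiate the loaded class and call one method with preconfigured parameters."""
    executable = load_executable_class(source, class_name)()
    return getattr(executable, method_name)(**params)


# Hypothetical task code, as it might be stored in a managed template.
TASK_SOURCE = '''
class SumTask:
    def execute(self, values):
        return {"total": sum(values)}
'''

result = run_task(TASK_SOURCE, "SumTask", "execute", {"values": [1, 2, 3]})
print(result)  # {'total': 6}
```

Because the code is loaded into a new module object on each call, an updated version of the source can be loaded without restarting the service; a real service would presumably also need to restrict what the loaded code may do.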
The invention provides a task scheduling service system based on a container technology, which comprises the following components:
the information processing module, which is responsible for acquiring data source connection information, communicating by means of the connection information, invoking the corresponding code and monitoring its state, and configuring a data source to obtain a managed template of the code;
the environment creation module, which is responsible for creating a new container from the base image, configuring a Python package index source, and creating and managing dedicated virtual environments;
and the task processing module, which is responsible for pushing an execution request to the task scheduling service after a task is invoked, calculating the required system resources and file resources according to the parameter information of the pushed task, adding the task to the task queue, obtaining the virtual environment corresponding to the task, and saving the scheduling parameters.
Optionally, the environment creation module includes:
the container creation sub-module, which is responsible for acquiring environment information of the base image and creating and starting a new container from the base image, wherein the environment information comprises a base environment, a language, computing packages and connection packages;
the container scheduling sub-module, which is responsible for building the base image with an image and container management tool, storing the base image in an image registry, and creating a container group to obtain the new container; and for scheduling the created container group to a specific container management service, the container management service invoking a server;
the service configuration sub-module, which is responsible for having the container management service configure a Python package index source that supports downloading and installing new Python packages; a personal virtual environment is added and created through a page of the image and container management tool, the personal virtual environment is generated in the new container in real time, and the virtual environment provides a separate Python environment containing a Python interpreter and the packages required by the project.
Optionally, the task processing module includes:
the information pushing sub-module, which is responsible for receiving a task invocation request and pushing the task information into the RabbitMQ message queue;
the instruction processing sub-module, which is responsible for obtaining a configuration instruction for the RabbitMQ message queue, building a RabbitMQ management center according to the configuration instruction, obtaining the path configuration information under which the queue's task information is pushed, and listening to the queue according to the path configuration information, the task scheduling service obtaining the listening result; and, after the task information is acquired, calculating from its parameters the required system resources and file resources, including the number of CPU cores, memory consumption and hard disk resources, and adding the task to the task queue;
the task execution sub-module, which is responsible for executing queued tasks in order, the task queue following the first-in first-out principle; when a task's turn comes, the service obtains, from the task's parameters, the task type, the scheduling type, the required virtual environment and the saved scheduling parameters, and calls different execution classes according to these parameters to execute the task.
The working principle and beneficial effects of the invention are as follows. First, data source connection information is acquired and used for communication, the corresponding code is invoked and its state is monitored, and a data source is configured to obtain a managed template of the code. Second, a new container is created from the base image, a Python package index source is configured, and dedicated virtual environments are created and managed. Finally, after a task is invoked, an execution request is pushed to the task scheduling service, the required system resources, file resources and the like are calculated according to the parameter information of the pushed task, the task is added to the task queue, the virtual environment corresponding to the task is obtained, and the scheduling parameters are saved. By adopting a task scheduling and execution service and applying the convenience of container technology to the field of data processing, the invention integrates multiple virtual environments and provides higher execution efficiency. Multi-environment extension: different users of the same platform have more choices and can each define their own virtual environment independently to execute different code, so that SQL, Python and Spark code can be integrated quickly in the same system. Functional diversity: the variety of supported code is greatly increased, and later extension to other kinds of execution code is also quickly supported, since it is only necessary to install different packages, create an independent virtual environment, and invoke the code as a task. Effective environment isolation: because the code execution layer is completely separated out and handed over to the task scheduling service center for execution, coupling with the original system is removed, which greatly improves the stability and security of the system; system problems caused while executing task code, such as os operations in Python code, are avoided, as is a system crash caused by incorrect operation; compared with current systems whose execution component is not provided as a service, the invention is superior in execution efficiency, security and diversity. The containerized deployment mode supports rapid deployment and distributed deployment; when the task execution service is deployed as a cluster, execution efficiency is greatly improved, and compared with the prior art, in which a new environment is manually reconfigured and redeployed for each environment a user requires, the approach is simple and fast. Even code developers without a background in containers can quickly deploy the environment they need, which saves, to the greatest extent, both the learning cost for developers and the time spent on deployment. The system has a variety of data source adaptation components, supports Spark, Hive, MySQL and the like, and can add, delete and modify various data sources without depending on other components; by contrast, in prior-art solutions the database is coupled into the system and depends on a particular environment, so that a user's existing database cannot be used directly as a data source. The invention integrates data source adaptation with code scheduling adaptation, so that legacy indicator-generation code written by earlier developers can be invoked and scheduled without modification, saving modification cost and time.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
The terminology used in the embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the embodiments of the application. As used in the examples and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims. The specific meaning of the terms in this application will be understood by those of ordinary skill in the art as the case may be.
Example 1: as shown in fig. 1, an embodiment of the present invention provides a task scheduling service method based on container technology, which includes the following steps:
S100: acquiring data source connection information, communicating by means of the connection information, invoking the corresponding code and monitoring its state; configuring a data source to obtain a managed template of the code;
S200: creating a new container from the base image, and configuring a Python package index source; creating and managing dedicated virtual environments;
S300: after a task is invoked, pushing an execution request to the task scheduling service, calculating the required system resources, file resources and the like according to the parameter information of the pushed task, adding the task to the task queue, obtaining the virtual environment corresponding to the task, and saving the scheduling parameters;
The working principle and beneficial effects of the technical scheme are as follows. First, data source connection information is acquired and used for communication, the corresponding code is invoked and its state is monitored, and a data source is configured to obtain a managed template of the code. Second, a new container is created from the base image, a Python package index source is configured, and dedicated virtual environments are created and managed. Finally, after a task is invoked, an execution request is pushed to the task scheduling service, the required system resources, file resources and the like are calculated according to the parameter information of the pushed task, the task is added to the task queue, the virtual environment corresponding to the task is obtained, and the scheduling parameters are saved. By adopting a task scheduling and execution service and applying the convenience of container technology to the field of data processing, the scheme integrates multiple virtual environments and provides higher execution efficiency. Multi-environment extension: different users of the same platform have more choices and can each define their own virtual environment independently to execute different code, so that SQL, Python and Spark code can be integrated quickly in the same system. Functional diversity: the variety of supported code is greatly increased, and later extension to other kinds of execution code is also quickly supported, since it is only necessary to install different packages, create an independent virtual environment, and invoke the code as a task. Effective environment isolation: because the code execution layer is completely separated out and handed over to the task scheduling service center for execution, coupling with the original system is removed, which greatly improves the stability and security of the system; system problems caused while executing task code, such as os operations in Python code, are avoided, as is a system crash caused by incorrect operation; compared with current systems whose execution component is not provided as a service, this embodiment is superior in execution efficiency, security and diversity. The containerized deployment mode supports rapid deployment and distributed deployment; when the task execution service is deployed as a cluster, execution efficiency is greatly improved, and compared with the prior art, in which a new environment is manually reconfigured and redeployed for each environment a user requires, the approach is simple and fast. Even code developers without a background in containers can quickly deploy the environment they need, which saves, to the greatest extent, both the learning cost for developers and the time spent on deployment. The system has a variety of data source adaptation components, supports Spark, Hive, MySQL and the like, and can add, delete and modify various data sources without depending on other components; by contrast, in prior-art solutions the database is coupled into the system and depends on a particular environment, so that a user's existing database cannot be used directly as a data source. The data source adaptation and code scheduling adaptation integrated in this embodiment allow legacy indicator-generation code written by earlier developers to be invoked and scheduled without modification, saving modification cost and time.
Example 2: as shown in fig. 2, on the basis of embodiment 1, the process of obtaining the managed template of the code provided by the embodiment of the invention includes the following steps:
S104: configuring the type and connection format of the data source and determining whether the data source is configured correctly; if it is, initializing the data source, reading the data source connection information, importing it through the data source management interface and verifying the connection permission; if it is not, notifying the operator that the data source is configured incorrectly;
S105: after configuration is finished, creating a data processing engine template and configuring the code, input parameters, output parameters and basic information of the data processing engine; performing online coding of the managed code according to the code, obtaining the coding characteristics of the managed code, and selecting, based on the coding characteristics, a coding style function corresponding to the target coding mode from a preset coding function library;
S106: compiling the coding style function to obtain a managed coding function, and uniformly preprocessing the code to generate function call information corresponding to the managed coding function; performing managed compilation with the managed coding function based on the function call information to obtain a run-information compilation file of the data processing engine; generating the application software of the data processing engine from the run-information compilation file;
The working principle and beneficial effects of the technical scheme are as follows. First, the type and connection format of the data source are configured and it is determined whether the data source is configured correctly; if it is, the data source is initialized, the data source connection information is read and imported through the data source management interface, and the connection permission is verified; if it is not, the operator is notified that the data source is configured incorrectly. After configuration is finished, a data processing engine template is created, the code, input parameters, output parameters and basic information of the data processing engine are configured, online coding of the managed code is performed according to the code, the coding characteristics of the managed code are obtained, and a coding style function corresponding to the target coding mode is selected from a preset coding function library based on the coding characteristics. Finally, the coding style function is compiled to obtain a managed coding function, and the code is uniformly preprocessed to generate function call information corresponding to the managed coding function; managed compilation is performed with the managed coding function based on the function call information to obtain a run-information compilation file of the data processing engine, from which the application software of the data processing engine is generated. The scheme thus implements configuration and connection management of data sources as well as template configuration and code generation for the data processing engine. Configuring the data source ensures its correctness, and permission verification ensures the reliability and security of the data. Creating a data processing engine template, coding online, and generating the corresponding coding style function as required enable flexible data processing operations. The generated data processing engine application software improves the efficiency and accuracy of data processing and provides better support for data analysis and decision-making. This embodiment provides a complete data processing flow and toolset that helps users better manage and use data resources.
In this embodiment, the Spark connection information is imported through the data source management page and the connection permission is verified. After the data source is configured, a new Spark template is created from the platform menu, and the template's Spark code, input parameters, output content and other basic information are configured, which completes the hosting of the code on the platform. Once the template exists, it is imported through the expert model page, tasks are arranged by drag and drop, and task scheduling is performed in the subsequent process.
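The configuration check in S104 can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the supported types are the ones named in the description (Spark, Hive, MySQL), while the required connection fields `host` and `port` and the names `DataSourceConfig`, `validate` and `initialize` are hypothetical.

```python
from dataclasses import dataclass, field

SUPPORTED_TYPES = {"spark", "hive", "mysql"}  # types named in the description
REQUIRED_FIELDS = ("host", "port")            # assumed connection format


@dataclass
class DataSourceConfig:
    source_type: str
    connection: dict = field(default_factory=dict)


def validate(config: DataSourceConfig) -> list:
    """Return a list of configuration errors; an empty list means the config is valid."""
    errors = []
    if config.source_type.lower() not in SUPPORTED_TYPES:
        errors.append(f"unsupported data source type: {config.source_type}")
    for name in REQUIRED_FIELDS:
        if name not in config.connection:
            errors.append(f"missing connection field: {name}")
    return errors


def initialize(config: DataSourceConfig) -> dict:
    """Initialize the data source if it is configured correctly, else report the errors."""
    errors = validate(config)
    if errors:
        # Corresponds to notifying the operator of an incorrect configuration.
        return {"status": "error", "errors": errors}
    return {"status": "ready", "type": config.source_type.lower()}


good = initialize(DataSourceConfig("Spark", {"host": "spark-master", "port": 7077}))
bad = initialize(DataSourceConfig("oracle", {"host": "db"}))
print(good)
print(bad)
```

Returning every error at once, rather than stopping at the first, lets the operator correct the whole configuration in one pass; permission verification against the live data source is left out of the sketch.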
Example 3: as shown in fig. 3, on the basis of embodiment 1, the process of creating a new container from the base image provided in the embodiment of the present invention includes the following steps:
S201: acquiring environment information of the base image, and creating and starting a new container from the base image, wherein the environment information comprises a base environment, a language, computing packages, connection packages and the like;
S202: building the base image with an image and container management tool, storing the base image in an image registry, and creating a container group to obtain the new container; scheduling the created container group to a specific container management service, the container management service invoking a server;
S203: the container management service configures a Python package index source, which supports downloading and installing new Python packages; a personal virtual environment is added and created through a page of the image and container management tool, the personal virtual environment is generated in the new container in real time, and the virtual environment provides an independent Python environment containing a specific version of the Python interpreter and the packages required by the project;
The working principle and beneficial effects of the technical scheme are as follows. First, environment information of the base image is acquired and a new container is created and started from the base image, the environment information comprising a base environment, a language, computing packages, connection packages and the like. Second, the base image is built with an image and container management tool and stored in an image registry, and a container group is created to obtain the new container; the created container group is scheduled to a specific container management service, which invokes a server. Finally, the container management service configures a Python package index source, which supports downloading and installing new Python packages; a personal virtual environment is added and created through a page of the image and container management tool, the personal virtual environment is generated in the new container in real time, and the virtual environment provides an independent Python environment containing a specific version of the Python interpreter and the packages required by the project. By creating a personal virtual environment, the scheme isolates the user's development environment from the system environment, so that the user's operations do not affect the system environment and changes to the system environment do not interfere with the user's development work. The personal virtual environment allows the user to install a specific version of the Python interpreter and the packages a project requires, which makes project dependencies easy to manage, avoids dependency conflicts between projects, and ensures that each project runs normally. With personal virtual environments, a user can run several projects on the same machine, each with its own environment, and can switch quickly between the development environments of different projects, which improves development efficiency and flexibility. A personal virtual environment can be packaged into an image and stored in the image registry; when a project is deployed, deploying the image to the target environment is enough to build the environment the project requires, which simplifies deployment. This embodiment creates personal virtual environments that provide an independent, isolated development environment, make dependency management easier for users, improve flexibility, and simplify project deployment.
In this embodiment, for image and container management, a Docker base image must be prepared before the service is deployed. The base image contains a Python base environment with Python 3 built in (the service is implemented on Python Flask) and includes common computing and connection packages such as pandas, numpy and pyspark. On first deployment, a new container is started from the base image and all environment information is acquired. Through the image and container management function of the system's front-end page, an internal pip source is configured that supports direct download and installation of new packages, and a dedicated personal virtual environment is created by adding it on the page. This environment is completely independent of the system environment, belongs only to the user, and can be generated in the container in real time.
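As a rough illustration of creating a personal virtual environment that uses an internal pip source, the sketch below relies on Python's standard `venv` module and writes a per-environment pip configuration file. The index URL, the environment name and the function `create_personal_env` are hypothetical; the embodiment does not specify how its pip source is wired in.

```python
import configparser
import os
import tempfile
import venv
from pathlib import Path


def create_personal_env(env_dir: Path, index_url: str) -> Path:
    """Create a virtual environment and point its pip at an internal index."""
    # with_pip=False keeps the sketch fast; a real deployment would likely install pip.
    venv.EnvBuilder(with_pip=False).create(env_dir)

    # pip reads a per-environment config file from the environment root
    # (pip.conf on Unix-like systems, pip.ini on Windows).
    config = configparser.ConfigParser()
    config["global"] = {"index-url": index_url}
    config_path = env_dir / ("pip.ini" if os.name == "nt" else "pip.conf")
    with open(config_path, "w") as handle:
        config.write(handle)
    return config_path


# Demonstration in a temporary directory; the URL is a placeholder, not a real index.
with tempfile.TemporaryDirectory() as tmp:
    env_root = Path(tmp) / "alice-env"
    conf = create_personal_env(env_root, "https://pypi.internal.example/simple")
    written = conf.read_text()
    has_marker = (env_root / "pyvenv.cfg").exists()

print(written)
```

Keeping the index setting inside the environment, rather than in a user-wide or global pip configuration, matches the idea that each personal environment is independent of the system environment.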
Example 4: as shown in fig. 4, on the basis of embodiment 3, the creation process of the virtual environment provided by the embodiment of the present invention includes the following steps:
S2031: receiving a request for creating a virtual environment, selecting a base image according to the environment information of the virtual environment, and creating a new container from the base image;
S2032: parsing the request to obtain the number of virtual environments to be created, dividing the new container into a first base container and a second base container, and building the base environment with the second base container to obtain a Python interpreter and dependency packages;
S2033: constructing the virtual environment and the base environment with the first base container, configuring the Python interpreter in the virtual environment, and creating the virtual environment according to the configured environment information and the dependency packages;
The working principle and beneficial effects of the above technical scheme are as follows: first, a request to create a virtual environment is received, a basic mirror image is selected according to the environment information of the virtual environment, and a new container is created from the basic mirror image; second, the request is parsed to obtain the number of virtual environments to be created, the new container is divided into a first basic container and a second basic container, and the basic environment is constructed using the second basic container to obtain a Python interpreter and a dependency package; finally, the virtual environment and the basic environment are constructed using the first basic container, the Python interpreter is configured in the virtual environment, and the virtual environment is created according to the configured environment information and the dependency package (the principle is shown in fig. 5). The scheme provides the following benefits. Isolation: creating independent virtual environments avoids dependency conflicts among different projects; each virtual environment has its own Python interpreter and dependency packages, and the virtual environments do not affect one another. Independence: each virtual environment can be given a different version of the Python interpreter and of the dependency packages to meet the requirements of different projects, the required interpreter and packages being obtained by dividing the basic container and constructing the basic environment. Flexibility: a plurality of virtual environments are created according to the number specified in the request, each environment is managed and maintained independently, and project-specific requirements are met by configuring the environment information and dependency packages of the virtual environment. Simplified management: automating the creation of virtual environments reduces the workload of manual operation and configuration, and sharing the basic container and the basic environment makes the versions and updates of dependency packages easier to manage and maintain. This embodiment provides an isolated, independent and flexible development environment, so that a developer can better manage and maintain the dependencies of a project, improving development efficiency and project maintainability.
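The creation of several isolated virtual environments from one request can be sketched with Python's standard `venv` module. This is only an illustrative sketch, not the implementation of the embodiment: the function name `create_virtual_envs` and the environment names are hypothetical, the container layer is left out, and `with_pip=False` is chosen only so that the example runs offline.

```python
import tempfile
import venv
from pathlib import Path


def create_virtual_envs(base_dir: Path, names: list[str]) -> dict[str, Path]:
    """Create one isolated virtual environment per requested name (hypothetical helper)."""
    created = {}
    for name in names:
        env_dir = base_dir / name
        # with_pip=False keeps the sketch fast and offline; a real service
        # would enable pip and then install the dependency packages.
        venv.EnvBuilder(with_pip=False, clear=True).create(str(env_dir))
        created[name] = env_dir
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        envs = create_virtual_envs(Path(tmp), ["project_a", "project_b"])
        for name, path in envs.items():
            # Each environment carries its own pyvenv.cfg and interpreter entry point.
            print(name, (path / "pyvenv.cfg").exists())
```

Each environment gets its own directory and `pyvenv.cfg`, which is what keeps the dependency sets of different projects from interfering with one another.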
Example 5: as shown in fig. 6, on the basis of embodiment 1, the process of adding a task queue to obtain a virtual environment corresponding to a task and storing a scheduled parameter according to the embodiment of the present invention includes the following steps:
S301: receiving a task calling request, and pushing the task information into the message queue RabbitMQ;
S302: obtaining a configuration instruction of the message queue RabbitMQ, constructing a RabbitMQ management center according to the configuration instruction, obtaining the path configuration information used for pushing task information to the queue, and monitoring the queue according to the path configuration information, the task scheduling service obtaining the monitoring result; after acquiring the task information, calculating the required system resources and file resources according to the parameters, including the number of CPU cores, memory consumption and hard disk resources, and adding the task to the task queue;
S303: the task queue follows the first-in first-out principle, and tasks wait to be executed in order; when a task reaches the head of the queue, the service obtains from the task parameters the task type, the scheduling type, the required virtual environment and the saved scheduling parameters, and calls different execution classes according to these parameters to execute the task;
The working principle and beneficial effects of the above technical scheme are as follows: first, a task calling request is received and the task information is pushed into the message queue RabbitMQ; second, the configuration instruction of the message queue RabbitMQ is obtained, a RabbitMQ management center is constructed according to the configuration instruction, the path configuration information used for pushing task information to the queue is obtained, the queue is monitored according to the path configuration information, and the task scheduling service obtains the monitoring result; after acquiring the task information, the service calculates the required system resources and file resources according to the parameters, including the number of CPU cores, memory consumption and hard disk resources, and adds the task to the task queue; finally, the task queue follows the first-in first-out principle and tasks wait to be executed in order; when a task reaches the head of the queue, the service obtains from the task parameters the task type, the scheduling type, the required virtual environment and the saved scheduling parameters, and calls different execution classes according to these parameters to execute the task (the principle is shown in fig. 7). The scheme provides the following benefits. Improved task scheduling efficiency: task information is pushed to the message queue and acquired by monitoring the queue, so that tasks are processed asynchronously and executed concurrently. Flexible configuration of task scheduling: the execution mode and path of a task are configured according to specific requirements through the configuration instruction and the path configuration information, making task scheduling more flexible and controllable. Resource management and scheduling: system resources are managed and scheduled reasonably by calculating the required system resources and file resources, including the number of CPU cores, memory consumption and hard disk resources, so that resource overload or waste is avoided and the overall performance of the system is improved. Task queue management: the first-in first-out principle of the task queue ensures that tasks are executed in order, avoiding lost or out-of-order tasks and ensuring the accuracy and integrity of task execution. Support for multiple task types and scheduling types: different execution classes are called according to the task parameters and scheduling requirements, so that various task types and scheduling types can be applied flexibly. This embodiment provides an efficient, flexible and controllable task scheduling service that manages and schedules tasks effectively and improves the overall performance of the system and the accuracy of task execution.
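The flow of steps S301 to S303 can be illustrated with a minimal sketch. To stay self-contained it uses the standard-library `queue.Queue` as an in-process stand-in for the RabbitMQ queue; in a deployment, `on_message` would be the callback registered with a RabbitMQ consumer. The message keys (`task_id`, `task_type`, `venv`, `params`), the class name `TaskScheduler` and the default resource values are assumptions made for illustration and do not come from the embodiment.

```python
import json
import queue
from dataclasses import dataclass


@dataclass
class ResourceEstimate:
    cpu_cores: int
    memory_mb: int
    disk_mb: int


def estimate_resources(params: dict) -> ResourceEstimate:
    """Derive required resources from task parameters (hypothetical keys and defaults)."""
    return ResourceEstimate(
        cpu_cores=int(params.get("cpu_cores", 1)),
        memory_mb=int(params.get("memory_mb", 256)),
        disk_mb=int(params.get("disk_mb", 0)),
    )


class TaskScheduler:
    """In-memory stand-in for the task scheduling service."""

    def __init__(self) -> None:
        self.task_queue: queue.Queue = queue.Queue()  # queue.Queue is first-in first-out

    def on_message(self, body: bytes) -> None:
        """Handle one pushed task message: estimate resources, then enqueue."""
        task = json.loads(body)
        task["resources"] = estimate_resources(task.get("params", {}))
        self.task_queue.put(task)

    def run_next(self) -> dict:
        """Take the oldest task and report what is needed to execute it."""
        task = self.task_queue.get_nowait()
        return {
            "task_id": task["task_id"],
            "task_type": task["task_type"],
            "venv": task["venv"],
            "resources": task["resources"],
        }


if __name__ == "__main__":
    scheduler = TaskScheduler()
    for i, kind in enumerate(["python", "sql"], start=1):
        msg = {"task_id": i, "task_type": kind, "venv": f"env_{i}", "params": {"cpu_cores": i}}
        scheduler.on_message(json.dumps(msg).encode())
    print(scheduler.run_next())  # the task pushed first is executed first
```

The dictionary returned by `run_next` is what a dispatcher would use to pick the execution class and the virtual environment for the task.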
Example 6: as shown in fig. 8, on the basis of embodiment 5, the process of calling different execution classes to execute tasks provided in the embodiment of the present invention includes the following steps:
S3031: activating a virtual environment, wherein the virtual environment is created remotely through a virtual environment management interface and is preconfigured for executing specific tasks or programs; dynamically loading the execution code, namely reading the execution code according to a dynamic loading instruction, dynamically loading or updating the execution code through a configuration port, reconfiguring programmable resources in the execution code, and instantiating the required execution code as an executable class;
S3032: calling a specific method of the executable class with the preconfigured parameters to execute the task;
S3033: after the return result of task execution is obtained, processing it as required: returning the result to the caller over the Hypertext Transfer Protocol (HTTP); if the data volume is large, saving the data to a temporary table through the built-in table-persistence function and notifying the caller of the storage information over HTTP;
The working principle and beneficial effects of the above technical scheme are as follows: first, this embodiment activates a virtual environment, which is created remotely through a virtual environment management interface and is preconfigured for executing specific tasks or programs; the execution code is loaded dynamically, that is, it is read according to a dynamic loading instruction, loaded or updated through a configuration port, the programmable resources in it are reconfigured, and the required execution code is instantiated as an executable class; second, a specific method of the executable class is called with the preconfigured parameters to execute the task; finally, after the return result of task execution is obtained, it is processed as required: the result is returned to the caller over the Hypertext Transfer Protocol (HTTP), and if the data volume is large, the data is saved to a temporary table through the built-in table-persistence function and the caller is notified of the storage information over HTTP. The scheme provides the following benefits. Flexibility and reusability: packaging the execution code into executable classes makes the code easy to organize and reuse and improves its flexibility and maintainability. Dynamic loading and configuration: the execution code is loaded or updated dynamically through the configured port and can be configured and reconfigured as needed to suit different task requirements. Extensibility and customizability: specific methods of the executable class are called with preconfigured parameters to execute various specific tasks, and the returned result is processed as required; in addition, large volumes of data are saved in the temporary table and the caller is notified of the storage information over HTTP, which improves the extensibility and customizability of the system. This embodiment processes and returns results as required and meets the needs of different application scenarios.
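Steps S3031 to S3033 can be sketched in Python as follows. The sketch is illustrative only: the hosted source string, the class name `Engine`, the row threshold `max_inline_rows` and the table name `task_result` are assumptions, an in-memory SQLite database stands in for the store that holds the temporary table, and the returned dictionaries represent the payload that would be sent back to the caller over HTTP.

```python
import sqlite3
import types

# Hypothetical hosted code, as it might be stored by the code hosting step.
HOSTED_SOURCE = '''
class Engine:
    def run(self, n):
        return [(i, i * i) for i in range(n)]
'''


def load_executable_class(source: str, class_name: str):
    """Dynamically load source text into a fresh module and return the named class."""
    module = types.ModuleType("hosted_code")
    exec(compile(source, "<hosted_code>", "exec"), module.__dict__)
    return getattr(module, class_name)


def execute_task(source, class_name, method, params, conn, max_inline_rows=100):
    """Instantiate the class, call the method, and return or persist the result."""
    instance = load_executable_class(source, class_name)()
    rows = getattr(instance, method)(**params)
    if len(rows) <= max_inline_rows:
        # Small result: return it directly in the response payload.
        return {"status": "ok", "data": rows}
    # Large result: save to a temporary table and return only its location.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS task_result (k INTEGER, v INTEGER)")
    conn.executemany("INSERT INTO task_result VALUES (?, ?)", rows)
    return {"status": "ok", "table": "task_result", "row_count": len(rows)}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(execute_task(HOSTED_SOURCE, "Engine", "run", {"n": 3}, conn))
    print(execute_task(HOSTED_SOURCE, "Engine", "run", {"n": 500}, conn))
```

Running untrusted source with `exec` is only acceptable here because, as described above, execution takes place inside an isolated virtual environment in a container rather than in the host system.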
Example 7: as shown in fig. 9, on the basis of embodiment 1 to embodiment 6, the task scheduling service system based on container technology provided in the embodiment of the present invention includes:
the information processing module is in charge of acquiring data source connection information, communicating by using the connection information, and calling up a corresponding code and a monitoring state; configuring a data source to obtain a managed template of a code;
the environment creation module is responsible for creating a new container through the basic mirror image and configuring a Python package index source; creating and managing exclusive virtual environment;
the task processing module is in charge of pushing an execution request to a task scheduling service after a task is scheduled, calculating the conditions of resources, file resources and the like of a corresponding system according to the parameter information of the pushed task, adding a task queue to obtain a virtual environment corresponding to the task, and storing the scheduled parameters;
The working principle and beneficial effects of the above technical scheme are as follows: the information processing module of this embodiment acquires the data source connection information, communicates using the connection information, and calls up the corresponding code and the monitoring state; it configures the data source to obtain a managed template of the code; the environment creation module creates a new container from the basic mirror image, configures a Python package index source, and creates and manages dedicated virtual environments; after a task is called up, the task processing module pushes an execution request to the task scheduling service, calculates the corresponding system resources and file resources according to the parameter information of the pushed task, adds the task to the task queue to obtain the virtual environment corresponding to the task, and stores the scheduling parameters. The scheme adopts a task scheduling execution service and applies the convenience of container technology to the field of data processing, integrating multiple virtual environments and providing higher execution efficiency. Multi-environment expansion: different users of the same platform have more choices and can define their own virtual environments independently to execute different code, so that SQL, Python and Spark code can be integrated rapidly in the same system. Functional diversity: the range of supported code is greatly widened, and further execution code classes can be supported quickly later on, since installing different packages to create an independent virtual environment is enough for the code to be invoked as a task. Effective environment isolation: because the code execution layer is completely separated out and handed over to the task scheduling service center for execution, it is decoupled from the original system, which greatly improves the stability and security of the system and prevents problems that arise while executing task code, such as os operations in Python code, from crashing the system through incorrect operation; compared with existing systems whose execution components are not offered as a service, this embodiment is markedly better in execution efficiency, security and diversity. Containerized deployment: quick deployment and distributed deployment are supported; when the task execution service is deployed as a cluster, execution efficiency is greatly improved, and compared with the prior art, in which a new environment is manually reconfigured and redeployed for each environment a user requires, the system is simple and fast; even code developers without a container background can rapidly deploy the required environment, which saves learning cost and deployment time. Data source adaptation: the system has adaptation components for various data sources, supports Spark, Hive, MySQL and the like, and can add, delete and modify data sources without depending on other components; this differs from prior-art solutions, in which a coupled database is wired into the system and depends on a particular environment, so that the user's existing database cannot be used directly as a data source. The data source adaptation and code scheduling adaptation integrated in this embodiment allow legacy index output code written by earlier developers to be invoked and scheduled without modification, saving modification cost and time.
Example 8: as shown in fig. 10, on the basis of embodiment 7, an information processing module provided in an embodiment of the present invention includes:
the configuration processing sub-module is in charge of carrying out the configuration of the type and the connection format on the data source, judging whether the configuration of the data source is correct, initializing the data source if the configuration of the data source is correct, reading the connection information of the data source, importing a data source management interface and carrying out connection permission verification, and informing operators that the configuration of the data source is incorrect if the configuration of the data source is incorrect;
the code hosting sub-module is responsible for establishing a data processing engine template after configuration is completed, configuring the code, input parameters, output parameters and basic information of the data processing engine, carrying out online coding of the managed code according to the code, acquiring the coding characteristics of the managed code, and selecting, based on the coding characteristics, a coding style function corresponding to the target coding mode from a preset coding function library;
the information compiling sub-module is responsible for compiling the coding style function to obtain a managed coding function, and uniformly preprocessing codes to generate function call information corresponding to the managed coding function; utilizing the managed coding function to carry out managed compiling based on the function call information to obtain an operation information compiling file of the data processing engine; compiling a file according to the running information to generate application software of a data processing engine;
The working principle and beneficial effects of the above technical scheme are as follows: the configuration processing sub-module of this embodiment configures the type and connection format of the data source and judges whether the data source is configured correctly; if so, it initializes the data source, reads the data source connection information, imports the data source management interface and verifies the connection permission; if not, it informs the operator that the data source is configured incorrectly; after configuration is completed, the code hosting sub-module establishes a data processing engine template, configures the code, input parameters, output parameters and basic information of the data processing engine, carries out online coding of the managed code according to the code, acquires the coding characteristics of the managed code, and selects, based on the coding characteristics, a coding style function corresponding to the target coding mode from a preset coding function library; the information compiling sub-module compiles the coding style function to obtain a managed coding function, uniformly preprocesses the code to generate function call information corresponding to the managed coding function, uses the managed coding function to carry out managed compiling based on the function call information to obtain a running information compiled file of the data processing engine, and generates the application software of the data processing engine from the running information compiled file. The scheme provides the following benefits. Acquisition of data source connection information: the information required to connect to an external data source, such as host name, port number, user name and password, is obtained by parsing the configuration information; this information is the key to connecting to the external data source, so that the data processing engine can acquire index data accurately. Establishment of communication connections: using the data source connection information, each node in the data processing engine cluster can establish a communication connection, coordinate data transmission and tasks, transmit index data from the data source to each node, and perform distributed data processing and analysis in the cluster. Improved data processing efficiency: with communication connections established, data is processed in parallel within the data processing engine cluster, and each node can process a different data fragment at the same time, which greatly improves the efficiency and speed of data processing and is particularly important for large data sets and complex data processing tasks. Task coordination and allocation: after communication connections are established, the nodes in the data processing engine cluster coordinate and distribute tasks, share data and computing resources, and allocate tasks reasonably according to the characteristics of the tasks and the load of the nodes, achieving more efficient data processing.
This embodiment coordinates data transmission and tasks within the data processing engine cluster, improves the efficiency and performance of data processing, and handles and analyzes big data better; it implements configuration and connection management of data sources as well as template configuration and code generation for the data processing engine; configuring the data source ensures that the data source is correct, and permission verification ensures the reliability and security of the data; in addition, a data processing engine template is established, online coding is carried out, and the corresponding coding style function is generated as required, enabling flexible data processing operations; finally, the generated data processing engine application software improves the efficiency and accuracy of data processing and provides better data analysis and decision support. This embodiment provides a complete data processing workflow and toolset that helps users manage and use data resources better.
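The check performed by the configuration processing sub-module (correct configuration leads to initialization, incorrect configuration leads to a notification to the operator) can be sketched as below. The field names, the set of supported types, the exception class and the connection format are hypothetical and serve only to illustrate the branch described above; connection permission verification is not shown.

```python
from dataclasses import dataclass

SUPPORTED_TYPES = {"mysql", "hive", "spark"}  # data source types named in the text
REQUIRED_FIELDS = ("type", "host", "port", "user", "password")  # assumed field names


class DataSourceConfigError(ValueError):
    """Raised so the operator can be told that the data source is misconfigured."""


@dataclass(frozen=True)
class DataSource:
    type: str
    host: str
    port: int
    user: str
    password: str

    def connection_url(self) -> str:
        # Hypothetical connection format; the password is deliberately left out.
        return f"{self.type}://{self.user}@{self.host}:{self.port}"


def configure_data_source(raw: dict) -> DataSource:
    """Validate the type and connection fields, then initialize the data source."""
    missing = [name for name in REQUIRED_FIELDS if not raw.get(name)]
    if missing:
        raise DataSourceConfigError(f"missing fields: {', '.join(missing)}")
    if raw["type"] not in SUPPORTED_TYPES:
        raise DataSourceConfigError(f"unsupported type: {raw['type']}")
    port = int(raw["port"])
    if not 1 <= port <= 65535:
        raise DataSourceConfigError(f"invalid port: {port}")
    return DataSource(raw["type"], raw["host"], port, raw["user"], raw["password"])


if __name__ == "__main__":
    ok = {"type": "mysql", "host": "db.local", "port": "3306", "user": "etl", "password": "secret"}
    print(configure_data_source(ok).connection_url())
    try:
        configure_data_source({**ok, "type": "oracle"})
    except DataSourceConfigError as err:
        print("data source is configured incorrectly:", err)
```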
Example 9: as shown in fig. 11, on the basis of embodiment 7, an environment creation module provided in an embodiment of the present invention includes:
the container creation sub-module is responsible for acquiring environment information of the basic mirror image, creating and starting a new container through the basic mirror image, wherein the environment information comprises a basic environment, a language, a calculation package, a connection package and the like;
the container scheduling sub-module is responsible for constructing a basic mirror image through a mirror image container management tool, storing the basic mirror image in a mirror image warehouse, and creating a container group to obtain a new container; dispatching to a specific container management service according to the created container group, and calling a server by the container management service;
the service configuration sub-module is responsible for having the container management service configure a Python package index source, which supports downloading and installing new Python packages; a personal virtual environment is added and created through the page of the mirror image container management tool, the personal virtual environment is generated in the new container in real time, and it is an independent Python environment containing a specific version of the Python interpreter and the packages required by the project;
The working principle and beneficial effects of the above technical scheme are as follows: the container creation sub-module of this embodiment obtains the environment information of the basic mirror image and creates and starts a new container from the basic mirror image, the environment information comprising the basic environment, language, calculation packages, connection packages and the like; the container scheduling sub-module constructs the basic mirror image through the mirror image container management tool, stores it in the mirror image warehouse, and creates a container group to obtain the new container; the created container group is dispatched to a specific container management service, and the container management service calls the server; the service configuration sub-module has the container management service configure a Python package index source, which supports downloading and installing new Python packages; a personal virtual environment is added and created through the page of the mirror image container management tool, the personal virtual environment is generated in the new container in real time, and it is an independent Python environment containing a specific version of the Python interpreter and the packages required by the project. The scheme provides the following benefits. By creating a personal virtual environment, the user's development environment is isolated from the system environment, so that the user's operations do not affect the system environment and changes to the system environment do not interfere with the user's development work. The personal virtual environment allows the user to install a specific version of the Python interpreter and the packages required by the project, so that the user can manage project dependencies conveniently, dependency conflicts among different projects are avoided, and each project can run normally. With personal virtual environments, a user can run several projects on the same machine, each with its own environment, and can switch quickly between the development environments of different projects, improving development efficiency and flexibility. The personal virtual environment can be packaged into a mirror image and stored in the mirror image warehouse; when the project is deployed, the environment it requires can be built quickly simply by deploying the mirror image to the target environment, which simplifies the deployment process. This embodiment creates personal virtual environments, provides an independent and isolated development environment, makes it easy for users to manage dependencies, improves flexibility, and simplifies project deployment.
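Configuring a Python package index source for one virtual environment can be illustrated with the following sketch. It relies on the fact that pip reads a per-environment configuration file named `pip.conf` (`pip.ini` on Windows) located in the root of the virtual environment, with the index set through `index-url` in the `[global]` section. The helper name and the index URL are placeholders, not values from the embodiment.

```python
import configparser
import tempfile
from pathlib import Path


def write_pip_index_config(env_dir: Path, index_url: str) -> Path:
    """Write a per-environment pip config pointing at the given package index."""
    config = configparser.ConfigParser()
    config["global"] = {"index-url": index_url}
    path = env_dir / "pip.conf"  # the file is named pip.ini on Windows
    with path.open("w", encoding="utf-8") as fh:
        config.write(fh)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Placeholder URL standing in for the configured package index source.
        conf = write_pip_index_config(Path(tmp), "https://pypi.example.internal/simple")
        print(conf.read_text(encoding="utf-8"))
```

Because the file lives inside the environment directory, each personal virtual environment can point at its own index source without affecting the others.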
Example 10: as shown in fig. 12, on the basis of embodiment 7, a task processing module provided in an embodiment of the present invention includes:
the information pushing sub-module is in charge of receiving a task calling request and pushing task information into the message queue RabbitMQ;
the instruction processing sub-module is responsible for obtaining configuration instructions of the message queue RabbitMQ, constructing a RabbitMQ management center according to the configuration instructions, obtaining path configuration information pushed by task information of the queue, monitoring the queue according to the path configuration information, and obtaining a monitoring result by the task scheduling service; after acquiring task information, calculating the conditions of required system resources, file resources and the like according to parameters, including CPU core number, memory consumption, hard disk resources and the like, and adding tasks into a task queue;
the task execution sub-module is responsible for queuing and waiting to be executed according to the sequence, wherein the task queue follows the principle of first-in first-out; when the task is queued to be executed, the service acquires the task type and the scheduling type according to the parameters of the task, and the required virtual environment and the saved scheduling parameters; according to the parameters, different execution classes are called to execute tasks;
The working principle and beneficial effects of the above technical scheme are as follows: the information pushing sub-module of this embodiment receives a task calling request and pushes the task information into the message queue RabbitMQ; the instruction processing sub-module obtains the configuration instruction of the message queue RabbitMQ, constructs a RabbitMQ management center according to the configuration instruction, obtains the path configuration information used for pushing task information to the queue, and monitors the queue according to the path configuration information, the task scheduling service obtaining the monitoring result; after acquiring the task information, it calculates the required system resources and file resources according to the parameters, including the number of CPU cores, memory consumption and hard disk resources, and adds the task to the task queue; in the task execution sub-module, the task queue follows the first-in first-out principle and tasks wait to be executed in order; when a task reaches the head of the queue, the service obtains from the task parameters the task type, the scheduling type, the required virtual environment and the saved scheduling parameters, and calls different execution classes according to these parameters to execute the task. The scheme provides the following benefits. Improved task scheduling efficiency: task information is pushed to the message queue and acquired by monitoring the queue, so that tasks are processed asynchronously and executed concurrently. Flexible configuration of task scheduling: the execution mode and path of a task are configured according to specific requirements through the configuration instruction and the path configuration information, making task scheduling more flexible and controllable. Resource management and scheduling: system resources are managed and scheduled reasonably by calculating the required system resources and file resources, including the number of CPU cores, memory consumption and hard disk resources, so that resource overload or waste is avoided and the overall performance of the system is improved. Task queue management: the first-in first-out principle of the task queue ensures that tasks are executed in order, avoiding lost or out-of-order tasks and ensuring the accuracy and integrity of task execution. Support for multiple task types and scheduling types: different execution classes are called according to the task parameters and scheduling requirements, so that various task types and scheduling types can be applied flexibly. This embodiment provides an efficient, flexible and controllable task scheduling service that manages and schedules tasks effectively and improves the overall performance of the system and the accuracy of task execution.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.