Detailed Description
The following description of the embodiments of the present application is provided by way of specific examples, and other advantages and effects of the present application will be readily apparent to those skilled in the art from the disclosure herein. The present application is capable of other and different embodiments and its several details are capable of modifications and/or changes in various respects, all without departing from the spirit of the present application. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," and/or "comprised of," when used in this specification, specify the presence of stated features, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, operations, elements, components, items, species, and/or groups thereof. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means "any of: A; B; C; A and B; A and C; B and C; A, B and C". An exception to this definition will occur only when a combination of elements, functions or operations is inherently mutually exclusive in some way.
This embodiment provides a method, a system and a server system for improving the throughput of a server system by software means.
The method and the system for improving the throughput of the server system are suitable for server systems supporting the operation of Internet financial products. They improve the throughput and response speed of the current server system by changing serial processing to parallel processing and by returning results to requests synchronously while asynchronously opening threads to process data. Specifically, a serial-to-parallel processing mode is adopted, the Spring Batch parallel batch processing framework is introduced, and Dubbo scheduling together with asynchronous Kafka message queue notification is used to return results to requests synchronously and process them in asynchronous threads. Each piece of data is processed concurrently, so the processing speed for concurrent access to large batches of data is improved.
The principles and embodiments of the method, system and server system for improving throughput of the server system according to the present invention will be described in detail below, so that those skilled in the art can understand the method, system and server system for improving throughput of the server system without creative efforts.
Fig. 1 is a flow chart illustrating a method for improving throughput of a server system according to an embodiment of the present invention.
It should be noted that the method for improving the throughput of the server system may be deployed on one or more physical servers according to various factors such as function and load, or may be implemented by a distributed or centralized server cluster, which is not limited in this embodiment.
As shown in fig. 1, in this embodiment, the method for improving the throughput of the server system includes steps S100 to S400.
Step S100, configuring a task processing interface so that tasks are executed in independent threads;
step S200, dividing the program logic that requires parallelism into different responsibilities and assigning them to independent tasks, so that the tasks are executed in parallel within a single process;
step S300, performing partitioned execution and remote execution of tasks through a parallel processing interface;
step S400, concurrently processing each piece of data by asynchronously opening threads, feeding back processing results in a synchronous manner, and updating results via asynchronous messages.
The method for improving the throughput of the server system in this embodiment improves the throughput and response speed of the current system, under the condition of limited hardware resources, by changing serial processing to parallel processing and by returning results to requests synchronously while asynchronously opening threads to process data, so as to meet rapidly growing service requirements.
With a serial processing mode, when the number of concurrent requests grows large, many requests must be handled every second, which causes frequent switching of processes (threads). The proportion of time actually spent processing requests then shrinks, the number of requests that can be processed per second falls, and the waiting time experienced by users grows.
In this embodiment, a serial-to-parallel processing mode is adopted: the Spring Batch (batch framework) parallel batch processing mode is introduced, and Dubbo (an open-source Java RPC framework) scheduling together with asynchronous Kafka (a high-throughput distributed publish-subscribe messaging system) message queue notification is used to return results to requests synchronously and process them in asynchronous threads. Each piece of data is processed concurrently, so the processing speed for concurrent access to large batches of data is improved.
The method for improving the throughput of the server system according to the present embodiment will be described in detail below.
In this embodiment, data processing is changed from serial to parallel, which increases the synchronous data-processing speed and improves system throughput. The premise for improving system throughput is to first clarify the factors that determine it and how it is evaluated.
First, factors affecting system throughput:
Throughput is the amount of data successfully transferred per unit time through a network, device, port, virtual circuit, or other facility.
System throughput is the amount of information processed by the system per unit time, measured by the number of transactions processed per time period.
The throughput capacity (pressure-bearing capacity) of a system is closely related to how much CPU each request consumes and to its external interfaces, IO, and so on. The higher the CPU consumption of a single request and the slower the external system interfaces and IO respond, the lower the system's throughput capacity; conversely, the lower the consumption and the faster the responses, the higher the throughput capacity. Of course, device access speed, CPU performance (clock frequency, clock cycles per instruction (CPI), instruction count), and the architecture of the system (here, the parallel processing architecture) also play a part.
Several important parameters of system throughput: QPS (TPS), concurrency, and response time.
QPS (TPS): the number of requests/transactions processed per second;
Concurrency: the number of requests/transactions handled by the system at the same time;
Response time: the average response time is typically used.
From these three elements, the relationship between them can be deduced:
QPS (TPS) = concurrency / average response time, that is: concurrency = QPS × average response time.
While the implementation of concurrency techniques is quite complex, the easiest to understand is the time-slice round-robin scheduling algorithm: under the management of the operating system, all running processes use the CPU in turn, and each is allowed to occupy the CPU for only a very short time (e.g. 10 ms), so to the user it feels as if all processes are running uninterruptedly. In practice, however, because a thread is the smallest unit of CPU execution, at any one time slice exactly one thread possesses a given CPU. The situation differs if a computer has multiple CPUs, or a CPU has multiple cores and hardware threads: if the total number of threads created by the processes is less than the number of CPU cores, threads of different processes can be distributed to different CPUs, so the processes genuinely run simultaneously.
To take a simple example: a CPU is like an expressway. If expressway A has 8 lanes side by side, at most 8 vehicles travel in parallel; if each vehicle takes 10 ms to pass a gate, then on expressway A:
the number of vehicles passing the gate in 1 s = 1000 ms × 8 / 10 ms = 800;
if the system is more complex, each vehicle's engine differs and its processing differs, so the time for each vehicle to pass the gate is not necessarily 10 ms but slower or faster. Using the time-slice round-robin scheduling algorithm, a slow vehicle pulls aside first, the next vehicle passes, then the slow one resumes, and so the vehicles pass the gate in alternation.
Without considering any objective factors, assume a server has 2 physical CPUs, each with 8 cores and 16 hardware threads; then its limit concurrency over 1 s is:
1 s limit concurrency = 1000 ms × 16 (CPU thread count) × 2 / (CPU time-slice switching time (assume 10 ms) + program execution time (assume 10 ms)) = 1600.
Provided the foreground does not crash, and assuming the maximum delay a user can tolerate is 3 s, the maximum concurrency the system can reach is 4800. This is of course only the ideal case; under real production conditions, with all their contributing factors, the concurrency limit must be much smaller.
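The arithmetic above can be captured in a few lines. The Java sketch below simply reproduces the text's assumed figures (10 ms slice switch, 10 ms execution, 3 s user tolerance) and is illustrative only:

public class ConcurrencyEstimate {
    public static void main(String[] args) {
        int physicalCpus = 2;
        int threadsPerCpu = 16;       // 8 cores x 2 hardware threads each
        double sliceSwitchMs = 10.0;  // assumed CPU time-slice switching time
        double execMs = 10.0;         // assumed program execution time per slice

        // 1 s limit concurrency = 1000 ms * threads * CPUs / (switch + exec)
        double limitPerSecond = 1000.0 * threadsPerCpu * physicalCpus
                / (sliceSwitchMs + execMs);
        System.out.println("1 s limit concurrency = " + limitPerSecond);   // 1600.0

        // concurrency = QPS * average response time; with a 3 s tolerance:
        double toleranceSeconds = 3.0;
        System.out.println("max concurrency = " + limitPerSecond * toleranceSeconds); // 4800.0
    }
}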
System throughput is usually determined by two factors, QPS (TPS) and concurrency. Each system has a relative limit for both values; under the access pressure of an application scenario, once either of the two reaches the system's maximum, throughput cannot improve further. If the pressure keeps increasing, throughput instead falls, because the system is overloaded and context switching, memory consumption, and other overheads degrade its performance.
Like a project plan, a single call through the system has a critical path, and that critical path determines the system's response time; it is composed of CPU computation, IO, external system responses, and so on.
Second, evaluating system throughput:
When designing a system, one needs to consider the impact of CPU computation, IO, and external system response times, and make a preliminary estimate of system performance.
Before evaluating system throughput, network throughput is evaluated. Network throughput refers to the residual bandwidth available to network applications between two nodes in the network at a given time (bandwidth being the number of bits a link can transmit per second), i.e., the maximum rate a device can accept without frame loss. Network throughput helps find bottlenecks in network paths.
Throughput is mainly determined by the network card in the firewall and the efficiency of the program algorithm; in particular, a poor algorithm can force the firewall system into a large amount of computation and greatly reduce the traffic it can carry. Hence, although most firewalls are marketed as 100MB firewalls, their algorithms are implemented in software, so actual traffic is far from 100MB, in fact only 10MB-20MB. A pure hardware firewall does the computation in hardware, so throughput can approach 90MB-95MB linearly, making it a true 100MB firewall.
Throughput and packet forwarding rate are the main indexes for measuring firewall performance. They are generally measured by FDT, the full-duplex throughput for 64-byte packets, an index that covers both throughput and packet forwarding rate.
The throughput testing method is as follows: send a certain number of frames at a certain rate and count the frames forwarded by the device under test. If the number of frames received equals the number sent, increase the sending rate and retest; if fewer frames are received than sent, reduce the sending rate and retest, repeating until the final result is obtained. Throughput test results are typically expressed in bit/s or B/s.
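As a hedged illustration, the search loop of this test procedure might look like the sketch below; the device model (framesReceived) is a made-up stand-in for real hardware, not part of any standard test tool:

public class ThroughputTest {
    // Hypothetical device model: drops frames once the rate exceeds 800 frames/s.
    static long framesReceived(long framesSent, long rate) {
        long capacity = 800;
        return rate <= capacity ? framesSent : framesSent * capacity / rate;
    }

    public static void main(String[] args) {
        long frames = 10_000; // frames sent per trial
        long rate = 100;      // initial sending rate, frames per second
        long step = 100;
        long best = 0;
        while (step > 0) {
            if (framesReceived(frames, rate) == frames) {
                best = rate;  // no loss: raise the rate and retest
                rate += step;
            } else {
                rate -= step; // loss: back off and refine the search
                step /= 2;
                rate += step;
            }
        }
        System.out.println("throughput = " + best + " frames/s");
    }
}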
In general, when evaluating system throughput against requirements, the evaluated data has another dimension besides QPS and concurrency: daily PV (page views).
Observing the system's access logs shows that, when the user base is large, traffic in the same time period on different days is nearly identical, for example on weekday mornings. As long as the daily traffic graph and the QPS are available, the daily traffic can be estimated.
The general technical approach:
1. Find the system's highest TPS and daily PV; these two elements have a relatively stable relationship (excluding holiday and seasonal effects).
2. Obtain the highest TPS through a pressure test or empirical estimation, and from it calculate the system's highest daily throughput. As shown in FIG. 2, the daily transaction volume of part of the production module is sampled at random for one working day; the module's daily platform throughput can be derived from this data sample.
For this platform module, the relationship between TPS and PV is typically: highest TPS : PV ≈ 1 : (13 × 3600) (equivalent to the highest TPS being sustained for 13 hours of access per day; this holds for fund and receipt transactions, with some differences in other application scenarios).
Under this platform module, assuming the pressure test yields a TPS of 100, the module's daily throughput is:
daily throughput = 100 × 13 × 3600 = 4,680,000;
this is the simple (single-URL) case. Where pages are involved, one page issues multiple requests, so the actual throughput the system can sustain is smaller still.
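A minimal sketch of this estimate, under the 1 : (13 × 3600) ratio assumed above:

public class DailyThroughput {
    public static void main(String[] args) {
        long peakTps = 100;        // highest TPS from the pressure test
        long equivalentHours = 13; // TPS : PV ratio of 1 : (13 * 3600) from the samples
        long dailyPv = peakTps * equivalentHours * 3600;
        System.out.println("daily throughput = " + dailyPv); // 4680000
    }
}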
Taking think time (T_think) into account, the TPS value obtained in a test has the following relationship with the number of concurrent virtual users (U_concurrent) and the transaction response time (T_response) read from LoadRunner (under stable operating conditions):
TPS=U_concurrent/(T_response+T_think)。
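For illustration, plugging assumed values into this relationship (the numbers below are placeholders, not measurements from the embodiment):

public class TpsRelation {
    public static void main(String[] args) {
        double uConcurrent = 300.0; // concurrent virtual users (assumed)
        double tResponse = 0.5;     // transaction response time in seconds (assumed)
        double tThink = 2.5;        // user think time in seconds (assumed)
        double tps = uConcurrent / (tResponse + tThink);
        System.out.println("TPS = " + tps); // 100.0
    }
}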
The relationship among concurrency, QPS, and average response time is shown in FIG. 3. The abscissa in FIG. 3 is the number of concurrent users. Line 1 is CPU usage; line 2 is throughput, i.e. QPS; line 3 is latency.
Initially, the system has only one user and the CPU is inevitably under-utilized: on the one hand the server may have multiple CPUs but be handling only a single process; on the other hand, some stages of handling a process are IO stages, which leave the CPU waiting with no other request processes to handle. As the number of concurrent users increases, CPU utilization rises and QPS rises correspondingly (by the formula QPS = concurrent users / average response time). As concurrency grows further, the average response time also grows, following an exponential curve. When the number of concurrent requests becomes very large, many requests must be handled every second, causing frequent process (thread) switching; the share of time actually spent processing requests falls, the number of requests processed per second falls, and users' waiting time grows, eventually exceeding their psychological tolerance.
To handle data flooding in at scale, a serial processing mode clearly cannot meet the practical requirement of high data throughput, so a serial-to-parallel processing mode is adopted in real scenarios, and the Spring Batch parallel batch mode is introduced. However, many batch processing problems can be solved by a single-process, single-thread working mode, so before committing to a complex design and implementation, examine whether the complexity is really warranted. Measure the performance of the actual job first and see whether the simplest implementation meets the requirements: even on the most ordinary hardware, hundreds of MB of data files can be read and written within a minute.
The parallel processing technology provided by Spring Batch can practically resolve the pain point of the current module's high-throughput requirement. From a high-level abstraction perspective, parallel processing has two modes: single-process multi-threaded, or multi-process. It can be further divided into the following categories:
Multi-threaded Step (single process)
Parallel Steps (single process)
Remote Chunking of Step (multi-process)
Partitioning a Step (single or multi-process).
Step S100, configuring a task processing interface so that tasks are executed in separate threads.
In this embodiment, configuring the task processing interface includes introducing the TaskExecutor interface into the task configuration.
Adding a TaskExecutor to the step configuration allows the reading, processing, and writing of records (for each committed chunk) to be executed in a separate thread.
The simplest way to start parallel processing is to add a TaskExecutor to the step configuration, for example as an attribute of the tasklet:
<step id="loading">
    <tasklet task-executor="taskExecutor">…</tasklet>
</step>

<!-- the referenced executor bean; SimpleAsyncTaskExecutor is the simplest implementation -->
<bean id="taskExecutor" class="org.springframework.core.task.SimpleAsyncTaskExecutor"/>
In the above example, taskExecutor points to another bean that implements the TaskExecutor interface. TaskExecutor is a standard Spring interface, and the simplest multi-threaded implementation is SimpleAsyncTaskExecutor.
The result of the above configuration is that the step executes the reading, processing, and writing of each record (per committed chunk) in separate threads. Note that this means there is no fixed order among the data items being processed, and a chunk may contain items that are non-contiguous compared to the single-threaded case. If the executor imposes limits (e.g. when backed by a thread pool), there is a tasklet configuration item to tune: throttle-limit, which defaults to 4. In practical applications the traffic is large, and it is configured as 150:
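A sketch of such a configuration (the attribute value follows the text; the step id is a placeholder):

<step id="loading">
    <!-- throttle-limit raises the default concurrency cap of 4 to 150 -->
    <tasklet task-executor="taskExecutor" throttle-limit="150">…</tasklet>
</step>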
Note also that concurrent use of pooled resources in the step, such as a database connection pool (DataSource), may be limited. Ensure the number of resources in the pool is greater than or equal to the number of concurrent threads.
In some common batch processing scenarios there are practical limits to using multi-threaded steps. Many components of a step (such as readers and writers) are stateful, and if that state is not isolated per thread, those components cannot be used in a multi-threaded step. In particular, most of the readers and writers provided by Spring Batch are not designed for multi-threading. Stateless or thread-safe readers and writers can still be used; see the example in Spring Batch Samples (see Section 6.12, "Preventing State Persistence"), which shows using an indicator to track which items in the database input table have been processed and which have not.
Spring Batch provides some implementations of ItemWriter and ItemReader. The javadoc usually indicates whether they are thread-safe, or which issues need attention in a concurrent environment. If the documentation does not say explicitly, the only way to find out whether there is any thread-unsafe shared state is to inspect the source code. A reader that is not thread-safe can still be used effectively by wrapping it in a proxy object that handles the synchronization itself.
If most of a step's time is consumed by write and processing operations, then even if the read() operation is synchronized with a lock, the step can still execute much faster than in a single-threaded environment.
Step S200: divide the program logic that requires parallelism into different responsibilities and assign them to separate tasks, so that the tasks execute in parallel within a single process.
As long as the program logic that needs to run in parallel is divided into distinct responsibilities and assigned to separate steps, it can be executed in parallel in a single process.
Parallel step execution is easy to configure and use; if several steps are to execute in parallel, a flow can be configured as follows:
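The following is a sketch based on the standard Spring Batch split/flow configuration; the step and bean names are placeholders:

<job id="parallelJob">
    <!-- step1+step2 and step3 run in parallel; step4 runs after both flows finish -->
    <split id="split1" task-executor="taskExecutor" next="step4">
        <flow>
            <step id="step1" parent="s1" next="step2"/>
            <step id="step2" parent="s2"/>
        </flow>
        <flow>
            <step id="step3" parent="s3"/>
        </flow>
    </split>
    <step id="step4" parent="s4"/>
</job>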
The configurable "task-executor" attribute indicates which TaskExecutor implementation should execute the separate flows. The default is SyncTaskExecutor, but an asynchronous TaskExecutor is often wanted to run the steps in parallel. Note that the split will ensure every flow within it completes before aggregating and moving on.
In this embodiment, the method for improving the throughput of the server system further includes: splitting a step that uses remote chunking into multiple processes, which communicate through middleware. The model is shown schematically in FIG. 4.
The Master component is a single process, and the Slave components are typically multiple remote processes. This mode works best when the Master process is not the bottleneck, and should therefore be used where processing the data costs more than reading it (as is often the case in practice).
The Master component is just an implementation of a Spring Batch Step, with the ItemWriter replaced by a generic version that knows how to send chunks of data items to the middleware as messages. The Slaves are standard listeners, whatever the middleware (for example, MessageListeners when JMS is used), and their function is to process the chunks of data items through the ChunkProcessor interface, using either a standard ItemWriter or an ItemProcessor plus an ItemWriter. One advantage of this mode is that the reader, processor, and writer components are all off-the-shelf (the same as for a locally executed step). The data items are partitioned dynamically and the work is shared through the middleware, so if all the listeners are hungry consumers, load balancing happens automatically.
The middleware must be durable and reliable, guaranteeing that every message is delivered, and delivered to only a single consumer. JMS is the popular solution, but other options exist in the grid-computing and shared-memory product space (e.g. the JavaSpaces service, which provides distributed shared storage for Java objects).
Step S300 performs partitioned execution and remoting of tasks through the parallel processing interface.
In this embodiment, the parallel processing interface includes an SPI interface of Spring Batch.
In this embodiment, the partitioned and remote execution of tasks is performed through the PartitionHandler and the Partitioner.
Spring Batch provides an SPI (Service Provider Interface) for partitioning and remotely executing steps. In this case the remote participants are simply Step instances, configured and used in the same way as for local processing. The actual model is shown in FIG. 5.
The Job executed on the left is a sequence of serial Steps, with the middle Step labeled as the Master. The Slaves in FIG. 5 are identical instances of a Step, which could in fact substitute for the Master with the same result for the Job. The Slaves are typically remote services but could also be locally executing threads. In this mode, the messages the Master sends to the Slaves need not be durable, nor is guaranteed delivery required: for each job execution, the Spring Batch meta-information held in the JobRepository ensures that each Slave executes once and only once.
The SPI of Spring Batch consists of a specialized implementation of Step (PartitionStep) plus two strategy interfaces that must be implemented for a specific environment. The two strategy interfaces are PartitionHandler and StepExecutionSplitter, whose roles are shown in the sequence diagram in FIG. 6.
The Step on the right here is the "remote" Slave, so potentially many objects and/or processes play this role, while the PartitionStep in the figure drives (controls) the whole execution. The PartitionStep is configured as follows:
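A sketch following the standard PartitionStep configuration from the Spring Batch reference documentation (names are placeholders):

<step id="step1.master">
    <partition step="step1" partitioner="partitioner">
        <handler grid-size="10" task-executor="taskExecutor"/>
    </partition>
</step>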
Similar to the throttle-limit attribute of a multi-threaded step, the grid-size attribute prevents the task executor of a single step from being overloaded.
An example in the Spring Batch Samples program can be copied and extended in unit tests (refer to the partitionJob.xml configuration file).
Spring Batch names the step executions it creates for the partitions like "step1:partition0", so the master step is often called "step1:master". In Spring 3.0 a step may also be given an alias (by specifying the name attribute instead of the id attribute).
The PartitionHandler component knows about the organization of the remote grid environment: it can send StepExecution requests to the remote Steps in some specific data format such as a DTO. It does not need to know how to split the input data or how to aggregate the results of multiple step executions. Generally speaking it also does not need to know about resilience or failover, since in many cases these are features of the fabric; in any case, Spring Batch always provides restartability independent of the fabric: a failed job can always be restarted, and only the failed steps are re-executed.
The PartitionHandler interface can have implementation classes for various fabric types: simple RMI remote method calls, EJB remote calls, custom web services, JMS, JavaSpaces, shared-memory grids (such as Terracotta or Coherence), and grid execution fabrics (such as GridGain). Spring Batch itself contains no implementations for any proprietary grid or remoting fabric.
Spring Batch does, however, provide a useful PartitionHandler implementation that executes Steps locally in separate threads, named TaskExecutorPartitionHandler; it is the default handler in the XML configuration above. It can also be specified explicitly, as below:
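A sketch of the explicit form; the handler bean shown is an assumption modeled on the Spring Batch reference documentation:

<step id="step1.master">
    <partition step="step1" handler="handler"/>
</step>

<bean id="handler"
      class="org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler">
    <property name="taskExecutor" ref="taskExecutor"/>
    <property name="step" ref="step1"/>
    <property name="gridSize" value="10"/>
</bean>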
The gridSize determines the number of separate step executions to create, so it can be matched to the size of the thread pool in the TaskExecutor, or set somewhat larger than the number of available threads, in which case the blocks of work become smaller.
The TaskExecutorPartitionHandler is very useful for IO-intensive steps, such as copying large numbers of files or replicating a file system into a content management system. It can also be used for remote execution by providing a proxy for remote invocation (e.g. using Spring Remoting).
The Partitioner has a simpler responsibility: to generate execution contexts as input parameters for the new step executions (no need to worry about restarts). It has a single method:
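The interface as provided by Spring Batch (imports shown for completeness):

import java.util.Map;
import org.springframework.batch.item.ExecutionContext;

public interface Partitioner {
    Map<String, ExecutionContext> partition(int gridSize);
}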
The return value of this method is a Map that associates the unique name given to each step execution (the String in the Map) with its input parameters in the form of an ExecutionContext. The names later show up in the batch meta data as the step execution names of the partitioned StepExecutions. The ExecutionContext is just a bag of name-value pairs, so it might contain a range of primary keys, or line numbers, or the location of an input file. The remote Step then typically binds to the context input using #{…} placeholders (late binding in the step scope).
The names of the step executions (the keys in the Map returned by the Partitioner interface) need to be unique throughout the execution of the Job, but have no other specific requirements. The easiest way to achieve this, and to give the names some meaning to users, is a prefix+suffix naming convention, where the prefix is the name of the step being executed (which is itself unique within the Job) and the suffix is a counter. There is a SimplePartitioner in the framework that uses this convention. An optional interface, PartitionNameProvider, can be used to provide the partition names separately from the partitions themselves. If a Partitioner implements this interface, then on a restart only the names are queried; if partitioning is a heavyweight operation this can be a useful optimization. Obviously, the names provided by the PartitionNameProvider must match those provided by the Partitioner.
Binding input data to steps: it is very efficient for the steps executed by a PartitionHandler to have identical configuration, with their input parameters bound at runtime from the ExecutionContext. This is easy to do with the StepScope feature of Spring Batch. For example, if the Partitioner creates ExecutionContext instances with an attribute key fileName, each pointing to a different file (or directory), the Partitioner's output might look like the following:
Example step execution context names for a Partitioner targeting directories to process:

|Step Execution Name (key)|ExecutionContext (value)|
|filecopy:partition0|fileName=/home/data/one|
|filecopy:partition1|fileName=/home/data/two|
|filecopy:partition2|fileName=/home/data/three|
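A minimal Partitioner sketch that would produce a table like the one above; the directory paths and class name are illustrative assumptions:

import java.util.HashMap;
import java.util.Map;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

public class FileCopyPartitioner implements Partitioner {
    // Directories to process; in practice these might be discovered at runtime.
    private static final String[] DIRS =
            {"/home/data/one", "/home/data/two", "/home/data/three"};

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> contexts = new HashMap<>();
        for (int i = 0; i < DIRS.length; i++) {
            ExecutionContext context = new ExecutionContext();
            // Bound later in the step via #{stepExecutionContext[fileName]}
            context.putString("fileName", DIRS[i]);
            contexts.put("partition" + i, context); // prefix + counter naming convention
        }
        return contexts;
    }
}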
The file name can then be bound into the step using late binding of the execution context:
<bean id="itemReader" scope="step" class="org.spr...MultiResourceItemReader">
    <property name="resources" value="#{stepExecutionContext[fileName]}/*"/>
</bean>
Changing the serial mode into the parallel mode has improved the system's throughput severalfold and shortened its response time considerably: at present, completing one online balance-payment transaction takes about 20 ms. Such fast response and such high sustainable throughput are inseparable from the serial-to-parallel architecture and from the transaction scenario. Of course, this throughput and responsiveness also depend on another measure: returning results synchronously while processing in asynchronous threads.
Step S400: concurrently process each piece of data by asynchronously opening threads, feed back the processing results in a synchronous manner, and update results via asynchronous messages.
Specifically, in this embodiment, asynchronous threading is realized through Dubbo scheduling, the processing results are fed back in synchronous mode, and asynchronous Kafka message queue notifications are used to update the asynchronous message results.
Improving the throughput of the system can be viewed as two parts, swallowing and spitting out: a request handed down from the upper layer is accepted ("swallowed") by the lower layer, a result is synchronously returned to the upper-layer request, then multiple threads are opened to execute each request, and when processing finishes the results are emitted. For example, it is like going to a restaurant: many guests sit at different tables, and each guest finds a table and sits down. Because the restaurant has many cooks, it can serve some dishes to every table at once, so guests start eating immediately and get prompt feedback; the remaining dishes are cooked afterwards and brought out one by one, with the guests told when their dishes are done. In this way the restaurant's throughput is improved.
Synchronous means that when a function call is issued, the call does not return until the result is obtained. By this definition, the vast majority of function or method calls are synchronous.
Asynchronous means that when an asynchronous procedure call is issued, the caller does not get the result immediately; the component that actually handles the call notifies the caller via status, notification, or callback after completion.
The strategy of returning the result synchronously while opening threads asynchronously for processing not only lets requests be answered quickly, improving system throughput, but also speeds up the concurrent processing of mass data, since each piece of data is processed concurrently in its own asynchronous thread. In the actual project, the asynchrony is realized through Dubbo scheduling and asynchronous Kafka message queue notification.
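A minimal sketch of the "return synchronously, process asynchronously" strategy follows. The topic name, broker address, payload types, and class name are assumptions for illustration, not the embodiment's actual code; the Kafka producer API used is the standard org.apache.kafka client:

import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AsyncRequestHandler {
    private final ExecutorService workers = Executors.newFixedThreadPool(16);
    private final KafkaProducer<String, String> producer;

    public AsyncRequestHandler() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producer = new KafkaProducer<>(props);
    }

    /** Returns an acknowledgement synchronously; the real work runs in a separate thread. */
    public String handle(String requestId, String payload) {
        workers.submit(() -> {
            String result = process(payload); // the time-consuming business logic
            // Publish the final result; a downstream consumer updates the asynchronous message result.
            producer.send(new ProducerRecord<>("result-topic", requestId, result));
        });
        return "ACCEPTED:" + requestId; // synchronous feedback to the caller
    }

    private String process(String payload) {
        return payload.toUpperCase(); // placeholder for actual processing
    }
}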
In addition to the above, this embodiment also adopts a distributed microservice architecture to improve data processing speed and resource utilization. Distributed computing studies how to divide a problem requiring enormous computing power into many small parts, distribute those parts to many computers for processing, and finally combine the partial results into the final result.
Distributed computing and parallel computing pursue different goals: distributed computing is concerned with reliability, while parallel computing is concerned with speed; combining the two therefore improves the speed of the system's computed responses while ensuring the reliability of the results.
As shown in fig. 7, the present embodiment further provides a system 100 for improving throughput of a server system, where the system 100 for improving throughput of a server system includes: a parallel processing module 110 and a scheduling feedback module 120.
In this embodiment, the parallel processing module 110 is configured to configure the task processing interface so that tasks execute in separate threads, to divide the program logic requiring parallelism into different responsibilities assigned to separate tasks so that the tasks execute in parallel within a single process, and to perform the partitioned and remote execution of tasks through the parallel processing interface.
Specifically, in this embodiment, the parallel processing module 110 introduces the TaskExecutor interface into the task configuration, splits a step that uses remote chunking into multiple processes, and lets those processes communicate through middleware.
In this embodiment, the scheduling feedback module 120 is configured to concurrently process each piece of data by asynchronously opening threads, to feed back the processing results in a synchronous manner, and to update the results via asynchronous messages.
Specifically, in this embodiment, the scheduling feedback module 120 realizes asynchronous threading through Dubbo scheduling, feeds back the processing results in synchronous mode, and updates the asynchronous message results by means of asynchronous Kafka message queue notification.
The technical features of the specific implementation of the system 100 for improving the throughput of the server system in this embodiment are substantially the same as those of the method described in the foregoing embodiment, and the technical content common to the embodiments is not repeated here.
It should be noted that the division of the modules of the above apparatus is only a logical division; in actual implementation they may be wholly or partially integrated into one physical entity, or physically separated. These modules may all be realized as software invoked by a processing element; or entirely as hardware; or some modules as software invoked by a processing element and some as hardware. For example, a module may be a separately established processing element, or it may be integrated into a chip of the electronic terminal, or it may be stored in the terminal's memory in the form of program code, with a processing element of the terminal calling and executing the function of that module. The other modules are implemented similarly. In addition, all or some of the modules may be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal processing capability. In implementation, each step of the above method, or each module above, may be completed by an integrated logic circuit of hardware in the processor element or by instructions in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above method, such as one or more Application Specific Integrated Circuits (ASICs), one or more digital signal processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs). For another example, when one of the modules is realized as program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of invoking program code. For yet another example, the modules may be integrated together and implemented as a system-on-a-chip (SOC).
This embodiment also provides a server system to which the above method for improving the throughput of a server system is applied. The method has been described in detail above and is not repeated here.
In summary, compared with the existing approach of improving system performance and throughput by adding hardware, the present application provides a software processing method that, under limited hardware resources, changes serial processing to parallel processing and returns results to requests synchronously while opening threads asynchronously to process data, so that the throughput of the server system is greatly improved and the system's response time is effectively shortened. The present application therefore effectively overcomes various defects in the prior art and has high industrial value.
The above embodiments are merely illustrative of the principles and utilities of the present application and are not intended to limit the application. Any person skilled in the art can modify or change the above-described embodiments without departing from the spirit and scope of the present application. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical concepts disclosed in the present application shall be covered by the claims of the present application.