Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application.
The term "and/or" is used herein to describe an association relationship of an associated object, and specifically indicates that three relationships may exist, for example, a and/or B may indicate that a exists alone, while a and B exist together, and B exists alone.
It should be noted that the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) involved in the present application are information and data authorized by the user or fully authorized by each party. The collection, use, and processing of such data must comply with the relevant laws, regulations, and standards of the relevant countries and regions, and corresponding operation entries are provided for the user to grant or refuse authorization.
In order to clearly describe the technical solutions of the embodiments of the present application, the terms involved in the present application are first explained:
PISA: Protocol-Independent Switch Architecture.
RMT: Reconfigurable Match Tables.
MAU: Match-Action Unit.
ACL: Access Control List.
QoS: Quality of Service.
GRE: Generic Routing Encapsulation.
VXLAN: Virtual Extensible Local Area Network.
IPSec: Internet Protocol Security.
IP: Internet Protocol.
SDN: Software-Defined Network.
Parser: the component that parses packet headers.
MPLS: Multi-Protocol Label Switching.
Programmable switching and routing architecture is a technology for building high-performance network devices with the flexibility and scalability to meet ever-increasing demands on network traffic and complexity. In this field, one important architecture is the programmable switch architecture, of which PISA is a widely adopted example.
The PISA architecture provides a programmable data plane processing model that enables network devices to handle a variety of protocols and functions rather than being tied to specific protocols or applications. Conventional network devices typically focus on specific protocol processing, whereas the PISA architecture achieves greater flexibility and scalability by decoupling packet processing logic from the device hardware and moving it into programmable software or hardware modules.
RMT is an implementation of the PISA architecture: a data plane processing architecture based on programmable match tables, used to build high-performance and highly customizable network devices, and widely used in programmable switch chips. In conventional network devices, packet processing is typically accomplished through fixed-function hardware logic. The RMT architecture instead achieves flexible packet processing by moving the packet processing logic into programmable match tables. The match table is a critical data structure that stores the rules by which the network device matches and manipulates packets.
In the RMT architecture, the match table is made up of a plurality of entries, each containing a match field and a set of actions associated with it. When a packet enters a network device, its key fields (e.g., source IP address, destination IP address, port number) are matched against the entries in the match table. Once a match succeeds, the associated actions are triggered, such as modifying the packet header, redirecting traffic, or updating counters.
The MAU is a programmable processing unit in the RMT architecture that is responsible for executing the actions of matched table entries. Each MAU typically contains a set of processing logic that can be customized to specific needs. For example, one MAU may implement header parsing and modification of packets, while another MAU may be responsible for traffic classification and routing. By combining such MAUs, a complex network device can be constructed that flexibly adapts to different network requirements.
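To make the match-action abstraction concrete, the following Python fragment is a minimal sketch of a match table and an MAU; the entry layout, the action callables, and names such as dst_ip are illustrative assumptions, not the hardware data structures.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    match: dict    # field name -> required value, e.g. {"dst_ip": "10.0.0.1"}
    actions: list  # callables applied to the packet on a hit

@dataclass
class MatchTable:
    entries: list

    def lookup(self, pkt: dict):
        # Return the first entry whose match fields all equal the packet's fields.
        for entry in self.entries:
            if all(pkt.get(k) == v for k, v in entry.match.items()):
                return entry
        return None

def mau_process(table: MatchTable, pkt: dict) -> dict:
    """Model of one MAU: match the packet, then execute the hit entry's actions."""
    entry = table.lookup(pkt)
    if entry is not None:
        for action in entry.actions:
            action(pkt)  # e.g., rewrite a header field or update a counter
    return pkt

# Example: redirect traffic for one destination and count hits.
counters = {"hits": 0}
table = MatchTable(entries=[
    Entry(match={"dst_ip": "10.0.0.1"},
          actions=[lambda p: p.update(egress_port=3),
                   lambda p: counters.update(hits=counters["hits"] + 1)]),
])
print(mau_process(table, {"src_ip": "10.0.0.9", "dst_ip": "10.0.0.1"}))
# -> {'src_ip': '10.0.0.9', 'dst_ip': '10.0.0.1', 'egress_port': 3}
```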
A pipeline switch is a switching device for network data transmission that employs the concept of a pipeline to increase the processing speed and efficiency of data. Pipelining is a parallel processing manner in which tasks are broken down into multiple stages, each of which performs a particular operation. In a pipeline switch, the processing of data packets is also divided into a plurality of stages, each of which is responsible for a different processing task.
In a pipelined switch architecture, the complex-service-message loopback is a mechanism for handling complex service messages encountered inside the switch. When the switch cannot process a specific message directly, it can choose to send the message back to the switch's input port and then resend it to the appropriate output port for processing via a different path.
A service message may be complex because of excessive header information or special protocol fields, which can prevent the switch from forwarding or processing it efficiently. In this case the switch may choose to send the message back to the input port, i.e., loop it back; by bypassing the original forwarding logic in this way, the message is given more processing time and resources.
The specific implementation of the loopback mechanism depends on the architecture and design of the switch. One common way is to implement the loopback function through the switching matrix or switching engine inside the switch. When the switch detects a complex service message, it sends a copy of the message back to the input port and marks it as a loopback message with a specific identifier. The switch then reprocesses the message, including re-parsing the header information, re-selecting the appropriate output port, and forwarding or processing it.
The main purpose of the loopback mechanism is to solve the processing of complex service messages and to improve the processing capacity and flexibility of the switch. It allows the switch to perform more computation and operations when processing complex messages, so as to better meet specific business needs. However, the loopback mechanism may also increase the switch's latency and affect overall forwarding performance, so various factors must be weighed comprehensively in design and configuration.
At present, the control-flow design of a pipeline switch prevents individual stages from being executed repeatedly. When a complex service is encountered, processing can only proceed through a loopback or through pipeline folding, which greatly reduces performance; during a loopback, the packet can only flow through the entire pipeline from head to tail, so the utilization of pipeline resources is low.
In order to solve the above problems, an embodiment of the present application provides a message processing scheme. For a message to be processed at the current processing stage, it is determined whether the message should enter loop processing at that stage; if so, the header vector of the message is re-entered as input into a designated target stage on the pipeline, and if not, the message is sent normally to the entry of the next stage. This implements a single-stage loop on the pipeline. Through the single-stage loop, services can be deployed flexibly, all processing of a service can be completed at the current processing stage, and bus resources are released, thereby solving the problem of insufficient hardware resources, improving message processing performance, and improving resource utilization.
As shown in fig. 1, the present embodiment provides an electronic device 1, which includes at least one processor 11 and a memory 12; one processor is illustrated in fig. 1. The processor 11 and the memory 12 are connected by a bus 10. The memory 12 stores instructions executable by the processor 11; when the instructions are executed by the processor 11, the electronic device 1 can execute all or part of the flow of the methods in the following embodiments, so as to implement a single-stage loop on the pipeline. Through the single-stage loop, services can be deployed flexibly, all processing of a service is completed at the current stage, bus resources are released, and resource utilization is improved.
In an embodiment, the electronic device 1 may be a network device such as a switch, a router, or a gateway, or may be a mobile phone, a tablet computer, a notebook computer, a desktop computer, or a large computing system composed of a plurality of computers.
The method provided by the embodiments of the present application can be implemented by the electronic device 1 executing corresponding software code.
Fig. 2 is a schematic architecture diagram of a system 200 for message processing according to an embodiment of the present application, which may be applied to the chip architecture of a network device. As shown in fig. 2, the system 200 includes a distribution unit and at least one packet processing pipeline, where:
The distribution unit is used for receiving a message to be processed through the message entry and distributing it to the buffer of the corresponding pipeline.
The pipeline is used for reading the message to be processed distributed by the distribution unit from the corresponding buffer and processing it using the message processing method provided by any of the following embodiments of the present application.
In practical applications, the number of pipelines may be one or more depending on actual requirements; fig. 2 takes four pipelines as an example, namely pipeline 0, pipeline 1, pipeline 2, and pipeline 3. A pipeline may include multiple processing stages. The architecture of the system 200 may be applied to the aforementioned electronic device 1, for example, to a network device such as a switch, a router, or a gateway.
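As a minimal illustration of the distribution unit, the following Python sketch spreads incoming messages across per-pipeline buffers. The round-robin policy and the names used are assumptions for illustration only; as described later, a real distribution unit may adjust its policy based on feedback from the pipelines.

```python
from collections import deque

class DistributionUnit:
    """Sketch of the distribution unit of fig. 2: messages received at the
    message entry are placed into the buffer of a chosen pipeline."""

    def __init__(self, num_pipelines: int = 4):
        self.buffers = [deque() for _ in range(num_pipelines)]
        self.next_pipeline = 0  # simple round-robin placeholder policy

    def dispatch(self, message) -> int:
        """Append the message to one pipeline's buffer; return the pipeline id."""
        pipeline_id = self.next_pipeline
        self.buffers[pipeline_id].append(message)
        self.next_pipeline = (self.next_pipeline + 1) % len(self.buffers)
        return pipeline_id
```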
The message processing mode of the embodiment of the application can be applied to any field needing message processing.
In an embodiment, taking a switch applied in the technical field of SDN programmable network communications as an example, a basic pipelined switch may include the following main processing stages:
A reception stage (Reception Stage), in which the switch receives packets arriving at a port and parses and checks them. This includes verifying the integrity of the data packet, checking the destination address, and performing other related preprocessing operations.
A forwarding stage (Forwarding Stage), in which the switch decides to which output port to send a packet based on the packet's destination address and a switching table (also called a forwarding table). This may involve looking up and matching the destination address and making the corresponding forwarding decision.
A queue management stage (Queue Management Stage), in which the switch manages the queues of the input and output ports to ensure that the order and priority of packets are maintained. This includes queuing, scheduling, and allocating resources so that packets are transmitted according to certain policies.
A switch fabric configuration stage (Switch Fabric Configuration Stage), in which the switch configures the switch fabric or switching matrix to ensure that packets are properly forwarded from input ports to output ports. This may involve setting up appropriate connection paths and switching states, and adjusting the bandwidth and resource allocation of the fabric.
A transmission stage (Transmission Stage), in which the switch sends the packet to the target output port for final delivery to the target device or host. This may involve physical-layer transmission operations such as encoding, modulation, transmission, and reception acknowledgements.
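The stage sequence above can be summarized as an ordered enumeration driven in order. The Python sketch below is purely illustrative and omits the per-stage parallelism that gives a real pipeline its throughput.

```python
from enum import Enum, auto

class Stage(Enum):
    RECEPTION = auto()
    # A tunnel termination stage may sit between reception and forwarding.
    FORWARDING = auto()
    QUEUE_MANAGEMENT = auto()
    FABRIC_CONFIGURATION = auto()
    TRANSMISSION = auto()

def process(packet, handlers):
    """Drive one packet through every stage in order.
    handlers maps each Stage to a callable taking and returning the packet."""
    for stage in Stage:
        packet = handlers[stage](packet)
    return packet
```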
In one embodiment, the pipeline switch may also have a tunnel termination stage (Tunnel Termination Stage) between the reception stage and the forwarding stage.
In a network switch, the tunnel termination phase refers to a phase in the packet processing pipeline for handling the termination and decapsulation operations of the tunnel protocol. Tunneling is a technique for transporting packets of one protocol within packets of another protocol, often used to transport proprietary or encrypted data in a network.
In the tunnel termination phase, the switch will check the destination of the data packet and perform the decapsulation operation according to the specification of the tunnel protocol. This includes parsing header information of the tunneling protocol, extracting internal data packets, recovering original protocol information, etc. The switch then passes the decapsulated packet on to the next stage for further processing and forwarding.
The main tasks of the tunnel termination phase include:
1. Parsing the tunnel protocol header: the switch parses the tunnel protocol header to obtain information such as the tunnel type, source address, and destination address.
2. Decapsulating the inner data packet: the switch extracts the inner data packet carried in the tunnel and restores it to the original protocol data packet.
3. Further processing of the inner data packet: the decapsulated inner data packet is passed to subsequent processing stages, such as routing table lookup, ACL (access control list) matching, and QoS (quality of service) processing.
4. Forwarding the decapsulated data packet: the switch forwards the decapsulated data packet according to its destination address and sends it to the appropriate egress interface.
It should be noted that the specific tunneling protocol and termination operations may vary depending on the type, configuration, and network environment of the switch. Common tunneling protocols include GRE (Generic Routing Encapsulation), VXLAN (Virtual Extensible LAN), the IPsec protocol, and the like. Each tunneling protocol has its own header format and decapsulation rules, and the switch must perform the corresponding processing according to the specific protocol specification.
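As an example of the header-parsing and decapsulation step, the following Python sketch strips a basic GRE header per RFC 2784; it ignores the optional key and sequence-number extensions of RFC 2890, and the other tunnel protocols named above would follow the same pattern with their own header layouts.

```python
import struct

def strip_gre(payload: bytes):
    """Minimal GRE decapsulation (RFC 2784): return the inner protocol
    type and the inner packet bytes."""
    flags_ver, proto = struct.unpack("!HH", payload[:4])
    offset = 4
    if flags_ver & 0x8000:  # C bit set: a checksum and a reserved word follow
        offset += 4
    return proto, payload[offset:]

# Example: a GRE header with no options carrying an IPv4 packet (0x0800).
proto, inner = strip_gre(bytes.fromhex("00000800") + b"...inner IPv4...")
print(hex(proto))  # -> 0x800
```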
By dividing the data exchange process of the switch into a plurality of stages and allowing different data packets to be processed in parallel, the pipeline switch is capable of achieving efficient data forwarding and processing. The parallel processing mode can improve the throughput and response speed of the switch, thereby better adapting to the high-speed network environment and the high-flow data transmission requirement. Meanwhile, the pipeline switch can flexibly adapt to different protocols and business requirements so as to provide more efficient network services.
In an embodiment, the pipeline is further configured to send feedback information to the distribution unit when the message to be processed needs to be loop-processed in the current pipeline, so that the distribution unit adjusts its message distribution policy according to the feedback information.
Fig. 3 shows a schematic diagram of the chip architecture of a single pipeline according to an embodiment of the present application. The chip architecture includes a pipeline scheduler, a message pre-classification module, a message path prediction module, a message insertion module, and a plurality of processing stages (i.e., stage 0 to stage N, where N is a natural number); a corresponding buffer may be placed before each processing stage. The functions of the modules are as follows:
1. The pipeline scheduler schedules messages to the pipelines in the switch chip according to buffer watermarks or additional scheduling information, and changes the rate at which messages enter each pipeline according to the complexity of the service being processed.
2. The message pre-classification module pre-classifies the protocol type of a message according to the compiler's configuration information while the Parser parses the message protocol; the type may be a single protocol or a combination of multiple protocols.
3. The message path prediction module statically or dynamically infers the processing path of the current message from the pre-classification result of the preceding stage or the actual execution path length fed back from the following stages, so as to predict whether the message needs to be loop-executed in one or more stages and how many times. It also notifies the pipeline scheduler of this information to assist in determining the rate at which messages enter the pipeline.
4. The message insertion module inserts a number of internally marked messages according to the result of the preceding message path prediction; in this way, loop-executed message header vectors are guaranteed not to affect the processing of subsequent message header vectors.
5. The buffers, whose capacity can be designed to be very small, temporarily store messages entering the pipeline when the path prediction deviates from the actual execution path.
In one embodiment, the pipeline architecture may implement single-stage loop execution. Each stage determines whether a loop at that stage is required based on the message pre-classification result, a message header vector field, or the execution action of a hit entry. If a loop is required, the looping message header vector re-enters the stage as input; if not, the header vector is sent normally to the entry of the next stage. A judgment is also made at the entry of each stage, and there are three cases. In the first case, the looping message header vector arrives at the same time as the next message header vector; the looping header vector enters the stage, and the next header vector enters the buffer before the stage. In the second case, the looping message header vector arrives at the same time as an internal marker message inserted by the message insertion module; the looping header vector enters the stage, and the internal marker message is discarded. In the third case, only a subsequent message header vector arrives; it is then judged whether the buffer holds any messages. If so, the first message in the buffer enters the stage and the arriving header vector is inserted at the tail of the buffer queue; if the buffer holds no messages, the header vector enters the stage directly.
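The entry arbitration just described can be written as a small decision function. The following Python sketch models the three cases; attribute names such as is_marker are illustrative assumptions, not the hardware signals.

```python
from collections import deque

def stage_entry_arbitrate(loop_hv, next_hv, stage_buffer: deque):
    """Pick the header vector (HV) that enters the stage this cycle.
    loop_hv: HV looping back into this stage, or None.
    next_hv: HV arriving from the previous stage, or None; next_hv.is_marker
             is True for an inserted internal marker message."""
    if loop_hv is not None:
        if next_hv is not None:
            if next_hv.is_marker:
                pass                      # case 2: the marker message is discarded
            else:
                stage_buffer.append(next_hv)  # case 1: normal HV waits in the buffer
        return loop_hv                    # a looping HV always enters its stage
    if next_hv is not None:               # case 3: a normal HV arrives alone
        if stage_buffer:
            stage_buffer.append(next_hv)  # FIFO: older buffered HVs are served first
            return stage_buffer.popleft()
        return next_hv
    return stage_buffer.popleft() if stage_buffer else None
```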
In one embodiment, the pipeline architecture may also implement inter-stage loop execution, such as the loop from stage 1 back to stage 0 illustrated in fig. 3. This is similar to a single-stage loop, but in an inter-stage loop the looping message header vector enters a stage preceding the current processing stage rather than the current processing stage itself. The other processing is consistent with the single-stage loop. How many stages apart a loop can span depends on the specific hardware design and is related to hardware resource consumption.
In an embodiment, the scheme may further include an internal loopback: a common design may be adopted in which the whole pipeline is looped back through an internal logic port. The specific design of the internal loopback is not limited in this embodiment.
After an input message has been processed through the processing stages, the message processing result can be output through a physical egress port.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. In the case where there is no conflict between the embodiments, the following embodiments and features in the embodiments may be combined with each other. In addition, the sequence of steps in the method embodiments described below is only an example and is not strictly limited.
Please refer to fig. 4, which shows a message processing method according to an embodiment of the present application. The method may be executed by the electronic device 1 shown in fig. 1 and may be applied in the application scenarios of the message processing systems shown in figs. 2-3, so as to implement a single-stage loop on a pipeline. Through the single-stage loop, services can be deployed flexibly, all processing of a service is completed at the current stage, and bus resources are released, thereby solving the problem of insufficient hardware resources, improving message processing performance, and improving resource utilization. In this embodiment, a network device is taken as the execution end as an example, and the method includes the following steps:
Step 401, obtaining a message to be processed that reaches the current processing stage in a pipeline.
In this step, the current processing stage may be any stage in the pipeline; for the stages of the pipeline of, for example, a switch, reference may be made to the description of the foregoing embodiments, which is not repeated here.
Step 402, judging whether the message to be processed needs to enter loop processing at the current processing stage. If yes, go to step 403; otherwise, go to step 404.
In this step, by judging whether the message to be processed should enter loop processing at the current processing stage, the service to be handled at this stage can be determined and then processed to completion at this stage, releasing bus resources and improving resource utilization.
In one embodiment, step 402 may specifically include: judging whether the message to be processed needs to enter loop processing at the current processing stage according to one or more of the pre-classification result of the protocol to which the message belongs, the header vector, and the entry hit by the message in the reconfigurable match table.
In this embodiment, the message to be processed may be pre-classified by the message pre-classification module shown in fig. 3. Specifically, when the Parser parses the message protocol, it pre-classifies the protocol type of the message according to the compiler's configuration information; the protocol type may be a single protocol or a combination of multiple protocols. In an actual scenario, each protocol combination is configured with a corresponding loop instruction, such as information indicating the loop stage and the number of cycles, so that, based on the pre-classification result of the protocol to which the message belongs, it can be determined whether the message needs to enter loop processing at the current processing stage.
A field in the header vector of the message to be processed may be used to pre-mark the loop indication of the message, such as the loop stage and the number of cycles, so that, based on the mark field in the header vector, it can be determined whether the message needs to enter loop processing at the current processing stage.
In the RMT architecture, the match table consists of a plurality of entries, each containing a match field and a set of associated actions. When a packet of a pending message enters the network device, its key fields (e.g., source IP address, destination IP address, port number) are matched against the entries of the match table. Once a match succeeds, the associated actions are triggered, such as modifying the packet header, redirecting traffic, or updating counters. The execution actions of the message thus correspond to the entries it hits in the reconfigurable match table. The action of each entry can be pre-configured with a loop instruction, such as the loop stage and number of cycles; when the action of the hit entry indicates that the message needs to enter loop processing at the current processing stage, it is determined that the message enters loop processing at that stage.
Thus, whether a stage loop is needed can be judged at each stage from the pre-classification result of the message, from a header vector field of the message, or from the execution action of the hit entry, and these judging conditions may also be combined. The judging mode can be selected flexibly based on actual requirements.
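A possible combination of the three judging conditions is sketched below in Python; the attribute and field names (preclass_loop_stages, loop_stage, loop_at) are illustrative assumptions rather than fields defined by the scheme.

```python
def needs_loop(stage_id: int, msg) -> bool:
    """Return True if any of the three sources indicates a loop at this stage."""
    # 1. Pre-classification: the protocol combination maps to loop instructions.
    if stage_id in msg.preclass_loop_stages:
        return True
    # 2. A pre-marked field in the message header vector.
    if msg.header_vector.get("loop_stage") == stage_id:
        return True
    # 3. The execution action of the entry hit in the reconfigurable match table.
    entry = msg.hit_entry
    if entry is not None and entry.action.get("loop_at") == stage_id:
        return True
    return False
```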
Step 403, looping the header vector of the message to be processed back into the designated target stage on the pipeline.
In this step, if the message to be processed needs to enter loop processing at the current processing stage, the message must be loop-processed starting from the current processing stage. The designated target stage may be the current processing stage or a stage before it, and the header vector of the message is input into the designated target stage for loop processing. This implements a single-stage loop on the pipeline; through the single-stage loop, services can be deployed flexibly, all processing of a service is completed at this stage, and bus resources are released, thereby solving the problem of insufficient hardware resources.
In an embodiment, step 403 may specifically include: if the message to be processed needs to enter loop processing at the current processing stage, determining the target stage for loop-processing the message according to one or more of the pre-classification result of the protocol to which the message belongs, the header vector, and the entry hit by the message in the reconfigurable match table; and inputting the header vector of the message into the target stage, where the target stage includes the current processing stage and/or any stage preceding the current processing stage.
In this embodiment, the target stage of the loop need not be the current processing stage; the loop may return to an earlier stage, and the header vector of the message is sent to the entry of the corresponding target stage according to the target stage position determined by the pre-classification result, the header vector field, or the execution action of the hit entry. In an actual scenario, some services must be deployed across multiple stages and cannot be satisfied by a single-stage loop alone, so this multi-stage backtracking mechanism can meet such service deployment requirements and solve the problem of insufficient hardware resources.
In one embodiment, the pre-classification result includes the set of protocol types to which the message to be processed belongs. Determining the target stage of the message according to the pre-classification result includes: obtaining the actual execution path length corresponding to a reference message, the reference message being of the same type as the message to be processed; and determining the target stage at which the message needs to be loop-processed according to the protocol type set of the message and the actual execution path length of the reference message.
In this embodiment, the reference message is of the same type as the message to be processed and may be a message that has already been processed, so its actual execution path can be recorded. The message path prediction module shown in fig. 3 may pre-classify the message to be processed, determine the set of protocol types to which it belongs, and obtain the actual execution path length of the reference message as feedback. It then statically or dynamically infers the processing path of the current message from the pre-classification result of the preceding stage and the actual execution path length fed back from the following stages, so as to predict whether the message needs to be loop-executed at one or more target stages. For some protocol types, the message path cannot be predicted accurately from the pre-classification result alone; the actual path information of processed reference messages of the same type is needed as feedback to predict the processing path of the current message accurately, and thus to determine accurately at which stages the message needs to be loop-executed and how many times.
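The following Python sketch shows one way this static/dynamic inference could combine a configured plan with feedback from processed reference messages; the data shapes (a plan mapping stage id to loop count, a feedback table keyed by protocol set) are assumptions for illustration.

```python
def predict_loop_plan(protocol_set: frozenset, static_plans: dict, feedback: dict):
    """Return a plan mapping stage id -> number of loop iterations."""
    plan = static_plans.get(protocol_set)
    if plan is not None:
        return plan                        # static inference suffices
    ref_path = feedback.get(protocol_set)  # actual path of a same-type reference message
    if ref_path is None:
        return {}                          # no information: assume no loops
    # Dynamic inference: a stage visited n times implies n - 1 loop iterations there.
    visits = {}
    for stage_id in ref_path:
        visits[stage_id] = visits.get(stage_id, 0) + 1
    return {s: n - 1 for s, n in visits.items() if n > 1}

# Example: a reference message visited stage 1 three times -> loop twice in stage 1.
print(predict_loop_plan(frozenset({"mpls"}), {}, {frozenset({"mpls"}): [0, 1, 1, 1, 2]}))
# -> {1: 2}
```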
In one embodiment, each of the processing stages of the pipeline is configured with a corresponding preset buffer, which is used to buffer the queue of messages waiting to be processed at that stage.
In this embodiment, as shown in fig. 3, a preset buffer may be placed before each stage to temporarily store normal or looping messages, so that the loop at a stage does not disturb the normal pipeline. Since normal pipeline processing continues as described in the foregoing embodiments, a header vector that needs to loop and a normal header vector may arrive at a stage at the same time, which would cause a conflict the system cannot handle; a buffer is therefore added before each stage to hold either the looping header vector or the normal header vector, avoiding the conflict so that the pipeline can work normally.
In an embodiment, if only the header vector of a normal message A that does not need loop processing reaches the current processing stage, it is first judged whether the preset buffer queue of the current processing stage holds unprocessed messages. If it does, the first message at the head of the preset buffer queue enters the current processing stage for processing, and the header vector of normal message A is inserted at the tail of the queue. If the preset buffer holds no unprocessed messages, the header vector of normal message A enters the current processing stage directly.
In an embodiment, step 403 may further include: if the message to be processed needs to enter loop processing at the current processing stage, detecting whether another message arrives at the target stage simultaneously with the message to be processed. If there is no other message, the header vector of the message is input into the target stage. If there is another message, the header vector of the message is input into the target stage, and the other message is added to the preset buffer queue corresponding to the target stage.
In an embodiment, if the message to be processed needs to enter loop processing at the current processing stage, then before the loop processing, in order to avoid a conflict between the message and other messages, it is necessary to detect whether another message arrives at the target stage at the same time. If so, the message to be processed is handled preferentially: its header vector is input into the entry of the target stage, and the other message is added to the preset buffer queue before the target stage to wait for processing. If not, the header vector of the message can be input directly into the entry of the target stage and the loop processing executed.
The step of detecting whether another message arrives at the target stage simultaneously with the message to be processed may occur before single-stage loop execution or before cross-stage inter-stage loop execution.
In an embodiment, step 403 may further include: if there is another message and the other message is the next message after the message to be processed, inputting the header vector of the message to be processed into the target stage and adding the next message to the preset buffer queue corresponding to the target stage.
In this embodiment, taking a single-stage loop in which the target stage is the current processing stage as an example: if the header vector of the message to be processed and the header vector of the next message B arrive at the current processing stage at the same time, the header vector of the message to be processed enters the current processing stage preferentially, and the header vector of message B enters the preset buffer queue before the current processing stage to wait for processing, thereby avoiding a conflict in message processing on the pipeline.
In one embodiment, before the message to be processed reaches the current processing stage, the method may further include: determining the target stage and the number of cycles for which the message needs loop processing in the pipeline according to one or more of the pre-classification result of the protocol to which the message belongs, the header vector, and the entry hit by the message in the reconfigurable match table; and inserting, according to the number of cycles, preset messages carrying a discard mark after the message to be processed, where the discard mark indicates that the preset message is to be discarded when it reaches the designated target stage.
In this embodiment, the per-stage buffers introduced in the foregoing embodiments solve the pipeline conflict problem; however, if the number of messages looping in the system increases, the preceding stages cannot sense that a following stage needs to loop and keep sending messages, so a very large buffer would be required, and such a design incurs a large additional hardware cost. The preset message may be a protocol-null message that carries information indicating the stage at which it should be discarded; when the protocol-null message reaches the stage designated by the discard mark, it does not enter the preset buffer of that stage but is discarded directly. In this way, the buffer size can be reduced, saving hardware resources.
The determination of the target stage and the number of cycles can be referred to the description of the related embodiments, and will not be repeated here.
In an embodiment, according to the result of the path prediction for the preceding message, the message insertion module shown in fig. 3 may insert several protocol-null messages carrying internal discard marks after the message to be processed, thereby ensuring that loop-executed message header vectors do not affect the processing of subsequent message header vectors.
In one embodiment, the number of preset messages is equal to the number of cycles of the message to be processed.
In this embodiment, in practical applications, messages of different protocol types generally require different numbers of cycles. Suppose a fixed number of preset messages were inserted at a preceding stage whenever a message requiring loops is encountered, taking inserted protocol-null messages as an example: if the number of inserted protocol-null messages is smaller than the number of cycles of the message, a normal message header will have to enter the preset buffer queue; if it is larger, bubbles appear on the pipeline, computing resources idle, pipeline efficiency suffers, and line-rate processing of messages is affected. Only when the number of inserted protocol-null messages exactly equals the number of cycles does no normal message need to enter the preset buffer and no computing resource go to waste. This embodiment therefore performs message pre-classification and path prediction on the message to be processed, and identifies the number of cycles the message requires while the programmable header parser parses the message header. The number of cycles may be the sum of the cycles required by multiple protocols; for example, if protocol 1 requires N1 cycles and protocol 2 requires N2 cycles, the message requires N1+N2 cycles. Through message pre-classification and path prediction, the number of protocol-null messages to insert after the message can be identified accurately, so normal messages need not enter the buffer and computing resources are not wasted. For the programmable header parser, only a marking function is added to its original function, and the hardware overhead is negligible.
In an embodiment, assume the protocol types of the message to be processed are protocol 1 and protocol 2, requiring 5 cycles in total, where protocol 1 indicates 2 cycles in stage 2 and protocol 2 indicates 3 cycles in stage 3. Then 5 protocol-null messages are inserted in advance after the message. Suppose null message 1 is closest to the message to be processed, followed in order by null messages 2, 3, 4, and 5. The discard marks carried by null messages 1 and 2 indicate discarding at stage 2, and those carried by null messages 3, 4, and 5 indicate discarding at stage 3. During loop execution, null messages 1 and 2 are discarded at stage 2, and null messages 3, 4, and 5 are discarded at stage 3, saving buffer resources.
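The construction of these null messages can be sketched in a few lines of Python; the dict-based message representation is an illustrative assumption, and the printed output reproduces the worked example above.

```python
def make_null_messages(loop_plan: dict) -> list:
    """loop_plan maps stage id -> loop count; the number of protocol-null
    messages produced equals the total loop count (N1 + N2 + ...)."""
    nulls = []
    for stage_id, count in sorted(loop_plan.items()):
        nulls.extend({"discard_at_stage": stage_id} for _ in range(count))
    return nulls

# Protocol 1: 2 cycles in stage 2; protocol 2: 3 cycles in stage 3.
print(make_null_messages({2: 2, 3: 3}))
# -> [{'discard_at_stage': 2}, {'discard_at_stage': 2},
#     {'discard_at_stage': 3}, {'discard_at_stage': 3}, {'discard_at_stage': 3}]
```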
For example, in the pipeline shown in fig. 3, the number of protocol-null messages inserted in advance after a message to be processed is configured to equal the message's number of cycles, so normal messages need not enter the buffers and no computing resources are wasted. The buffer before stage 0 and/or the inter-stage buffers can therefore be designed with very small capacity, used only to temporarily store messages that enter the pipeline when the path prediction deviates from the actual execution path, thereby saving buffer resources.
In one embodiment, the method further includes: when the other message is a preset message carrying a discard mark, inputting the header vector of the message to be processed into the target stage, and discarding the preset message at the designated stage according to the discard mark.
In this embodiment, suppose preset protocol-null messages have been inserted in advance after the message to be processed, and the message needs to enter loop processing at the current processing stage; take a single-stage loop in which the target stage is the current processing stage as an example. When it is detected that another message arrives at the current processing stage simultaneously with the message to be processed and that it is a protocol-null message carrying a discard mark, the header vector of the message to be processed is re-input into the entry of the current processing stage for loop execution, and the protocol-null message is discarded at the designated current processing stage according to its discard mark.
In one embodiment, taking single-stage loop processing as an example, a judgment is made at the entry of each stage, and there are three cases. In the first case, the looping message header vector arrives simultaneously with a subsequent message header vector; the looping header vector enters the current processing stage, and the subsequent header vector enters the preset buffer before the current processing stage. In the second case, the looping header vector arrives simultaneously with an internally marked preset message inserted by the message insertion module; the looping header vector enters the stage, and the internally marked preset message is discarded. In the third case, only the header vector of a normal message A arrives (a normal message here is one that does not need loop processing at the current processing stage and carries no discard mark); it is then judged whether the preset buffer of the current processing stage holds unprocessed messages. If it does, the first message at the head of the queue enters the current processing stage, and the header vector of normal message A is inserted at the tail of the preset buffer queue. If the preset buffer holds no messages, the header vector of normal message A enters the current processing stage directly.
In one embodiment, for an inter-stage loop such as the loop from stage 1 back to stage 0 shown in fig. 3, the loop processing is performed at stage 0. The process is similar to the single-stage loop, and a judgment is made at the entry of each stage. In the first case, the looping header vector and the next header vector arrive at stage 0 at the same time; the looping header vector enters stage 0, and the next header vector enters the preset buffer before stage 0. In the second case, the looping header vector and an internally marked preset message inserted by the message insertion module arrive at stage 0 at the same time; the looping header vector enters stage 0, and the internally marked preset message is discarded. In the third case, only the header vector of a normal message A reaches stage 0; it is then judged whether the preset buffer of stage 0 holds unprocessed messages. If it does, the first message at the head of the queue enters stage 0, and the header vector of normal message A is inserted at the tail of stage 0's preset buffer queue. If the preset buffer of stage 0 holds no messages, the header vector of normal message A enters stage 0 directly.
That is, during an inter-stage loop, the looping message header vector enters a stage preceding the current processing stage rather than the current processing stage itself. How many stages apart the loop can span depends on the specific hardware design and its hardware resource consumption.
In one embodiment, the method further includes: when the message to be processed enters the target stage for loop processing, transferring messages that do not need to be processed at the target stage to their corresponding processing stages through a bypass.
In this embodiment, rather than driving protocol-null messages through the pipeline, the pipeline scheduler can fill those slots with real messages scheduled in advance, or let messages pass through a processing stage before being discarded, thereby saving computing resources. Taking the pipeline structure in fig. 3 as an example, assume stage 0 contains the logic for processing the Ethernet protocol, so an Ethernet packet is processed once in stage 0 and needs no loop, while stage 1 contains the logic for processing the MPLS protocol, so an MPLS packet loops several times in stage 1. In this case, after an MPLS message to be processed has been sent, the pipeline scheduler shown in fig. 3 preferentially schedules Ethernet packets that do not need processing in stage 1. While the message loops in stage 1, which is the target stage of the MPLS packet, other messages that do not need stage 1 can be transferred through a bypass to their corresponding processing stages, for example bypassing stage 1 and proceeding to stage 2 or a later processing stage. In this way, pipeline resources are used more fully and resource utilization is improved.
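The scheduler's bypass preference in this example can be sketched as follows in Python; required_stages is an assumed per-message attribute, not a field defined by the scheme.

```python
def schedule_next(ready_queue: list, looping_stage: int):
    """While one message loops in `looping_stage`, prefer a pending message
    that does not need that stage, so it can bypass it (e.g., Ethernet
    traffic scheduled while an MPLS message loops in stage 1)."""
    for i, msg in enumerate(ready_queue):
        if looping_stage not in msg.required_stages:
            return ready_queue.pop(i)  # this message will bypass the busy stage
    return None  # every pending message needs the busy stage; wait
```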
Step 404, send the message to be processed to the next stage of the pipeline.
In this step, if the message to be processed does not need to enter loop processing at the current processing stage, it is sent normally to the entry of the next stage after the current processing stage, completing the normal pipeline processing of the message.
It should be noted that the sub-modules involved in the above message processing method, such as message pre-classification, message path prediction, message insertion, the single-stage loop, and the inter-stage loop, may each be given a dedicated software or hardware implementation or be deployed together; the choice can be made based on actual requirements and is not limited by the embodiments of the present application.
The above message processing method supports a message header vector looping back into the current stage or into a preceding stage, thereby multiplexing the processing units of multiple stages. Specifically, through the message pre-classification, message path prediction, and message insertion modules, information is fed back to the upstream scheduler to assist message scheduling. At each stage, whether a loop should start from that stage is judged from the message pre-classification result, a message header vector field, or the execution action of the hit entry. If a loop is required, the looping header vector re-enters that stage or a preceding stage as input; if not, it is sent normally to the entry of the next stage. This implements a single-stage loop on the pipeline, or backtracking to a preceding stage, enabling flexible service deployment and solving the problem of insufficient hardware resources. In addition, performance degrades only linearly when messages requiring complex processing are encountered.
The message processing method at least has the following advantages:
1. Single-stage loop processing and multi-stage rebound processing are supported, which means more flexible service deployment.
2. A given service can be processed to completion directly, releasing bus resources and alleviating bus resource shortages.
3. Message conflicts are resolved in hardware, reducing the development difficulty of the compiler.
4. The pipeline architecture can be extended to support multiple table lookups, avoiding repeated deployment of resources.
5. Pipelined operation guarantees line-rate processing of ordinary service messages, only a linear decline in pipeline performance for single flows of complex services, and line-rate processing of messages across multiple pipelines.
Referring to fig. 5, a message processing apparatus according to an embodiment of the present application is shown. The apparatus can be applied to the electronic device 1 shown in fig. 1 and to the application scenarios of the message processing systems shown in figs. 2-3, so as to implement a single-stage loop on a pipeline; through the single-stage loop, services can be deployed flexibly, all processing of a service is completed at the current stage, bus resources are released, the problem of insufficient hardware resources is solved, message processing performance is improved, and resource utilization is improved. This embodiment takes application to a network device as an example. The apparatus comprises an acquisition module, a judging module, a loop module, and a processing module, whose functions are as follows:
The acquisition module is used for acquiring a message to be processed that reaches the current processing stage in the pipeline.
The judging module is used for judging whether the message to be processed needs to enter loop processing at the current processing stage.
The loop module is used for looping the header vector of the message to be processed back into the designated target stage on the pipeline if the message needs to enter loop processing at the current processing stage.
The processing module is used for sending the message to be processed to the next stage of the pipeline if the message does not need to enter loop processing at the current processing stage.
In an embodiment, the judging module is configured to judge whether the message to be processed needs to enter loop processing at the current processing stage according to one or more of the pre-classification result of the protocol to which the message belongs, the header vector, and the entry hit by the message in the reconfigurable match table.
In an embodiment, the loop module is configured to: if the message to be processed needs to enter loop processing at the current processing stage, determine the target stage for loop-processing the message according to one or more of the pre-classification result of the protocol to which the message belongs, the header vector, and the entry hit by the message in the reconfigurable match table; and input the header vector of the message into the target stage, where the target stage includes the current processing stage and/or any stage preceding the current processing stage.
In one embodiment, the pre-classification result includes the set of protocol types to which the message to be processed belongs. The loop module is specifically configured to obtain the actual execution path length corresponding to a reference message, the reference message being of the same type as the message to be processed, and to determine the target stage at which the message needs to be loop-processed according to the protocol type set of the message and the actual execution path length of the reference message.
In one embodiment, each of the processing stages of the pipeline is configured with a corresponding preset buffer, which is used to buffer the queue of messages waiting to be processed at that stage. The loop module is specifically configured to detect whether another message arrives at the target stage simultaneously with the message to be processed; if there is no other message, to input the header vector of the message into the target stage; and if there is another message, to input the header vector of the message into the target stage and add the other message to the preset buffer queue corresponding to the target stage.
In an embodiment, the loop module is further specifically configured to: if there is another message and it is the next message after the message to be processed, input the header vector of the message to be processed into the target stage and add the next message to the preset buffer queue corresponding to the target stage. And/or, the loop module is further specifically configured to: when the other message is a preset message carrying a discard mark, input the header vector of the message to be processed into the target stage, and discard the preset message at the designated stage according to the discard mark.
In one embodiment, the apparatus further comprises a determining module, configured to determine, before the message to be processed reaches the current processing stage, the target stage and the number of cycles for which the message needs loop processing in the pipeline according to one or more of the pre-classification result of the protocol to which the message belongs, the header vector, and the entry hit by the message in the reconfigurable match table; and an insertion module, configured to insert, according to the number of cycles, preset messages carrying a discard mark after the message to be processed, where the discard mark indicates that the preset message is to be discarded when it reaches the designated target stage.
In one embodiment, the number of preset messages is equal to the number of cycles of the message to be processed.
In one embodiment, the apparatus further comprises a bypass module, configured to transfer messages that do not need to be processed at the target stage to their corresponding processing stages through a bypass when the message to be processed enters the target stage for loop processing.
For a detailed description of the above message processing apparatus, please refer to the description of the related method steps in the foregoing embodiments; the implementation principles and technical effects are similar and are not repeated here.
Fig. 6 is a schematic structural diagram of a cloud device 60 according to an exemplary embodiment of the present application. The cloud device 60 may be used to run the methods provided in any of the embodiments described above. As shown in fig. 6, the cloud device 60 may include a memory 604 and at least one processor 605, one processor being illustrated in fig. 6.
The memory 604 is used to store a computer program and may be configured to store various other data to support operations on the cloud device 60. The memory 604 may be an object store (Object Storage Service, OSS).
The memory 604 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The processor 605 is coupled to the memory 604, and is configured to execute the computer program in the memory 604, so as to implement the solutions provided by any of the method embodiments described above, and specific functions and technical effects that can be implemented are not described herein.
Further, as shown in FIG. 6, the cloud device further comprises a firewall 601, a load balancer 602, a communication component 606, a power component 603, and other components. Only some components are schematically shown in fig. 6, which does not mean that the cloud device only comprises the components shown in fig. 6.
In one embodiment, the communication component 606 of Fig. 6 is configured to facilitate wired or wireless communication between the device in which the communication component 606 is located and other devices. The device in which the communication component 606 is located may access a wireless network based on a communication standard, such as WiFi, 2G, 3G, 4G, LTE (Long Term Evolution), 5G, or a combination thereof. In one exemplary embodiment, the communication component 606 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 606 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wide Band (UWB) technology, Bluetooth (BT) technology, and other technologies.
In one embodiment, the power component 603 of Fig. 6 provides power to the various components of the device in which the power component 603 is located. The power component 603 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device in which the power component 603 resides.
The embodiments of the present application also provide a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method of any of the foregoing embodiments.
Embodiments of the present application also provide a computer program product comprising a computer program which, when executed by a processor, implements the method of any of the preceding embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division of modules is merely a logical function division, and there may be other divisions in actual implementation, e.g., multiple modules may be combined or integrated into another system, or some features may be omitted or not performed.
The integrated modules, which are implemented in the form of software functional modules, may be stored in a computer readable storage medium. The software functional modules described above are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or processor to perform some of the steps of the methods of the various embodiments of the application.
It should be appreciated that the processor may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of a method disclosed in connection with the present application may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in a processor for execution. The memory may include a high-speed random access memory (RAM) and may further include a nonvolatile memory (NVM), such as at least one magnetic disk memory, and may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, or an optical disk.
The storage medium may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). Alternatively, the processor and the storage medium may reside as discrete components in an electronic device or a master device.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general hardware platform, or by means of hardware alone, although in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present application, or the part thereof contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods of the embodiments of the present application.
In the technical solution of the present application, the collection, storage, use, processing, transmission, provision, disclosure, and other handling of user data and related information comply with the provisions of relevant laws and regulations and do not violate public order and good morals.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the scope of the application; any equivalent structures or equivalent process transformations made using the contents disclosed herein, whether applied directly or indirectly in other related technical fields, are likewise included within the scope of the present application.