Disclosure of Invention
The invention aims to solve the problem that the existing network flow analysis system cannot realize the processing of mass data and the dynamic switching of data storage at the same time, and provides a data storage dynamic switching method and device for the network flow analysis system, so that the network flow analysis system can realize the dynamic switching of different data storage units under the condition of mass data processing.
In order to achieve the above object, the present invention provides the following technical solutions:
The invention provides a data storage dynamic switching device for a network traffic analysis system. The device is deployed on the database unit of the network traffic analysis system and comprises at least a data interaction unit, a data operation agent unit and a plurality of data storage units. The data storage units provide the actual data storage services. The data interaction unit receives the instructions sent by the network traffic acquisition and processing unit and by the network traffic data analysis and display unit of the network traffic analysis system, and calls a data operation layer configured in the data interaction unit according to the received instruction, wherein the instruction includes at least the data structure type of the network traffic data. The data operation layer declares the data structure type and the operation content to the data operation agent unit according to the instruction. On receiving the declaration information, the data operation agent unit matches a suitable data storage unit according to the data structure type and the operation content, and then performs unified processing and conversion on the driver (Driver) and the operation SDK (Template) of the matched data storage unit to complete the actual operation on the data.
According to a preferred embodiment, the data structure types include at least structured data, semi-structured data, unstructured data and metadata.
According to a preferred embodiment, the data manipulation layer comprises at least a number of data storage DAO interfaces. Specifically, the data storage DAO interface can be a structured data storage DAO interface, a semi-structured data storage DAO interface, an unstructured storage DAO interface and a metadata storage DAO interface.
According to a preferred embodiment, the data interaction unit determines the data structure type of the relevant network traffic data according to the flag in the instruction, and invokes the data storage DAO interface conforming to the type according to the data structure type.
According to a preferred embodiment, the data manipulation agent unit is provided with a common data interface for receiving declaration information of the respective data storage DAO interfaces.
According to a preferred embodiment, the data operation agent unit at least reads the configuration information of the data storage units of the network traffic analysis system, obtains the type and key parameters of the data storage units configured for the network traffic analysis system, and matches a data storage unit according to the connection URI (Uniform Resource Identifier) in the declaration information.
According to a preferred embodiment, after the matching of the data storage units is completed, the data operation agent unit loads the driver of the matched data storage unit to determine whether the matched data storage unit is available. If the matched data storage unit is available, the data operation agent unit loads the matched data storage unit with relevant configuration to generate a database capable of completing the data storage requirement of the corresponding structure type.
According to a preferred embodiment, the data operation agent unit performs unified processing and conversion on the Driver and the operation SDK (Template) of the matched data storage unit, and then completes actual operation on the data.
The invention also provides a data storage dynamic switching method for the network flow analysis system, which at least comprises the following steps:
dividing data in a network traffic analysis system into at least two data structure types;
Configuring a plurality of data storage units for the network flow analysis system, wherein the data storage units meet the requirement of storing data of different data structures;
Establishing association between data structure types and data operation layers, and distributing unique data structure types for different data operation layers;
And constructing a data operation proxy unit, and performing unified processing conversion on a driving program and an operation SDK of the data storage unit by using the data operation proxy unit to finish actual operation on data.
According to a preferred embodiment, the working steps of the data storage dynamic switching method at least include:
Step 1, a data interaction unit calls the data operation layer conforming to the type of the data according to the structure type of the data in the network flow analysis system;
Step 2, the data operation layer declares the data structure type and the operation content to the data operation proxy unit;
And step 3, the data operation agent unit is matched with a proper data storage unit according to the declaration information to generate a database capable of completing the data storage requirement of the corresponding structure type.
Compared with the prior art, the invention has the beneficial effects that:
In the invention, the data interaction unit determines the data structure type of the network traffic data and calls the corresponding data operation layer, which declares the data structure type and the operation content to the data operation agent unit. The data operation agent unit matches a suitable data storage unit according to the data structure type and the operation content, and then performs unified processing and conversion on the driver and the operation SDK of the matched data storage unit to complete the actual operation on the data. As a result, the network traffic analysis system can switch dynamically among multiple databases while processing mass data, can process different types of data, and achieves improved data processing efficiency. Common storage units such as relational databases (e.g., MySQL, Oracle), non-relational databases (e.g., MongoDB, Redis) and columnar databases (e.g., Apache Doris, ClickHouse) can be switched dynamically without affecting the service operation of the network traffic analysis system.
Detailed Description
The present invention will be described in further detail with reference to test examples and specific embodiments. It should not be construed that the scope of the above subject matter of the present invention is limited to the following embodiments, and all techniques realized based on the present invention are within the scope of the present invention.
Example 1
This embodiment provides a data storage dynamic switching device for a network traffic analysis system. The device is deployed on the database unit of the network traffic analysis system and comprises at least a data interaction unit, a data operation agent unit and a plurality of data storage units. The data storage units provide the actual data storage services. The data interaction unit receives the instructions sent by the network traffic acquisition and processing unit and by the network traffic data analysis and display unit of the network traffic analysis system, and calls a data operation layer configured in the data interaction unit according to the received instruction, wherein the instruction includes at least the data structure type of the network traffic data. The data operation layer declares the data structure type and the operation content to the data operation agent unit according to the instruction. On receiving the declaration information, the data operation agent unit matches a suitable data storage unit according to the data structure type and the operation content, and then performs unified processing and conversion on the driver (Driver) and the operation SDK (Template) of the matched data storage unit to complete the actual operation on the data.
Common storage units include relational databases (such as MySQL and Oracle), non-relational databases (such as MongoDB and Redis) and columnar databases (such as Apache Doris and ClickHouse), all of which can be switched dynamically by this embodiment.
In this embodiment, the data interaction unit determines the data structure type of the network traffic data and calls the corresponding data operation layer, which declares the data structure type and the operation content to the data operation agent unit. The data operation agent unit matches a suitable data storage unit according to the data structure type and the operation content, and then performs unified processing and conversion on the driver and the operation SDK of the matched data storage unit to complete the actual operation on the data. As a result, the network traffic analysis system can switch dynamically among multiple databases while processing mass data, can process different types of data, and achieves improved data processing efficiency. The common storage units listed above can be switched dynamically without affecting the service operation of the network traffic analysis system.
Example 2
This embodiment is a further improvement of embodiment 1, and content already described there is not repeated. The data structure types in this embodiment include at least structured data, semi-structured data, unstructured data, and metadata.
Preferably, the data manipulation layer comprises at least a structured data store DAO interface, a semi-structured data store DAO interface, an unstructured store DAO interface, and a metadata store DAO interface.
Preferably, the data interaction unit determines the data structure type of the related network traffic data according to the flag in the instruction, and calls the data storage DAO interface conforming to the type according to the data structure type.
Preferably, the data manipulation agent unit is provided with a common data interface for receiving declaration information of the respective data storage DAO interfaces.
Preferably, the data operation agent unit at least reads the configuration information of the data storage units of the network traffic analysis system, obtains the type and key parameters of the data storage units configured for the network traffic analysis system, and matches a data storage unit according to the connection URI (Uniform Resource Identifier) in the declaration information.
Preferably, after the matching of the data storage units is completed, the data operation agent unit loads the driver of the matched data storage unit to determine whether the matched data storage unit is available. If the matched data storage unit is available, the data operation agent unit loads the matched data storage unit with relevant configuration to generate a database capable of completing the data storage requirement of the corresponding structure type.
Preferably, the data operation agent unit performs unified processing conversion on the Driver (Driver) and the operation SDK (Template) of the matched data storage unit, and then completes the actual operation on the data.
Example 3
This embodiment is a further improvement of embodiment 1 and embodiment 2, and content already described there is not repeated. The embodiment provides a data storage dynamic switching device for a network traffic analysis system. Referring to fig. 1, the network traffic analysis system may be composed of a network traffic acquisition and processing unit, a database unit, and a network traffic data analysis and display unit. The network traffic acquisition and processing unit acquires network traffic data and stores it in the database unit. The network traffic data analysis and display unit queries and retrieves relevant traffic data from the database unit.
The data storage dynamic switching device provided by the embodiment is erected on a database unit of a network flow analysis system.
Referring to fig. 2, the data storage dynamic switching device comprises a data interaction unit, a data operation agent unit and a plurality of data storage units.
The data interaction unit is respectively in communication connection with the network flow acquisition and processing unit and the network flow data analysis and display unit.
When the network flow collection and processing unit stores the collected network flow data into the database unit, a storage instruction is sent to the data interaction unit. The storage instructions may include a flag for indicating data structure type information of network traffic data to be stored.
When the network traffic data analysis and display unit queries or retrieves network traffic data from the database unit, a query instruction is sent to the data interaction unit. The query instruction may include a flag for indicating data structure type information of the network traffic data to be queried or retrieved.
The data structure types may include structured data, semi-structured data, unstructured data, and metadata.
Structured Data (Structured Data) is Data organized according to a predefined schema, typically stored in a relational database or spreadsheet. It has well-defined fields and data types and is easy to query and analyze. For example, the rows and columns in a table are typical representations of structured data.
Semi-Structured Data (Semi-Structured Data) is Data that has partially Structured features, but has no strictly predefined schema. It contains some marks or tags that make it easier to handle and organize. The semi-structured data may be represented using different formats (e.g., XML, JSON) and fields may be flexibly added, deleted or modified. For example, element tags in an HTML document are examples of semi-structured data.
Unstructured data (Unstructured Data) is data that has no explicit structure or pattern. It is usually in free form and cannot be directly incorporated into a conventional relational database for processing. Unstructured data may be in the form of text files, images, audio, video, and the like. For example, social media posts, email content, and image files all belong to unstructured data.
Metadata (Metadata) is data describing data. It provides information about the attributes, structure, format and meaning of the data. Metadata may help understand and manage data, including data source, creation date, owner, data type, etc. For example, metadata in a photograph may include a camera model, a photographing time, an aperture value, and the like.
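The four data structure types, together with the flag carried by a storage or query instruction, can be sketched as a small Java enum. This is only an illustration: the source does not state how the flag is encoded, so the flag strings ("S", "SS", "U", "M") and the names DataStructureType and fromFlag are assumptions.

```java
import java.util.Locale;

/** The four data structure types named in this embodiment. */
enum DataStructureType {
    STRUCTURED("S"),
    SEMI_STRUCTURED("SS"),
    UNSTRUCTURED("U"),
    METADATA("M");

    // Hypothetical flag encoding; the source does not define one.
    private final String flag;

    DataStructureType(String flag) {
        this.flag = flag;
    }

    String flag() {
        return flag;
    }

    /** Resolves the flag carried in a storage or query instruction to a type. */
    static DataStructureType fromFlag(String flag) {
        if (flag == null) {
            throw new IllegalArgumentException("flag must not be null");
        }
        String normalized = flag.trim().toUpperCase(Locale.ROOT);
        for (DataStructureType type : values()) {
            if (type.flag.equals(normalized)) {
                return type;
            }
        }
        throw new IllegalArgumentException("Unknown data structure flag: " + flag);
    }
}
```

Rejecting an unknown flag with an exception, rather than defaulting to one type, is a design choice of this sketch and not something the source prescribes.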
The data storage units provide the actual data storage services, and different data storage units are used for storing data of different data structures. Preferably, the number of data storage units may be set to 8 (DB1 to DB8) in this embodiment.
Preferably, each data storage unit (DB1-DB8) can support storage processing of multiple types of data, and different data storage units can respectively meet the requirement of storing data with different data structures. Preferably, each data storage unit (DB1-DB8) can be a conventional database such as a relational database (e.g., MySQL, Oracle), a non-relational database (e.g., MongoDB, Redis) or a columnar database (e.g., Apache Doris, ClickHouse).
The data interaction unit is configured with a number of data manipulation layers. Preferably, the data manipulation layer is a key component of the Model part in the MVC (Model-View-Controller) design pattern.
Preferably, the data manipulation layer in this embodiment adopts the DAO pattern (Data Access Object pattern).
The DAO pattern separates the data access logic from the business logic and encapsulates the data access operations in a dedicated class. With the DAO pattern, the rest of the application (e.g., the controller and business logic layers) can be independent of the specific implementation details of the data store, such as the database type and the SQL query statements.
The role of the DAO pattern includes:
Reducing coupling: the DAO pattern encapsulates details such as creating database connections, executing SQL statements and mapping results, so that the business logic layer does not need to interact with the database directly, thereby reducing the coupling of the code.
Improving code reusability: different pieces of business logic may need to access the same data table, and they can share the same database access code through the DAO pattern.
Enhancing code maintainability: when the database structure changes, only the DAO-related code needs to be modified, and the code of the business logic layer or controller layer does not, which simplifies maintenance.
Extensibility: the DAO pattern is not limited to database access; it can be extended to other data persistence mechanisms, such as files and caches, to provide a uniform data access interface for applications.
Implementation of the DAO pattern typically involves the following steps:
Defining DAO interfaces: a DAO interface is defined for each data table to be accessed, and the interface declares all database operation methods associated with that data table, such as add, delete, modify and query.
Implementing DAO interfaces: one or more implementation classes are provided for each DAO interface, and the implementation classes contain the specific database access logic.
Using DAOs: in the business logic layer or controller layer, the data is accessed through the DAO interface (instead of its implementation class), so that the implementation class of the DAO can be changed without modifying the business logic layer or controller layer code.
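The three steps above can be sketched in Java as follows. The record type FlowRecord, its fields, and the in-memory list standing in for a real database are all hypothetical and chosen only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** A flow record; the fields are illustrative only. */
final class FlowRecord {
    final long id;
    final String sourceIp;
    final long bytes;

    FlowRecord(long id, String sourceIp, long bytes) {
        this.id = id;
        this.sourceIp = sourceIp;
        this.bytes = bytes;
    }
}

/** Step 1: the DAO interface declares the operations on one data table. */
interface FlowRecordDao {
    void insert(FlowRecord record);
    Optional<FlowRecord> findById(long id);
    boolean deleteById(long id);
}

/** Step 2: one implementation class; a list stands in for a database. */
final class InMemoryFlowRecordDao implements FlowRecordDao {
    private final List<FlowRecord> rows = new ArrayList<>();

    @Override
    public void insert(FlowRecord record) {
        rows.add(record);
    }

    @Override
    public Optional<FlowRecord> findById(long id) {
        return rows.stream().filter(r -> r.id == id).findFirst();
    }

    @Override
    public boolean deleteById(long id) {
        return rows.removeIf(r -> r.id == id);
    }
}

/** Step 3: business logic depends only on the interface. */
final class TrafficService {
    private final FlowRecordDao dao;

    TrafficService(FlowRecordDao dao) {
        this.dao = dao;
    }

    /** Returns the byte count of a record, or 0 if the record is absent. */
    long bytesFor(long id) {
        return dao.findById(id).map(r -> r.bytes).orElse(0L);
    }
}
```

Because TrafficService only sees FlowRecordDao, replacing InMemoryFlowRecordDao with a class backed by a real database would not require changes to the service.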
Preferably, the data manipulation layer in this embodiment may be a different data storage DAO interface.
Preferably, the data operation layer of the embodiment is provided with four DAO interfaces, including a structured data storage DAO interface, a semi-structured data storage DAO interface, an unstructured storage DAO interface and a metadata storage DAO interface.
Preferably, the data interaction unit is configured with a structured data store DAO interface, a semi-structured data store DAO interface, an unstructured store DAO interface and a metadata store DAO interface.
Among these interfaces, the structured data storage DAO interface is used for processing structured data, the semi-structured data storage DAO interface is used for processing semi-structured data, the unstructured data storage DAO interface is used for processing unstructured data, and the metadata storage DAO interface is used for processing metadata.
The data interaction unit determines the data structure type of the related network flow data according to the query instruction or the mark in the storage instruction, and calls the data storage DAO interface conforming to the type according to the data structure type.
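A minimal sketch of this dispatch step is given below, assuming the flag has already been resolved to an enum value. The names DataStorageDao, SimpleDao and DataInteractionUnit are hypothetical, and the enum is redeclared so that the block compiles on its own.

```java
import java.util.EnumMap;
import java.util.Map;

enum DataStructureType { STRUCTURED, SEMI_STRUCTURED, UNSTRUCTURED, METADATA }

/** Common shape of the four data storage DAO interfaces (hypothetical). */
interface DataStorageDao {
    DataStructureType handles();
}

/** Placeholder DAO that only records which type it is responsible for. */
final class SimpleDao implements DataStorageDao {
    private final DataStructureType type;

    SimpleDao(DataStructureType type) {
        this.type = type;
    }

    @Override
    public DataStructureType handles() {
        return type;
    }
}

/** Maps the type flagged in an instruction to the DAO interface for that type. */
final class DataInteractionUnit {
    private final Map<DataStructureType, DataStorageDao> daos =
            new EnumMap<>(DataStructureType.class);

    void register(DataStorageDao dao) {
        daos.put(dao.handles(), dao);
    }

    DataStorageDao daoFor(DataStructureType flaggedType) {
        DataStorageDao dao = daos.get(flaggedType);
        if (dao == null) {
            throw new IllegalStateException("No DAO interface configured for " + flaggedType);
        }
        return dao;
    }
}
```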
The data storage DAO interface simply declares the data structure type and the operation content, and does not perform the actual data processing operations. The actual data processing operations are performed by the data manipulation agent units.
The declaration of the data structure type and the operation content by the data storage DAO interface may be implemented based on polymorphism in Java. In Java, polymorphism is the property that allows a reference variable of a parent type to refer to objects of different child classes. In brief, polymorphism allows methods of a child type to be invoked through a reference of the parent type.
The data storage DAO interface declares the data structure type and the operation content through the interface, and the actual data processing operation is then carried out by the specific implementation registered for that interface.
Registration relies on two mechanisms: dynamic binding and method overriding.
Dynamic binding: with polymorphism, the method to be invoked is determined at runtime rather than at compile time. When a reference variable of a parent type refers to an object of a child class and a method is invoked on it, the method that runs is chosen according to the runtime type of the object. This makes the code more flexible, because which child-class method to execute is decided at runtime.
Method overriding: polymorphism also relies on child classes overriding parent-class methods. A child class may redefine a method inherited from the parent class and provide its own implementation. When an overridden method is called through a reference variable of the parent type, the child-class method is actually called instead of the parent-class method.
The specific implementation overrides the methods of the parent type in this way, providing its own implementation of each inherited method.
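Dynamic binding and method overriding can be illustrated with a short Java sketch. The class names StorageOperation, RelationalOperation and DocumentOperation and the returned strings are hypothetical.

```java
/** Parent type: declares the operation without performing it. */
abstract class StorageOperation {
    abstract String execute(String payload);

    // Shared entry point; the call to execute() is bound at runtime.
    String run(String payload) {
        return "result=" + execute(payload);
    }
}

/** Child class overriding execute() for a relational store. */
class RelationalOperation extends StorageOperation {
    @Override
    String execute(String payload) {
        return "row(" + payload + ")";
    }
}

/** Child class overriding execute() for a document store. */
class DocumentOperation extends StorageOperation {
    @Override
    String execute(String payload) {
        return "doc(" + payload + ")";
    }
}
```

Calling run("x") through a StorageOperation reference yields "result=row(x)" for a RelationalOperation and "result=doc(x)" for a DocumentOperation, even though the calling code is identical.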
The data storage DAO interface does not directly operate Driver and Template.
Driver: a Driver is a software component or library that is used to interact with a database.
Taking a MongoDB-type storage unit as an example, in MongoDB the Driver refers to a client library dedicated to communicating with the MongoDB database. The driver provides a set of APIs and methods that enable developers to interact with the database from programming languages (e.g., Java, Python). The driver handles low-level operations such as network communication, data conversion and query execution, so that a developer can operate the database conveniently.
Template (operation SDK): a Template is an abstraction layer that encapsulates the implementation details of the underlying database driver and provides a higher-level interface to simplify database operations.
Taking a MongoDB-type storage unit as an example, the Template refers to MongoTemplate, a class provided by Spring Data MongoDB in the Spring ecosystem for interacting with MongoDB. MongoTemplate provides a set of methods for performing CRUD operations (create, read, update, delete) and querying data, and supports transaction management, aggregation operations and the like. With MongoTemplate, developers can operate the MongoDB database more conveniently without writing complex raw database query code.
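The relationship between a Driver and a Template can be sketched in plain Java. This sketch deliberately does not use the real MongoDB driver or the MongoTemplate API; StorageDriver, FakeKeyValueDriver, StorageTemplate and the "SET"/"GET" command format are all hypothetical stand-ins.

```java
import java.util.HashMap;
import java.util.Map;

/** Low-level driver: speaks the store's own protocol (reduced here to raw commands). */
interface StorageDriver {
    String send(String rawCommand);
}

/** Stand-in driver backed by a map; a real driver would use a network connection. */
final class FakeKeyValueDriver implements StorageDriver {
    private final Map<String, String> store = new HashMap<>();

    @Override
    public String send(String rawCommand) {
        String[] parts = rawCommand.split(" ", 3);
        if (parts[0].equals("SET") && parts.length == 3) {
            store.put(parts[1], parts[2]);
            return "OK";
        }
        if (parts[0].equals("GET") && parts.length == 2) {
            return store.getOrDefault(parts[1], "");
        }
        return "ERR";
    }
}

/** Template: hides the raw command format behind save/load methods. */
final class StorageTemplate {
    private final StorageDriver driver;

    StorageTemplate(StorageDriver driver) {
        this.driver = driver;
    }

    boolean save(String key, String value) {
        return "OK".equals(driver.send("SET " + key + " " + value));
    }

    String load(String key) {
        return driver.send("GET " + key);
    }
}
```

Callers of StorageTemplate never build command strings themselves, which is the simplification a Template layer is meant to provide.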
The data manipulation agent unit is provided with a common data interface for receiving declaration information of each data storage DAO interface.
After receiving the data structure type and the operation content declared by the data storage DAO interface, the data operation agent unit matches a suitable data storage unit according to the data structure type and the operation content, and then performs unified processing and conversion on the driver (Driver) and the operation SDK (Template) of the matched data storage unit to complete the actual operation on the data. The unified processing and conversion is specifically implemented by method overriding: a child class redefines the methods inherited from the parent class and provides its own implementation.
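One way to picture the unified processing and conversion is as a set of adapters that put differently shaped operation SDKs behind one common interface. The following sketch assumes this adapter reading; UnifiedStore, the two fake templates and their method names are hypothetical and do not correspond to any real SDK.

```java
import java.util.HashMap;
import java.util.Map;

/** Common data interface exposed to every data storage DAO interface. */
interface UnifiedStore {
    void put(String key, String value);
    String get(String key);
}

/** Stand-in for a relational operation SDK with its own method names. */
final class FakeSqlTemplate {
    private final Map<String, String> table = new HashMap<>();

    void executeInsert(String id, String row) {
        table.put(id, row);
    }

    String queryById(String id) {
        return table.get(id);
    }
}

/** Stand-in for a document operation SDK with a different API shape. */
final class FakeDocumentTemplate {
    private final Map<String, String> docs = new HashMap<>();

    void save(String id, String doc) {
        docs.put(id, doc);
    }

    String findOne(String id) {
        return docs.get(id);
    }
}

/** Converts UnifiedStore calls into the relational SDK's calls. */
final class SqlStoreAdapter implements UnifiedStore {
    private final FakeSqlTemplate template;

    SqlStoreAdapter(FakeSqlTemplate template) {
        this.template = template;
    }

    @Override
    public void put(String key, String value) {
        template.executeInsert(key, value);
    }

    @Override
    public String get(String key) {
        return template.queryById(key);
    }
}

/** Converts UnifiedStore calls into the document SDK's calls. */
final class DocumentStoreAdapter implements UnifiedStore {
    private final FakeDocumentTemplate template;

    DocumentStoreAdapter(FakeDocumentTemplate template) {
        this.template = template;
    }

    @Override
    public void put(String key, String value) {
        template.save(key, value);
    }

    @Override
    public String get(String key) {
        return template.findOne(key);
    }
}
```

Switching the backing store then amounts to handing the caller a different UnifiedStore implementation, with no change to the calling code.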
The matching rule of the data operation agent unit to the declaration information and the data storage unit is to match the data storage unit according to the connection URI in the declaration information.
The data operation agent unit can acquire the configuration information of the database unit of the network traffic analysis system, and determine the type of the data storage DB supported by the database unit and key parameters such as service addresses, ports, authentication information and the like.
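Matching by connection URI can be sketched with java.net.URI. The source does not specify the URI format or the matching criteria, so this sketch assumes URIs of the form scheme://host:port/path and matches on the storage type (scheme), service address (host) and port; StorageUnitConfig, UriMatcher and the example host names are hypothetical.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** One configured data storage unit: type plus key parameters. */
final class StorageUnitConfig {
    final String name;
    final String type; // e.g. "mysql" or "mongodb"
    final String host;
    final int port;

    StorageUnitConfig(String name, String type, String host, int port) {
        this.name = name;
        this.type = type;
        this.host = host;
        this.port = port;
    }
}

/** Matches a declared connection URI against the configured storage units. */
final class UriMatcher {
    private final List<StorageUnitConfig> units = new ArrayList<>();

    void configure(StorageUnitConfig unit) {
        units.add(unit);
    }

    /** Returns the first unit whose type, host and port agree with the URI. */
    Optional<StorageUnitConfig> match(String connectionUri) {
        URI uri = URI.create(connectionUri);
        return units.stream()
                .filter(u -> u.type.equalsIgnoreCase(uri.getScheme()))
                .filter(u -> u.host.equalsIgnoreCase(uri.getHost()))
                .filter(u -> u.port == uri.getPort())
                .findFirst();
    }
}
```

Note that JDBC-style strings such as "jdbc:mysql://..." are parsed by java.net.URI as opaque URIs with no host, so they would need separate handling under this assumption.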
Preferably, when the data operation proxy unit matches the data storage units, if a plurality of data storage units with the same type of data storage capability are configured, the data operation proxy unit can select the corresponding data storage units for connection according to the configured priority rule, can judge whether to connect according to the storage service availability of the data storage units, and can also connect according to the data storage units manually selected by a worker.
Preferably, the data operation agent unit connects different data storage units through dynamic identification and dynamic assembly, dynamically loads relevant configuration on each data storage unit as required, generates a database meeting the data storage requirements of different structure types, and finally realizes the operation as the data operation agent.
Preferably, the data operation agent unit receives the declaration information of the data storage DAO interface, reads the configuration information of the data storage unit of the network traffic analysis system, loads Driver and connects the data storage unit, dynamically loads relevant configuration on demand to each data storage unit, and generates a database capable of completing one or more types of data storage processing of 4 types of structured data, semi-structured data, unstructured data and metadata.
If the traffic analysis system is configured with a plurality of data storage units, the data operation agent unit can load the service Template of the data storage unit that has the highest degree of adaptation and whose service is available, according to the built-in implementation priority rule.
Preferably, the configuration information of the data storage unit comprises the type of the data storage unit configured by the network traffic analysis system and key parameters such as service address, port, authentication information and the like.
Preferably, where the traffic analysis system is configured with a plurality of data storage units having the same type of data storage capability, the data storage unit to be connected for generating the database may be selected manually, may be connected automatically according to the priority configured for the data storage units, or may be chosen according to the storage service availability of the data storage units. The configured priority may be the priority order of the hardware configuration or the priority order built into the program.
Preferably, where different data storage units (DBs) have the same type of data storage capability and no priority is specified manually, the data operation agent unit may connect the data storage units according to the system priority. Preferably, the system priority can be built in based on experience with power consumption, stability and performance indexes in actual deployments, and the degree of adaptation reflects how well a data storage DB performs on these indexes.
Preferably, whether the storage service of a data storage unit is available can be judged as follows: the data operation agent unit loads the Driver and initiates an authentication request, and the data storage unit returns a message indicating successful authentication. After successful authentication, the data operation agent unit can initiate a connection to the corresponding data storage unit according to the received message.
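The selection among several units with the same storage capability can be sketched as below. How the three criteria (manual choice, configured priority, availability) combine is not fixed by the source; this sketch assumes that an available manual choice wins and that otherwise the available unit with the best priority is taken. Candidate, UnitSelector and the convention that a lower number means a higher priority are assumptions, and the availability probe is passed in as a predicate rather than performing real authentication.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

/** A storage unit that could serve the requested data structure type. */
final class Candidate {
    final String name;
    final int priority; // assumption: lower value means higher priority

    Candidate(String name, int priority) {
        this.name = name;
        this.priority = priority;
    }
}

final class UnitSelector {
    /**
     * Picks the unit to connect to. An available manual choice wins;
     * otherwise the available candidate with the best priority is chosen.
     */
    static Optional<Candidate> select(List<Candidate> candidates,
                                      Predicate<Candidate> isAvailable,
                                      String manualChoice) {
        if (manualChoice != null) {
            Optional<Candidate> manual = candidates.stream()
                    .filter(c -> c.name.equals(manualChoice))
                    .filter(isAvailable)
                    .findFirst();
            if (manual.isPresent()) {
                return manual;
            }
        }
        return candidates.stream()
                .filter(isAvailable)
                .min(Comparator.comparingInt((Candidate c) -> c.priority));
    }
}
```

In a real deployment the isAvailable predicate would wrap the Driver-based authentication check described above.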
Preferably, after the data operation proxy unit connects the corresponding data storage units according to the declaration information of the data storage DAO interface, the data operation proxy unit rewrites the data storage units according to the interface declaration information (parent class), so that the data storage units can complete storage processing of one or more types of 4 types of data, namely structured data, semi-structured data, unstructured data and metadata.
Referring to fig. 1, the data manipulation agent unit generates a database satisfying data storage requirements of different structure types, and may include:
Connecting DB1, DB2 and DB3 to generate a database meeting the requirement of the structured data storage;
connecting DB3, DB4 and DB5 to generate a database meeting the semi-structured data storage;
Connecting DB5, DB6 and DB7 to generate a database meeting unstructured data storage;
Connecting DB7 and DB8 to generate a database meeting metadata storage.
DB1-DB8 in FIG. 1 refers to one or more types of implementations capable of performing 4 types of data storage processing, structured data, semi-structured data, unstructured data, metadata. DB1-DB8 may be a real database product, either a stand-alone database or a database serving clusters. DB1-DB8 are used to complete the real data storage service, and DB1-DB8 in the drawings are only used to illustrate that the embodiment is configured with a plurality of data storage units of different types, so as to meet the data storage requirements of different data structures, and not to limit that the embodiment can only be configured with 8 data storage units.
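The example grouping of DB1-DB8 listed above can be written down as a simple lookup table, which also makes it easy to see that some units (DB3, DB5, DB7) serve two structure types. The class name StorageLayout and its methods are hypothetical; the grouping itself is taken from the list above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

enum DataStructureType { STRUCTURED, SEMI_STRUCTURED, UNSTRUCTURED, METADATA }

final class StorageLayout {
    /** The example assignment of storage units to structure types. */
    static Map<DataStructureType, List<String>> example() {
        Map<DataStructureType, List<String>> layout = new EnumMap<>(DataStructureType.class);
        layout.put(DataStructureType.STRUCTURED, Arrays.asList("DB1", "DB2", "DB3"));
        layout.put(DataStructureType.SEMI_STRUCTURED, Arrays.asList("DB3", "DB4", "DB5"));
        layout.put(DataStructureType.UNSTRUCTURED, Arrays.asList("DB5", "DB6", "DB7"));
        layout.put(DataStructureType.METADATA, Arrays.asList("DB7", "DB8"));
        return layout;
    }

    /** Lists, in enum order, the structure types a given unit serves. */
    static List<DataStructureType> typesServedBy(String unit) {
        List<DataStructureType> types = new ArrayList<>();
        for (Map.Entry<DataStructureType, List<String>> entry : example().entrySet()) {
            if (entry.getValue().contains(unit)) {
                types.add(entry.getKey());
            }
        }
        return types;
    }
}
```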
Referring to fig. 3, the workflow of the data storage dynamic switching apparatus may be:
Step 1, a data interaction unit receives an instruction sent by a network traffic acquisition and processing unit or a network traffic data analysis and display unit, wherein the instruction comprises a mark representing data structure type information of related network traffic data;
step 2, the data interaction unit calls a data storage DAO interface which accords with the data structure type represented by the data interaction unit according to the mark in the instruction;
step 3, the data storage DAO interface declares the data structure type and the operation content to the data operation proxy unit through the public interface;
Step 4, the data operation agent unit generates a database meeting the data storage requirements of different structure types according to the received declaration information.
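The four workflow steps can be traced end to end with a compact Java sketch. It is heavily simplified: the type flag is a plain string, the storage unit is only named rather than connected, and Declaration, DeclaringDao, DataOperationAgent and InteractionUnit are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

/** Declaration passed from a DAO interface to the agent unit. */
final class Declaration {
    final String dataType;
    final String operation;

    Declaration(String dataType, String operation) {
        this.dataType = dataType;
        this.operation = operation;
    }
}

/** A DAO interface that only declares; it performs no storage operation. */
interface DeclaringDao {
    Declaration declare(String operation);
}

/** Step 4: resolves a declaration to a storage unit and carries it out. */
final class DataOperationAgent {
    private final Map<String, String> unitByType = new HashMap<>();

    void configure(String dataType, String unitName) {
        unitByType.put(dataType, unitName);
    }

    String handle(Declaration declaration) {
        String unit = unitByType.get(declaration.dataType);
        if (unit == null) {
            throw new IllegalStateException("No storage unit for " + declaration.dataType);
        }
        return declaration.operation + " " + declaration.dataType + " via " + unit;
    }
}

/** Steps 1 to 3: reads the flag, picks the DAO, forwards its declaration. */
final class InteractionUnit {
    private final Map<String, DeclaringDao> daos = new HashMap<>();
    private final DataOperationAgent agent;

    InteractionUnit(DataOperationAgent agent) {
        this.agent = agent;
    }

    void registerDao(String typeFlag, DeclaringDao dao) {
        daos.put(typeFlag, dao);
    }

    String onInstruction(String typeFlag, String operation) {
        DeclaringDao dao = daos.get(typeFlag); // step 2
        if (dao == null) {
            throw new IllegalStateException("No DAO interface for flag " + typeFlag);
        }
        Declaration declaration = dao.declare(operation); // step 3
        return agent.handle(declaration); // step 4
    }
}
```

With the agent configured to map "structured" to "DB1", an instruction flagged "structured" with operation "store" returns the string "store structured via DB1".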
Preferably, taking storage of structured data and semi-structured data as an example, the workflow of the dynamic switching device will be described, and the specific flow includes:
S1, the network traffic acquisition and processing unit collects data packets and, according to its built-in service data processing logic, generates structured data (application statistics, including various network performance indexes of an App) and semi-structured data (transaction logs, including network performance indexes and other fields that vary in number and type), and then sends a storage instruction carrying a structured data flag and a semi-structured data flag to the data interaction unit.
S2, the data interaction unit calls the structured data storage DAO interface and the semi-structured data storage DAO interface according to the structured data flag and the semi-structured data flag in the storage instruction.
S3, the structured data storage DAO interface and the semi-structured data storage DAO interface declare the data load (warehousing) operation and the corresponding data structure types to the data operation agent unit according to the storage instruction. The structured data storage DAO interface declares a load operation for the structured data, and the semi-structured data storage DAO interface declares a load operation for the semi-structured data.
S4, after receiving the declaration information of the structured data storage DAO interface and the semi-structured data storage DAO interface, the data operation agent unit reads the configuration information of the data storage units of the network traffic analysis system and determines the data storage units that support structured data storage and semi-structured data storage. It then performs unified processing and conversion on the driver (Driver) and the operation SDK (Template) of the determined data storage units, connects to them, and registers the Driver and the Template as the real implementation of the data agent operation. This generates a dynamically assembled database in which the structured data and the semi-structured data are stored and processed.
Example 4
This embodiment provides a data storage dynamic switching method for a network traffic analysis system, which may be implemented by the data storage dynamic switching device provided in embodiment 1, embodiment 2 and embodiment 3. The method comprises at least the following steps:
dividing data in a network traffic analysis system into at least two data structure types;
configuring a plurality of data storage units meeting the requirement of storing data of different data structures for a network traffic analysis system;
establishing association between data structure types and data operation layers, and distributing unique data structure types for different data operation layers;
and constructing a data operation proxy unit, and performing unified processing conversion on the driver of the data storage unit and the operation SDK by using the data operation proxy unit to finish actual operation on the data.
Preferably, the working steps of the data storage dynamic switching method at least comprise:
step 1, a data interaction unit calls a data operation layer conforming to the type of the data according to the structure type of the data in a network flow analysis system;
step 2, the data operation layer declares the data structure type and the operation content to the data operation agent unit;
step 3, the data operation agent unit matches a suitable data storage unit according to the declaration information, and generates a database capable of completing the data storage requirement of the corresponding structure type.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.