Disclosure of Invention
The invention aims to provide an intelligent analysis system and method for identifying dimer isomers in beta-lactam antibiotics, so as to solve the problem noted in the background art, namely that dimer isomer impurities are difficult to identify because their structures are very close to that of the main component of the medicine.
In order to achieve the above purpose, the present invention provides the following technical solutions:
an intelligent analysis system for identifying dimer isomers in beta-lactam antibiotics, comprising:
The potential characteristic fragment data collection subsystem is used for intelligently reading dimer isomers and potential characteristic fragments thereof by utilizing theoretical cracking fragments of dimer impurities to obtain theoretical calculated potential characteristic fragment data of each dimer isomer, wherein the theoretical calculated potential characteristic fragment data comprises the molecular weight of the potential characteristic fragments in a positive ion mode;
The unique characteristic fragment data screening subsystem is used for screening the potential characteristic fragment data of each dimer isomer to obtain corresponding unique characteristic fragment data, wherein the unique characteristic fragment data comprises the molecular weight of the unique characteristic fragments in a positive ion mode;
The mass spectrum image recognition subsystem is used for selecting a target mass spectrum image under the target molecular weight of a mass spectrum image obtained by detecting actual dimer impurities, and extracting mass spectrum image data of the target mass spectrum image, wherein the mass spectrum image data comprises mass-to-charge ratios corresponding to fragment ion peaks;
and the comparison subsystem is used for comparing the theoretically calculated fragment data with the mass spectrum image identification data of the actual dimer impurity, and obtaining a matching result of the molecular weight and the mass-to-charge ratio according to a preset fragment matching strategy.
Preferably, the potential characteristic fragment data collection subsystem comprises:
the first data reading module is used for reading theoretical cracking fragments of dimer impurities;
the characteristic acquisition module is used for carrying out polymerization mode listing based on each theoretical cracking fragment and different potential polymerization sites thereof to obtain different dimer isomers and carrying out relevant characteristic data acquisition;
The calculation module is used for carrying out relevant characteristic data processing on each potential characteristic fragment set of each dimer isomer, wherein the potential characteristic fragments are fragments obtained by cutting the dimer isomers according to the cutting rule of the carbapenem compound on the premise of keeping the integrity of amide or ester bonds in the dimer connecting two molecules;
And the first data output module is used for outputting data of each dimer isomer and potential characteristic fragments thereof, and each potential characteristic fragment data comprises the molecular weight of each potential characteristic fragment in a positive ion mode.
Preferably, the unique feature fragment data screening subsystem comprises:
a second data reading module for reading the potential characteristic fragment data of each dimer isomer;
the feature screening module is used for removing potential feature fragments shared by other dimer isomers from the potential feature fragment data to obtain unique feature fragment data of each dimer isomer;
And a second data output module for outputting unique characteristic fragment data for each dimer isomer, including the molecular weight of the unique characteristic fragments in positive ion mode.
Preferably, the mass spectrum image recognition subsystem comprises:
the third data reading module is used for reading the mass spectrum image which is actually detected;
the image recognition and conversion module is used for determining a retention time (RT) range of the dimer impurity in the total ion chromatogram based on the target molecular weight, acquiring the mass spectrum images within the RT range as target mass spectrum images, and extracting mass spectrum image identification data from each target mass spectrum image, wherein the mass spectrum image identification data comprises the mass-to-charge ratio and the mass spectrum peak intensity in a positive ion mode;
And the third data output module is used for outputting the identification data of each mass spectrum image within the target RT range.
Preferably, in the image recognition and conversion module, the number of target mass spectrum images is not less than 3.
Preferably, the comparison subsystem includes:
The fourth data reading module is used for reading the fragment data of each unique characteristic and the identification data of each mass spectrum image;
the accuracy control module is used for setting a fragment matching strategy and its matching accuracy, wherein the fragment matching strategy comprises a first matching strategy and a second matching strategy; under the first matching strategy, the corresponding unique characteristic fragment data is considered to match the mass spectrum image identification data when the absolute difference between the molecular weight and the mass-to-charge ratio does not exceed a preset value; under the second matching strategy, the corresponding dimer isomer is considered to be the dimer impurity of an RT interval when the same unique characteristic fragment data matches at least two mass spectrum image identification data of that RT interval;
The comparison module is used for matching the molecular weight of the unique feature fragment data and the mass-to-charge ratio of the mass spectrum image identification data according to a preset fragment matching strategy;
and the fourth output module is used for outputting a matching result.
Preferably, the comparison subsystem further comprises a noise reduction module for removing background impurity peaks in the mass spectrum image identification data.
Preferably, the intelligent analysis system further comprises a network visualization module for visualizing the matching result.
An intelligent analysis method for identifying dimer isomers in beta-lactam antibiotics, the method comprising the steps of:
a preliminary step of carrying out actual detection and theoretical simulation on the medicine to obtain mass spectrum images and a set of theoretical cleavage fragments of the dimer impurities;
polymerizing each theoretical cleavage fragment at its potential polymerization sites to obtain a plurality of dimer isomers, extracting all potential characteristic fragments of each dimer isomer, and calculating the molecular weight of each potential characteristic fragment to obtain the corresponding potential characteristic fragment data;
screening the potential characteristic fragment data to obtain the unique characteristic fragment data of each dimer isomer;
screening the mass spectrum images by using the target molecular weight and the retention time (RT) range in the total ion chromatogram, selecting a plurality of mass spectrum images within the RT range as target mass spectrum images, and extracting the mass-to-charge ratios corresponding to the fragment ion peaks from each target mass spectrum image to obtain the corresponding mass spectrum image identification data;
and matching the mass spectrum image identification data with the unique feature fragment data according to a preset fragment matching strategy to obtain a matching result.
Compared with the prior art, the invention has the beneficial effects that:
The intelligent analysis system constructs potential dimer isomer impurity configuration through theoretical fragmentation fragments of dimer impurities, can effectively screen the configuration of the isomer impurities through matching of mass spectrum data with unique characteristic fragments of the isomer impurity configuration, provides a new path for identifying beta-lactam antibiotic polymer impurities, and can replace detection personnel to manually read and analyze instrument data so as to intelligently and effectively identify the dimer isomer impurities.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without inventive effort fall within the scope of protection of the present invention.
Referring to fig. 1, an intelligent analysis system for dimer isomers in beta-lactam antibiotics is composed of a latent feature fragment data collection subsystem, a unique feature fragment data screening subsystem, a mass spectrum image recognition subsystem, a comparison subsystem and a network visualization module.
Referring to fig. 2, the latent feature fragment data collection subsystem includes a first data reading module, a feature collection module, a calculation module, and a first data output module.
The first data reading module is used for reading medicine information, sample source information of medicines and a theoretical cracking fragment set, wherein the theoretical cracking fragment set is formed by cracking fragments of all possible dimer impurities.
The characteristic acquisition module is used for arranging each theoretical cleavage fragment according to its different potential polymerization sites to obtain the relevant characteristic data of all possible dimer isomers.
The calculation module is used for developing and running acquisition code for each dimer isomer to obtain the corresponding potential characteristic fragment data, which includes the molecular weight of each potential characteristic fragment that may be generated by fragmentation in the hydrogen-containing (protonated) positive ion mode.
In the characteristic acquisition module, because theoretical cleavage fragments of dimer impurities all have a plurality of potential polymerization sites, the possible dimer isomers can be obtained by arranging the possible polymerization sites between monomers. In the case of penem drugs, the dimer isomer may be formed by linking a carboxyl group to an amino group or an imino group or a carboxyl group to a hydroxyl group, and in the case of penicillin drugs, the dimer isomer may be formed by polymerizing a carboxyl group to an amino group or an imino group.
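The enumeration of polymerization modes described above can be sketched as follows. This is only an illustrative sketch: the linkage rules are taken from the text, while the site labels, the function name and the data layout are hypothetical and not part of the original disclosure.

```python
from itertools import product

# Linkage rules as stated in the text: a carboxyl group may link to an
# amino or imino group (both drug classes) or to a hydroxyl group (penem only).
ALLOWED_PARTNERS = {
    "penem": {"amino", "imino", "hydroxyl"},
    "penicillin": {"amino", "imino"},
}


def enumerate_dimer_isomers(drug_class, carboxyl_sites, partner_sites):
    """List candidate dimer linkages between two monomers.

    carboxyl_sites: labels of carboxyl sites on the first monomer (hypothetical).
    partner_sites: dict mapping a site label on the second monomer to its
                   functional group name ("amino", "imino" or "hydroxyl").
    """
    allowed = ALLOWED_PARTNERS[drug_class]
    isomers = []
    for acid, (site, group) in product(carboxyl_sites, partner_sites.items()):
        if group in allowed:
            # Carboxyl + hydroxyl gives an ester bond; otherwise an amide bond.
            bond = "ester" if group == "hydroxyl" else "amide"
            isomers.append({"acid_site": acid, "partner_site": site, "bond": bond})
    return isomers
```

Each returned entry stands for one candidate dimer isomer; the bond type recorded here is the amide or ester bond that the later cutting step must keep intact.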
In the calculation module, each possible dimer isomer is cut according to the cutting rule of carbapenem compounds while keeping intact the amide or ester bond that connects the two molecules in the dimer. The fragments that the same dimer isomer may generate on fragmentation in the hydrogen-containing positive ion mode form its potential characteristic fragment set, that is, its cleavage samples. The potential characteristic fragment set of each dimer configuration includes fragments that potentially differ in structure from those of the other dimer configurations.
The first data output module is used for outputting potential characteristic fragment data of each dimer isomer, including fragment document codes, isomers and numbers thereof, drug names, cleavage samples and molecular weights (positive ions) thereof.
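As an illustration of how the molecular weight of a fragment in the positive ion mode might be computed, the sketch below assumes that "hydrogen-containing positive ion mode" means a singly protonated ion and that monoisotopic masses are used; neither assumption is stated in the original text. The function names and the flat formula notation (for example "C6H9NO2", without brackets) are hypothetical.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes of common elements.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "S": 31.97207117,
}
PROTON_MASS = 1.00727647  # mass of a proton (u)


def neutral_mass(formula):
    """Monoisotopic mass of a neutral fragment given a flat formula string."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def protonated_mz(formula):
    """m/z of the singly protonated ion (assumed meaning of positive ion mode)."""
    return neutral_mass(formula) + PROTON_MASS
```

Calling `protonated_mz` on each fragment formula of a dimer isomer would give the molecular weight column of its potential characteristic fragment data.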
Referring to fig. 3, the unique feature fragment data screening subsystem includes a second data reading module, a feature screening module, and a second data output module.
The second data reading module is used for reading potential characteristic fragment data of each dimer isomer, including fragment document codes, isomers and numbers thereof, drug names, cracked samples and molecular weights thereof.
The feature screening module is used for removing potential feature fragments shared by other dimer isomers from the potential feature fragment data to obtain unique feature fragment data of each dimer isomer.
The second data output module is used for outputting UNIQUE characteristic fragment data (UNIQUE profile) of each dimer isomer, including UNIQUE characteristic fragment document codes, isomer numbers, drug information, cleaved samples and molecular weights (positive ions).
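The screening performed by this subsystem amounts to a set difference per isomer. The sketch below is a minimal illustration; it assumes that fragments are compared by their rounded molecular weight, which the original text does not specify, and the function name and data layout are hypothetical.

```python
def unique_fragments(potential, ndigits=4):
    """Keep, for each isomer, only fragments shared with no other isomer.

    potential: dict mapping an isomer identifier to an iterable of fragment
               molecular weights (positive ion mode).
    ndigits:   rounding used to decide that two weights are "the same"
               (an assumption made for this sketch).
    """
    rounded = {
        isomer: {round(mz, ndigits) for mz in fragments}
        for isomer, fragments in potential.items()
    }
    unique = {}
    for isomer, fragments in rounded.items():
        # Union of the fragments of every other isomer.
        others = set().union(
            *(f for other, f in rounded.items() if other != isomer)
        )
        unique[isomer] = sorted(fragments - others)
    return unique
```

An isomer whose list comes back empty has no unique characteristic fragment under this comparison and could not be distinguished by the later matching step.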
Referring to fig. 4, the mass spectrum image recognition subsystem includes a third data reading module, an image recognition and conversion module, and a third data output module.
The third data reading module is used for reading mass spectrum image data, including a medicine name, a sample condition, a mass spectrum instrument name and model, an operation mode and an actually detected LC-MS/MS mass spectrum image.
The image recognition and conversion module is used for identifying the LC-MS/MS mass spectrum images, selecting a plurality of mass spectrum images within the RT range as target mass spectrum images based on the target molecular weight and the retention time (RT) range in the total ion chromatogram, and extracting the mass-to-charge ratio and mass spectrum peak intensity of each target mass spectrum image in the positive ion mode to obtain the corresponding mass spectrum image identification data (MSDATA).
The total ion chromatogram is a chart commonly used in mass spectrometry that shows how the signal intensity of all ions in a sample changes over time; here it is experimental data obtained from the actual LC-MS/MS detection.
The third data output module is used for outputting each mass spectrum image identification data, including image data document codes, medicine names, sample conditions, target molecular weight (positive ions), RT values, the interval ranges to which the RT values belong, and mass spectrum instrument names and models.
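The selection of target mass spectra can be sketched as below. The sketch assumes that the mass spectrum images have already been converted to numeric records; the `Spectrum` type, the `mz_window` parameter and the function name are hypothetical, while the minimum of 3 target spectra follows the preferred embodiment.

```python
from dataclasses import dataclass, field


@dataclass
class Spectrum:
    rt: float                 # retention time in minutes
    precursor_mz: float       # precursor m/z in positive ion mode
    peaks: list = field(default_factory=list)  # (m/z, intensity) pairs


def select_target_spectra(spectra, target_mz, rt_range, mz_window=0.5, min_count=3):
    """Pick spectra inside the RT range whose precursor is near the target m/z."""
    rt_lo, rt_hi = rt_range
    selected = [
        s for s in spectra
        if rt_lo <= s.rt <= rt_hi and abs(s.precursor_mz - target_mz) <= mz_window
    ]
    if len(selected) < min_count:
        raise ValueError(
            f"only {len(selected)} spectra in RT range; need {min_count}"
        )
    return sorted(selected, key=lambda s: s.rt)
```

Raising an error when fewer than `min_count` spectra are found is a design choice of this sketch; a real pipeline might instead widen the RT range or flag the interval for manual review.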
Referring to fig. 5, the comparison subsystem includes a fourth data reading module, an accuracy control module, a comparison module, and a fourth data output module.
The fourth data reading module is used for reading the unique characteristic fragment data output by the unique characteristic fragment data screening subsystem and the mass spectrum image identification data output by the mass spectrum image identification subsystem.
The accuracy control module is used for setting a fragment matching strategy and matching accuracy thereof, wherein the fragment matching strategy comprises a first matching strategy and a second matching strategy.
The first matching strategy is as follows: if the absolute difference between the molecular weight (positive ions) of a unique characteristic fragment in the unique characteristic fragment data and a mass-to-charge ratio in the mass spectrum image identification data does not exceed a preset value, the dimer isomer corresponding to the unique characteristic fragment data is considered to match the mass spectrum image identification data. The absolute difference is expressed as:
D = |m - t|
where D represents the difference, m represents the mass-to-charge ratio in the mass spectrum image identification data in the positive ion mode, and t represents the mass-to-charge ratio of the unique characteristic fragment in the theoretical unique characteristic fragment data in the positive ion mode.
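The first matching strategy can be sketched as follows. The text describes the difference as absolute while quoting the preset value in ppm; this sketch assumes the tolerance is applied relative to the theoretical value t, which is the usual reading of a ppm tolerance but is not stated explicitly in the original. The function names are hypothetical.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Absolute difference |m - t| expressed in parts per million of t."""
    return abs(measured_mz - theoretical_mz) / theoretical_mz * 1e6


def first_strategy(unique_mzs, peak_mzs, tol_ppm=10.0):
    """Return (theoretical, measured) pairs that agree within tol_ppm.

    unique_mzs: molecular weights of the unique characteristic fragments.
    peak_mzs:   mass-to-charge ratios read from one target mass spectrum.
    """
    return [
        (t, m)
        for t in unique_mzs
        for m in peak_mzs
        if ppm_error(m, t) <= tol_ppm
    ]
```

For instance, with a 10 ppm tolerance a theoretical value of 500.0 would pair with a measured peak at 500.0025 (about 5 ppm) but not with one at 500.02 (about 40 ppm).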
The second matching strategy is as follows: if the same unique characteristic fragment data matches no fewer than two mass spectrum image identification data in the same RT interval, the substance in that interval is considered to be the dimer isomer corresponding to the unique characteristic fragment data.
The second matching strategy builds on the first. The first matching strategy is used to compare the unique characteristic fragment data of each dimer isomer with the mass spectrum image identification data having the same target molecular weight in each RT interval; once that matching is complete, the second matching strategy is applied to its results.
In the invention, the preset value in the first matching strategy is the matching accuracy, generally between 5 and 20 ppm; the specific value can be set by a person skilled in the art according to the actual situation.
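The second matching strategy can be sketched as a count over the results of the first. The tuple layout and the names below are hypothetical; the threshold of two spectra per RT interval follows the text.

```python
from collections import defaultdict


def second_strategy(first_matches, min_spectra=2):
    """Assign an isomer to an RT interval when enough spectra support it.

    first_matches: iterable of (isomer_id, rt_interval, spectrum_id) tuples,
                   one per successful match under the first strategy.
    Returns the sorted (isomer_id, rt_interval) pairs supported by at least
    min_spectra distinct spectra.
    """
    supporting = defaultdict(set)
    for isomer, interval, spectrum in first_matches:
        supporting[(isomer, interval)].add(spectrum)
    return sorted(
        key for key, spectra in supporting.items() if len(spectra) >= min_spectra
    )
```

Counting distinct spectrum identifiers, rather than raw matches, prevents several fragment hits in a single spectrum from being mistaken for agreement across spectra.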
The comparison module is used for matching the dimer isomer theoretical characteristic fragment data with the actual dimer impurity mass spectrum image identification data based on a preset fragment matching strategy to obtain a matching result.
The fourth data output module is used for outputting the matching result, which includes the document names of the unique characteristic fragment data and of the mass spectrum image identification data successfully matched under the first matching strategy together with their matched values, and/or the unique characteristic fragment data of the dimer isomer successfully matched under the second matching strategy.
According to the invention, the matching accuracy is set through the accuracy control module, which improves the accuracy and specificity of risk control for beta-lactam antibiotic impurities.
Further, the comparison subsystem also comprises a noise reduction module for filtering out background impurity peaks that have no detection significance from the mass spectrum image identification data.
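The original text does not say how background peaks are identified. One common approach, shown here purely as an assumption, is to drop peaks whose intensity falls below a fraction of the base peak; the threshold value and function name are hypothetical.

```python
def remove_background_peaks(peaks, min_relative_intensity=0.01):
    """Drop peaks weaker than a fraction of the most intense (base) peak.

    peaks: list of (m/z, intensity) pairs from one target mass spectrum.
    The relative-intensity rule is an assumption of this sketch, not a
    method described in the source.
    """
    if not peaks:
        return []
    base = max(intensity for _, intensity in peaks)
    if base <= 0:
        return []
    return [
        (mz, intensity)
        for mz, intensity in peaks
        if intensity / base >= min_relative_intensity
    ]
```

Filtering before matching reduces the chance that a weak background peak happens to fall within the ppm tolerance of a unique characteristic fragment.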
The network visualization module is used for visualizing the matching result output by the comparison subsystem and displaying the data related to the matched unique characteristic fragments of the dimer isomers together with their relationship network, such as the document name, the mass spectrum peak and RT value, the compared values, the molecular weight in the positive ion mode, the cleavage sample, and the structural formula of the fragment corresponding to each unique characteristic fragment. The "visual option" function in the module can be used to choose which relational data are displayed, for example the visualization of the unique characteristic fragment data, or the correspondence between the Excel data, the structural formulas of the unique characteristic fragments, and the mass spectrum data.
In the intelligent analysis system, the potential characteristic fragment data collection subsystem performs theoretical simulation of the theoretical dimer isomers and their potential characteristic fragments through code development and intelligent data processing; the unique characteristic fragment data screening subsystem screens out the unique characteristic fragment data of each dimer isomer; the mass spectrum image recognition subsystem extracts mass spectrum image identification data from the mass spectrum image information of the actually detected impurities; and the comparison subsystem achieves accurate pairing of the mass spectrum image identification data with the unique characteristic fragment data and multidimensional display of the related data relationship network.
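A relationship network of this kind can be represented as a simple graph linking isomers, fragments and spectra. The sketch below emits the graph in Graphviz DOT text as one possible exchange format; the choice of DOT, the node naming and the function name are assumptions and are not taken from the original.

```python
def matches_to_dot(matches):
    """Render matches as an undirected graph in Graphviz DOT syntax.

    matches: iterable of (isomer_id, fragment_mz, spectrum_id) tuples.
    Each match contributes two edges: isomer -- fragment and
    fragment -- spectrum.
    """
    edges = set()
    for isomer, fragment_mz, spectrum in matches:
        fragment_node = f"{fragment_mz:.4f}"
        edges.add((isomer, fragment_node))
        edges.add((fragment_node, spectrum))
    lines = ["graph matches {"]
    lines += [f'  "{a}" -- "{b}";' for a, b in sorted(edges)]
    lines.append("}")
    return "\n".join(lines)
```

The resulting text could be passed to any DOT-compatible viewer; further attributes such as RT values or peak intensities would be attached to the nodes in a fuller implementation.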
Based on the intelligent analysis system, the invention also provides an intelligent analysis method, which is explained in detail below by taking medicine a as an example.
Preliminary step: referring to fig. 7, actual detection and theoretical simulation are carried out on medicine a to obtain mass spectrum images and all possible cleavage fragments of the dimer impurities, which together constitute the theoretical cleavage fragment set.
Step 1, polymerizing each theoretical cleavage fragment at its potential polymerization sites to obtain all possible dimer forms, cutting each dimer according to the cutting rule of carbapenem compounds while keeping intact the amide or ester bond that connects the two molecules in the dimer to obtain the corresponding potential characteristic fragment set, and calculating the molecular weight of each potential characteristic fragment in the positive ion mode to obtain the corresponding potential characteristic fragment data.
In step 1 of the present invention, for each theoretical cleavage fragment, different dimers can be obtained after polymerization according to different potential polymerization sites thereof, and different dimers of different theoretical cleavage fragments are cut respectively to obtain corresponding potential characteristic fragment data, including molecular weight of each potential characteristic fragment and corresponding dimer configuration, etc., as shown in fig. 8 (a).
Step 2, screening the potential characteristic fragment data of each dimer isomer to obtain unique characteristic fragment data of each dimer isomer, as shown in fig. 8 (b).
In step 2 of the present invention, potential feature fragments common to other dimer isomers are removed from the set of potential feature fragments for each dimer isomer, and the potential feature fragments that are not removed at this time serve as unique feature fragments for the dimer isomers.
Step 3, screening the mass spectrum images obtained by actually detecting medicine a by using the target molecular weight and the RT range in the total ion chromatogram, selecting at least 3 mass spectrum images within the RT range as target mass spectrum images, and extracting image data from each target mass spectrum image, wherein the image data comprises the mass-to-charge ratio and the mass spectrum peak intensity in the positive ion mode, as shown in fig. 9 (b).
In step 3 of the present invention, the target molecular weight is the molecular weight of the dimer impurity, and in the example, the molecular weight of the dimer impurity of the drug product a is 701. The molecular weight of the dimeric impurity may be calculated based on the structure of drug a, as is known in the art. The target mass spectrum image and the mass spectrum image identification data are in one-to-one correspondence.
Step 4, matching the mass spectrum image identification data of each actual dimer impurity with each set of theoretical unique characteristic fragment data according to the first matching strategy to obtain a matching result.
In step 4 of this embodiment, as shown in fig. 10, the preset value in the first matching strategy is set to 10 ppm. The unique characteristic fragment data for the molecular weight of 701 is compared with each mass spectrum image identification data to obtain the matching relationship, in the positive ion mode, between each target mass spectrum image at a mass-to-charge ratio of 701 m/z and each unique characteristic fragment data. The matching relationship is recorded in Excel: column A gives the RT interval in which the successfully matched target mass spectrum image is located; columns B and C give the successfully matched target mass spectrum image and unique characteristic fragment data; columns D and E give, respectively, the mass-to-charge ratio from the target mass spectrum image in column B and the fragment molecular weight from the unique characteristic fragment data in column C. The absolute difference between each mass-to-charge ratio and the corresponding fragment molecular weight is smaller than 10 ppm.
Step 5, visually outputting the matching result according to different options, as shown in fig. 11.
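The five-column record described above can be sketched as a CSV export, which Excel can open directly. The column names, the function name and the candidate tuple layout are hypothetical, and the ppm tolerance is again assumed to be relative to the fragment molecular weight.

```python
import csv
import io

# Columns A to E as described in the text; the header names are illustrative.
HEADER = ["rt_interval", "target_spectrum", "unique_fragment_file",
          "measured_mz", "fragment_mz"]


def write_match_table(candidates, tol_ppm=10.0):
    """Return CSV text listing the candidate pairs that agree within tol_ppm.

    candidates: iterable of (rt_interval, spectrum_name, fragment_file,
                measured_mz, fragment_mz) tuples.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    for rt_interval, spectrum, fragment_file, measured, theoretical in candidates:
        error_ppm = abs(measured - theoretical) / theoretical * 1e6
        if error_ppm <= tol_ppm:
            writer.writerow([rt_interval, spectrum, fragment_file,
                             measured, theoretical])
    return buffer.getvalue()
```

Writing the returned text to a file with a `.csv` extension gives one row per successful match, in the same column order as the record shown in fig. 10.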
According to the invention, through code development and intelligent data processing, the unique characteristic fragment data obtained by theoretical calculation for each dimer is screened out and accurately paired with the mass spectrum data of the actually detected impurities, and the related data relationship network is displayed in multiple dimensions. The invention provides a new path for identifying beta-lactam antibiotic polymer impurities, and improves the accuracy and specificity of risk control for beta-lactam antibiotic impurities. The technical solution of the invention is also applicable to other types of medicines besides beta-lactam antibiotics, and can be extended to the pharmaceutical analysis and verification and pharmaceutical quality control industries.