CN101776671B - Real-time feature extraction method for analysis of complex ingredient of traditional Chinese medicine - Google Patents

Info

Publication number
CN101776671B
CN101776671B CN2010100395440A CN201010039544A CN101776671B CN 101776671 B CN101776671 B CN 101776671B CN 2010100395440 A CN2010100395440 A CN 2010100395440A CN 201010039544 A CN201010039544 A CN 201010039544A CN 101776671 B CN101776671 B CN 101776671B
Authority
CN
China
Prior art keywords
time
mass
noise
baseline
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2010100395440A
Other languages
Chinese (zh)
Other versions
CN101776671A (en)
Inventor
张玉峰
范骁辉
程翼宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN2010100395440A priority Critical patent/CN101776671B/en
Publication of CN101776671A publication Critical patent/CN101776671A/en
Application granted granted Critical
Publication of CN101776671B publication Critical patent/CN101776671B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

The invention provides a real-time feature extraction method for analyzing the complex components of traditional Chinese medicine. The method consists of four modules: data communication, two-dimensional feature chain detection, local noise and local baseline correction, and feature resolution. It analyzes the mass spectra acquired by the mass spectrometer sequentially and checks whether each spectrum shares continuous features with the spectrum acquired at the previous time point, thereby detecting two-dimensional feature chains dynamically. Using the mass-to-charge ratio and time information contained in a two-dimensional feature chain, noise and baseline in the time dimension can be removed quickly, which overcomes the difficulty earlier algorithms had in estimating the baseline accurately from the time dimension alone. Because the estimated noise and baseline are local, the local signal-to-noise ratio indicates whether a feature chain contains a component, which simplifies feature detection. The method is soundly designed: the data processing system works in real time, requires few user-defined parameters, and computes quickly, and it is especially suitable for liquid chromatography-mass spectrometry instruments.

Description

A real-time feature extraction method for complex component analysis of traditional Chinese medicine

Technical field

The invention belongs to the pharmaceutical field and relates to a real-time, online feature extraction method for analyzing the complex components of traditional Chinese medicine.

Background

Traditional Chinese medicine (TCM) is a treasure of the Chinese nation. It has more than 2,000 years of clinical practice and has played an indelible role in the nation's growth and survival. With continued national investment in TCM science and technology, the modernization of TCM has made encouraging progress: the efficacy of some medicines has been confirmed again by scientific experiments, and some even outperform chemical drugs. Western countries currently hold an absolute advantage in the research and development of chemical drugs, and this is unlikely to change in the short term, so vigorous development of TCM is of great importance to China's pharmaceutical industry. However, the composition of TCM is extremely complex, and basic research has long been insufficiently deep. There are historical reasons for this, but the limitations of existing technology are particularly prominent. Before liquid chromatography-mass spectrometry (LC-MS) matured, studying TCM required first isolating individual compounds by phytochemical separation and then analyzing them with the four major spectroscopic techniques to obtain their structural information. LC-MS has completely changed this traditional research model for the material basis of TCM: it speeds up structure confirmation and makes it possible to identify trace components that previously could not be isolated. However, LC-MS data are still analyzed mainly by hand, which has become a bottleneck in current mass spectrometry applications, especially when large numbers of samples from TCM component libraries must be analyzed. At present, the workstations supplied by mainstream LC-MS manufacturers (such as Thermo Electron, Applied Biosystems, and Waters) can only perform simple one-dimensional data analysis after acquisition is complete. Users must set several parameters, a parameter set suits only a specific sample, and different samples require adjustments, so data analysis has become the rate-limiting step in many LC-MS applications.

The signal acquired by LC-MS has a time dimension and a mass dimension, whereas the signal from conventional liquid chromatography with an ultraviolet detector (LC-UV) has only a time dimension. In LC-UV, the change in intensity over the period during which a compound elutes is usually called a "chromatographic peak". In two-dimensional LC-MS data, an eluting compound has both a time course and a mass distribution. We call a region that carries information in both dimensions a "feature" of the compound, and an algorithm that finds such regions a feature extraction algorithm or method. The added dimension makes extracting information from LC-MS data much harder. Few methods for LC-MS feature extraction have been studied in the TCM field, but the topic is very active in bioinformatics, driven by the need to process large volumes of LC-MS data in proteomics and metabolomics. Well-known open-source tools include XCMS and MZmine; commercial software includes AnalyzerPro and ProTrawler. These tools are used only for offline analysis after LC-MS acquisition, and their algorithms operate on the data from the whole analysis time. XCMS, for example, must first merge the signals within a given mass range over the completed acquisition before it can detect peaks. These tools also require several parameters, some of which have no direct physical meaning (such as wavelet scales and coefficients) and are hard for ordinary users to understand.

Summary of the invention

To address the deficiencies of the prior art, the invention provides a real-time feature extraction method for complex component analysis of traditional Chinese medicine. The method is based on the two-dimensional feature information of LC-MS data (the time dimension and the mass dimension) and is implemented through two-dimensional feature chain detection, local noise and baseline estimation, and feature resolution. It has a low false-positive rate and computes quickly enough for real-time analysis. The invention is implemented through the following steps:

1. Mass spectrometry data acquisition: A complex TCM sample is first separated by the chromatographic unit. The mass spectrometer then analyzes the eluting fractions sequentially in full-scan mode at a given sampling frequency (f), and the acquired data are stored in centroid (stick plot) format, which existing mass spectrometers all support. The data acquired at each time point (an integer multiple of 1/f) form one mass spectrum, which corresponds to the mass spectral dimension; the data acquired at different time points make up the chromatographic dimension. For example, summing the intensities of all ions in the mass spectrum acquired at each time point gives the response intensity at that time point, and the response intensities at all time points form the total ion current chromatogram. In the invention, chromatography includes high-performance liquid chromatography (HPLC) and ultra-high-pressure liquid chromatography (UPLC); mass spectrometry includes instruments capable of high-resolution or low-resolution full scans that are coupled to the above chromatography through an atmospheric pressure ionization source, such as single quadrupole, triple quadrupole, ion trap, or time-of-flight mass spectrometers.
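As a minimal sketch of how the total ion current chromatogram described in step 1 can be assembled from centroid scans, consider the following. The function name, the `Scan` type, and the choice to number scans from 1 are illustrative assumptions, not part of the patent.

```python
from typing import List, Tuple

# One centroid mass spectrum: a list of (m/z, intensity) pairs.
Scan = List[Tuple[float, float]]


def total_ion_chromatogram(scans: List[Scan], f: float) -> List[Tuple[float, float]]:
    """Return (time, summed intensity) for each scan acquired at frequency f.

    Assumption: the i-th scan (counting from 1) is acquired at time i / f.
    """
    tic = []
    for i, scan in enumerate(scans, start=1):
        time_point = i / f
        response = sum(intensity for _, intensity in scan)
        tic.append((time_point, response))
    return tic
```

Each scan contributes exactly one point to the chromatogram, which is why the chromatographic dimension grows by one sample per acquisition cycle.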

2. Two-dimensional feature chain detection: BNN(minWidth, CC)

Each time the mass spectrometer acquires the mass spectrum for a time point, the spectrum is passed to the BNN module for analysis. The mass-to-charge ratios and intensities in the spectrum are first assigned to the mass-to-charge ratio array MZ and the intensity array INTEN. A bilateral nearest neighbor algorithm then detects, in time order, the two-dimensional feature chains that contain compound information. The detected chains are stored in CC, where other modules can retrieve them at any time.

3. Local noise and local baseline estimation: De_Noise_Baseline(minWidth)

As more data are acquired, noise and baseline can be estimated for any two-dimensional feature chain CC_k in CC whose length N_k exceeds minWidth. A two-dimensional feature chain carries both chromatographic and mass spectral information, composed of time together with MZ and INTEN. The response intensities of the chain are linearly convolved with a high-pass filter, and impulse signals are removed with a threshold of 3 times the population standard deviation; the result is the noise estimate in the chromatographic dimension. To estimate the baseline in the chromatographic dimension, the invention exploits the difference in mass fluctuation between regions of the chain that contain a component and regions that do not, using the following algorithm:

(1) Find the time point of maximum intensity in the two-dimensional feature chain CC_k, then compute the average mass fluctuation (the difference between adjacent mass-to-charge ratios) in its neighborhood, mzMin;

(2) Using 5 times mzMin as the threshold, find all positions where the mass fluctuation exceeds the threshold, and define these positions together with the first point of CC_k as key points;

(3) These key points also correspond to key points in the chromatographic dimension. Connecting them with straight lines in the chromatographic dimension gives the estimate of the baseline B(x). If the last key point is not the last point of CC_k, a horizontal line extended from that key point to the end of the chain is the baseline estimate for that region.
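The three baseline steps above can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the "neighborhood" of the apex is taken as a window of width minWidth centered on it, the fluctuation at index i is taken as |mz[i] - mz[i-1]| and assigned to point i, and the chain is assumed to be longer than `min_width` with `min_width >= 2`.

```python
def estimate_baseline(mz, inten, min_width):
    """Piecewise-linear baseline for one feature chain (illustrative sketch)."""
    n = len(mz)
    # fluct[i] is the mass fluctuation between point i-1 and point i (fluct[0] = 0).
    fluct = [0.0] + [abs(mz[i] - mz[i - 1]) for i in range(1, n)]

    # Step (1): average fluctuation around the most intense point.
    apex = max(range(n), key=lambda i: inten[i])
    half = min_width // 2
    lo, hi = max(apex - half, 1), min(apex + half + 1, n)
    mz_min = sum(fluct[lo:hi]) / (hi - lo)

    # Step (2): key points are the first point plus every point whose
    # fluctuation exceeds 5 * mzMin.
    keys = [0] + [i for i in range(1, n) if fluct[i] > 5 * mz_min]

    # Step (3): connect key points with straight lines; past the last key
    # point the baseline is extended horizontally.
    baseline = []
    for x in range(n):
        left = max(k for k in keys if k <= x)
        right = min((k for k in keys if k >= x), default=None)
        if right is None or right == left:
            baseline.append(float(inten[left]))
        else:
            w = (x - left) / (right - left)
            baseline.append(inten[left] + w * (inten[right] - inten[left]))
    return baseline, keys
```

Inside an eluting peak the m/z values are stable, so no key points fall there and the baseline is interpolated underneath the peak rather than following it.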

4. Feature resolution: FeatureReslove(minWidth, minSN, feature_list)

Once the local noise and baseline of a two-dimensional feature chain CC_k have been estimated (for the current time; the chain may still be extended later, in which case the noise and baseline are re-estimated), feature resolution can be performed. Because detection runs in real time, usually only part of a feature has eluted at any given moment. The purpose of feature resolution is to determine where the current time point lies within the elution of the chromatographic peak (feature): at its start, at its end, and so on. Subtracting the noise ε(x) and the baseline B(x) from the raw signal intensity (x is the time point) gives an approximate estimate of the true signal, NS(x). If feature resolution is being performed on CC_k for the first time, the feature detection state must be initialized to s=0; see Embodiment 1 for the detailed algorithm. The detected features are saved in feature_list (the feature list).

The signal-to-noise ratio at any point in CC_k is defined as:

$$SN(x) = \frac{F(x) - B(x) - \varepsilon(x)}{LSD}$$

where LSD is the standard deviation near position x. The last point in CC_k is the data point currently being acquired, and its signal-to-noise ratio SN is calculated.
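A small sketch of this signal-to-noise definition follows. The source does not state which quantity LSD is computed from; the sketch assumes it is the population standard deviation of the noise estimate ε over the minWidth points ending at x, and the function name `local_snr` is hypothetical.

```python
from statistics import pstdev


def local_snr(F, B, eps, x, min_width):
    """SN(x) = (F(x) - B(x) - eps(x)) / LSD for one point of a feature chain.

    Assumption: LSD is the population standard deviation of the noise
    estimate eps over the window of min_width points ending at x.
    """
    lo = max(0, x - min_width + 1)
    lsd = pstdev(eps[lo:x + 1])
    net_signal = F[x] - B[x] - eps[x]
    # With no local noise the ratio is undefined; report NaN instead of dividing by zero.
    return net_signal / lsd if lsd > 0 else float("nan")
```

For example, with `F = [10, 11, 9, 10, 30]`, a flat baseline of 10, and `eps = [0, 1, -1, 0, 0]`, the last point has a net signal of 20 and a local standard deviation of about 0.71 for `min_width = 4`.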

5. The four steps above form one computation cycle, executed each time a mass spectrum is acquired. In each cycle, only data that can enter a two-dimensional feature chain are processed; all other signals are treated as noise. The maximum number of feature chains processed at each time point equals the number of ions in the previous mass spectrum, and in practice the number is usually far smaller, which is one reason the algorithm is fast. When all mass spectrometry data have been acquired, feature detection finishes as well, so features are detected in real time.

The advantages of the invention are as follows:

(1) Two-dimensional feature chains match the distribution of chromatography-mass spectrometry data. The data in all feature chains of a data set generally make up only a small fraction (<1%) of the total data, which fundamentally improves the efficiency of the feature detection algorithm;

(2) The three-point high-pass filter designed in the invention accurately estimates the random noise in the chromatographic signal and preserves its variance;

(3) The baseline estimation method uses the mass fluctuation information in the mass spectral dimension, overcoming the difficulty of estimating the baseline accurately from chromatographic information alone;

(4) The algorithm has few parameters, which are simple to optimize and have direct physical meaning, and one parameter set can be applied to samples of different complexity;

(5) The algorithm performs sample acquisition and feature extraction simultaneously, and is especially suited to analyzing large numbers of samples in digitized TCM component libraries.

Brief description of the drawings

Figure 1 is a schematic diagram of real-time LC-MS feature extraction.

Figure 2 shows simulated signals containing Gaussian white noise at different sampling frequencies (d) (panel A), and the signal after high-pass filtering (green line) overlaid on the original Gaussian white noise (blue line), where the dashed lines mark 3 times the standard deviation (panel B).

Figure 3 compares the noise estimates of the invention and of the Savitzky-Golay smoothing algorithm. Panel A compares the standard deviation estimated by each algorithm with that of the actual noise for sampling rates from 1 to 20. Panels B and C compare the algorithms at sampling rates of 5 and 15 for noise levels from 1% to 10%. The blue line is the theoretical noise standard deviation, the green line is the standard deviation estimated by the invention, and the red line is the standard deviation estimated by Savitzky-Golay.

Figure 4 is an example of a two-dimensional feature chain from Weifuchun tablets. Panel A is the time dimension of the chain, panel B is its mass dimension, and panel C shows mass fluctuation against time (the dashed line is 5 times mzMin). Red asterisks mark the key points, and the baseline is drawn in green.

Figure 5 shows feature detection for narirutin and naringenin in Weifuchun tablets. Panel A shows the selected ion chromatograms of the quasi-molecular ions of narirutin and naringenin and their isotope peaks. Panel B shows the two-dimensional feature regions of narirutin and naringenin; brown lines are two-dimensional feature chains, detected "features" are marked with green boxes, and apexes are marked with red asterisks.

Figure 6 shows the total ion current chromatogram of Weifuchun tablets (panel A), the chromatogram reconstructed from the detected features (panel B), and the chromatogram reconstructed from the residual signal and noise (panel C).

Figure 7 is the total ion current chromatogram of Shuangdan granules.

Figure 8 shows feature detection for salvianolic acid E, salvianolic acid B, and an unknown compound (m/z 719) in Shuangdan granules. Panel A is the selected ion chromatogram of m/z 719, panel B that of m/z 718, and panel C that of m/z 717. Panel D shows the two-dimensional feature regions of salvianolic acid E and salvianolic acid B; brown lines are two-dimensional feature chains, detected "features" are marked with green boxes, and apexes are marked with red asterisks.

Figure 9 shows the total ion current chromatogram of Erigeron breviscapus injection (panel A), the base-peak chromatogram (panel B), and the chromatogram reconstructed from the detected features (panel C).

Detailed description

The invention is further described below with reference to the drawings and embodiments.

Embodiment 1: A real-time feature extraction method for complex component analysis of traditional Chinese medicine according to the invention

1. Communication module: MS_Communication(acq_mode, cur_ms_data)

This function communicates with the mass spectrometer. If the acquisition mode (acq_mode) is profile, the current data obtained from the mass spectrometer are converted to centroid format with a watershed algorithm and returned through the cur_ms_data parameter; if the acquisition mode is centroid, the data are returned directly. The parameter cur_ms_data is two-dimensional data containing the mass-to-charge ratios and their corresponding intensities.

2. Two-dimensional feature chain detection: BNN(minWidth, CC)

Calling MS_Communication from the BNN module returns the mass spectrum currently being acquired, which is assigned to the mass-to-charge ratio array MZ and the intensity array INTEN. Two-dimensional feature chains are detected in the sequentially acquired data with the Bilateral Nearest Neighbor (BNN) algorithm. The algorithm works as follows: take each ion MZ_{i,j} in the current mass spectrum in turn (i is the scan number, scan_number, that is, the i-th mass spectrum acquired; j is the index of the ion within MZ_i), and find the ion MZ_{i-1,J} with the closest mass in the mass spectrum acquired at the previous time point. If the ion in the current mass spectrum closest to MZ_{i-1,J} is also MZ_{i,j}, link MZ_{i,j} and MZ_{i-1,J}. As more mass spectra are acquired, some chains are extended and others are broken. Only chains whose length len(CC_k) exceeds minWidth are considered likely to contain a true signal and are stored in CC; the rest are treated as noise. CC is a global variable that other modules can access.
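The mutual nearest-neighbor linking described above can be sketched as follows. This is a simplified illustration, not the patented implementation: the function names are hypothetical, no m/z tolerance is applied because the source does not specify one, ties are broken by taking the first candidate, and chains are collected into CC when they end rather than while they are still growing.

```python
def nearest(values, target):
    """Index of the entry in values closest to target (first one on ties)."""
    return min(range(len(values)), key=lambda k: abs(values[k] - target))


def bnn_links(prev_mz, cur_mz):
    """Map each current ion j to a previous ion J when they are mutual nearest neighbors."""
    links = {}
    if not prev_mz or not cur_mz:
        return links
    for j, mz in enumerate(cur_mz):
        J = nearest(prev_mz, mz)
        if nearest(cur_mz, prev_mz[J]) == j:
            links[j] = J
    return links


def detect_chains(scans, min_width):
    """Return chains of (scan index, m/z, intensity) longer than min_width points."""
    CC = []
    open_chains = {}  # ion index in the previous scan -> chain ending at that ion
    prev_mz = []
    for i, scan in enumerate(scans):
        cur_mz = [mz for mz, _ in scan]
        links = bnn_links(prev_mz, cur_mz)
        next_open = {}
        for j, (mz, inten) in enumerate(scan):
            # Extend the linked chain if there is one, otherwise start a new chain.
            chain = open_chains.pop(links[j]) if j in links else []
            chain.append((i, mz, inten))
            next_open[j] = chain
        # Chains that were not extended are broken: keep only the long ones.
        CC.extend(c for c in open_chains.values() if len(c) > min_width)
        open_chains, prev_mz = next_open, cur_mz
    CC.extend(c for c in open_chains.values() if len(c) > min_width)
    return CC
```

Because a previous ion has exactly one nearest neighbor in the current scan, at most one current ion can link to it, so chains never branch.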

3. Local noise and local baseline estimation: De_Noise_Baseline(minWidth)

When the length of a two-dimensional feature chain CC_k (k is the index of the detected chain) exceeds minWidth, estimation of the local noise and baseline can begin. The time dimension of a feature chain is equivalent to a chromatogram, which is generally modeled as the sum of the true signal, Gaussian white noise, and the baseline: F(x) = B(x) + NS(x) + ε(x). The Gaussian white noise ε(x) is estimated by linear convolution of the raw signal with a three-point high-pass filter:

$$\varepsilon(x) = F(x) \otimes f$$

$$f = \left[-\tfrac{1}{6},\ \tfrac{2}{6},\ -\tfrac{1}{6}\right]$$

The simulated signals in Figure 2 show that when the sampling rate of the true signal is below 5, part of the signal remains in the chromatographic peak region after filtering, which leads to an overestimate of the noise level there. The residual signal behaves like impulse noise and is much stronger than the overall standard deviation, so 3 times the overall standard deviation is used as a threshold and any value exceeding it is set to zero. The vector obtained after the convolution and thresholding is the estimate of the Gaussian white noise, and it accurately reflects the local variance of the true white noise, as shown in Figure 2. Across different sampling rates and noise levels, the noise estimate of the invention is very close to the actual value and better than the commonly used smoothing filter; the results are shown in Figure 3.
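The convolution and thresholding steps can be sketched as follows, using the filter coefficients [-1/6, 2/6, -1/6] as they appear in this text. The function name and the edge handling (repeating the first and last samples) are assumptions, since the source does not describe how the ends of the chain are treated. Note also that for white noise with standard deviation σ, filtering with these coefficients yields a standard deviation of σ/√6, because the squared coefficients sum to 1/6; the source does not say whether a rescaling is applied, so the sketch returns the unscaled result.

```python
from statistics import pstdev

# Three-point high-pass filter coefficients as given in the text.
HIGH_PASS = (-1 / 6, 2 / 6, -1 / 6)


def estimate_noise(F):
    """Noise estimate for one chromatographic trace F (illustrative sketch)."""
    # Assumption: pad by repeating the end samples so the output has len(F) points.
    padded = [F[0]] + list(F) + [F[-1]]
    a, b, c = HIGH_PASS
    eps = [a * padded[n] + b * padded[n + 1] + c * padded[n + 2] for n in range(len(F))]
    # Zero every value whose magnitude exceeds 3 times the population standard deviation.
    threshold = 3 * pstdev(eps)
    return [0.0 if abs(e) > threshold else e for e in eps]
```

The filter is symmetric and its coefficients sum to zero, so constant and linearly varying segments of the signal are removed and only rapid point-to-point variation survives.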

The mass dimension of a two-dimensional feature chain reflects its mass fluctuation (Figure 4). While a compound is eluting, that is, while a true signal is being detected, the mass fluctuation tends toward a minimum value mzMin (which depends on the mass precision of the mass spectrometer). In regions without a true signal, the mass fluctuation is random and far larger than mzMin. The region of smallest mass fluctuation is also the region of strongest response. The baseline is estimated as follows:

(1) Find the position of maximum intensity in CC_k; the mass fluctuation near that position is mzMin. (In the invention, "near" means the region of width minWidth centered on the specified position, or the region of width minWidth preceding it.);

(2) Using 5 times mzMin as the threshold (Figure 4C), find all positions where the mass fluctuation exceeds the threshold, and define these positions together with the first point of CC_k as key points;

(3) These key points also correspond to key points in the chromatographic dimension. Connecting them with straight lines in the chromatographic dimension gives the estimate of the baseline B(x) (Figure 4A).

4. Feature resolution in the time dimension: FeatureReslove(minWidth, minSN, feature_list)

Subtracting the ε(x) and B(x) estimated in step 3 from the raw signal gives an approximate estimate of the true signal, NS(x), which still contains some residue from irregular baseline fluctuations. The signal-to-noise ratio at any point in CC_k is defined as:

$$SN(x) = \frac{F(x) - B(x) - \varepsilon(x)}{LSD}$$

where LSD is the standard deviation near position x. The last point in CC_k is the data point currently being acquired, and its signal-to-noise ratio SN is calculated. The last minWidth points of CC_k are fitted by linear least squares, and the fitted slope is taken as the slope at the last point. The following tests are then applied (if feature resolution is being performed on CC_k for the first time, the feature detection state must be initialized to s=0):

(1) If slope*minWidth > minSN and s=0, this point is the start of a "feature" and is recorded in feature_list (the feature list);

(2) If slope < 0, set s=1;

(3) If slope*minWidth > -minSN and s=1, this point is the end of a "feature"; it is recorded in feature_list, and s is reset to 0.
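The three rules above can be sketched as a small state machine. Several points are assumptions made for illustration, because the source leaves them open: the least-squares fit is applied to the SN values, the rules are evaluated as mutually exclusive alternatives in the order given, and only the first start of an open feature is kept. The function names are hypothetical.

```python
def ls_slope(y):
    """Slope of the least-squares line through equally spaced points y."""
    n = len(y)
    t = [k - (n - 1) / 2 for k in range(n)]  # centered time axis, sums to zero
    return sum(tk * yk for tk, yk in zip(t, y)) / sum(tk * tk for tk in t)


def resolve_features(sn, min_width, min_sn):
    """Return (start, end) index pairs detected along one SN trace."""
    features = []
    s = 0          # feature detection state from the text
    start = None   # start index of the currently open feature, if any
    for x in range(min_width - 1, len(sn)):
        slope = ls_slope(sn[x - min_width + 1:x + 1])
        if slope * min_width > min_sn and s == 0:
            # Rule (1): rising edge, start of a feature (keep the first one only).
            if start is None:
                start = x
        elif slope < 0:
            # Rule (2): falling edge seen.
            s = 1
        elif slope * min_width > -min_sn and s == 1:
            # Rule (3): the fall has levelled off, end of the feature.
            if start is not None:
                features.append((start, x))
            start, s = None, 0
    return features
```

In a streaming setting the loop body would run once per newly acquired point, with `s` and `start` kept per feature chain between calls.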

5. The algorithm works in real time: data acquired by the mass spectrometer are analyzed immediately by BNN and the other modules. Detection of a feature's starting point may be delayed by at most minWidth/f (a few seconds), but chromatographic peaks eluting from the column generally last much longer than this, so feature detection is not affected. The system prototype was implemented in VC++ 6.0, and the user only needs to supply two parameters with direct physical meaning, minWidth and minSN.

Embodiment 2: Analysis of the complex components of Weifuchun tablets

A. Preparation of the total extract of Weifuchun tablets

Take 20 Weifuchun tablets, remove the film coating, and grind them to a fine powder. Accurately weigh 0.5 g into a 50 mL stoppered Erlenmeyer flask, add exactly 10 mL of methanol, and extract ultrasonically for 45 minutes. After extraction, remove the flask, let it cool, and make up the lost weight with methanol. Shake the extract well and centrifuge it at 12000 rpm for 15 min; filter the supernatant through a 0.45 μm membrane for HPLC analysis.

B.LC-MS分析的色谱和质谱条件B. Chromatographic and mass spectrometric conditions for LC-MS analysis

液相为Agilent1100型高效液相色谱仪(美国Agilent公司),配二元梯度泵、DAD紫外检测器、柱温箱、自动进样器。色谱柱:ZORBAX SB-C18色谱柱(4.6mm×250mm,5μm,Agilent),前置Agilent C18预柱。流动相:A相:0.05%甲酸水;B相:乙腈。线性洗脱梯度(min/%B):0/5,15/20,30/20,55/30,75/50,90/95。流速:0.5mL/min;柱温:30℃;进样量为10μL。质谱为Finnigan LCQ-DECA XP Plus离子阱质谱仪(美国Thermo公司),配电喷雾离子源及Xcalibur1.3控制系统,采用ESI负离子模式检测。扫描范围:100-1500Da;喷雾电压:4.5kV;鞘气和辅助气为氮气,分别为30和10单位。The liquid phase is an Agilent1100 high performance liquid chromatograph (Agilent, USA), equipped with a binary gradient pump, a DAD ultraviolet detector, a column thermostat, and an autosampler. Chromatographic column: ZORBAX SB-C 18 chromatographic column (4.6mm×250mm, 5μm, Agilent), front Agilent C 18 pre-column. Mobile phase: A phase: 0.05% formic acid in water; B phase: acetonitrile. Linear elution gradient (min/%B): 0/5, 15/20, 30/20, 55/30, 75/50, 90/95. Flow rate: 0.5mL/min; column temperature: 30°C; injection volume: 10μL. The mass spectrometer was a Finnigan LCQ-DECA XP Plus ion trap mass spectrometer (Thermo, USA), equipped with an electrospray ion source and Xcalibur1.3 control system, and was detected by ESI negative ion mode. Scanning range: 100-1500Da; spray voltage: 4.5kV; sheath gas and auxiliary gas are nitrogen, 30 and 10 units respectively.
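The gradient program above is given as breakpoints (min/%B) joined by straight lines. As a small sketch of how to read it, the programmed %B at any time can be obtained by linear interpolation. The function name percent_b is illustrative, and the result is the programmed composition at the pump, ignoring any gradient delay volume.

```python
import numpy as np

# Breakpoints of the Example 2 gradient, taken from the text (min / %B).
GRADIENT_MIN = [0, 15, 30, 55, 75, 90]
GRADIENT_PCT_B = [5, 20, 20, 30, 50, 95]


def percent_b(t_min):
    """Programmed %B at time t_min (minutes), by linear interpolation."""
    return float(np.interp(t_min, GRADIENT_MIN, GRADIENT_PCT_B))


# Programmed composition at t = 38 min: 20 + (38 - 30) / (55 - 30) * 10 = 23.2 %B
print(round(percent_b(38.0), 2))
```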

C. Feature detection parameters: minimum peak width (minWidth) 9; minimum signal-to-noise ratio (minSN) 4.

D. Feature detection results: Over the 90-minute analysis, 1827 features were detected in total, accounting for 96.1% of the total variance. In the feature regions of narirutin (tR = 38 min) and naringenin (tR = 42.3 min) shown in Figure 5, the two-dimensional feature chains of the invention cover every region where compound features may be present. Not only is the intense quasi-molecular ion [M-H]- (m/z 579) detected correctly, but so is the very low-abundance isotope peak [M-H+3]- (m/z 582), which shows that the method has high detection sensitivity.

To evaluate the feature detection more intuitively, all detected features were reconstructed into a time-dimension chromatogram and compared with the total ion current chromatogram formed from all signals; the chromatogram formed from signals in non-feature regions is the noise, or residual, chromatogram, as shown in Figure 6. Figure 6 shows that nearly all real signals were detected correctly and that no obvious feature signal remains in the residual chromatogram.
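The comparison described here can be sketched as follows. This is an illustration under assumed data structures, not the patent's implementation: the run is taken to be a scans-by-m/z intensity matrix, and a boolean mask of the same shape (hypothetical name feature_mask) marks the points that belong to detected features.

```python
import numpy as np


def split_chromatograms(intensity, feature_mask):
    """Return (tic, feature_chrom, residual_chrom) along the time axis.

    intensity:    array of shape (n_scans, n_mz), one row per mass spectrum
    feature_mask: boolean array of the same shape, True inside detected features
    """
    intensity = np.asarray(intensity, dtype=float)
    mask = np.asarray(feature_mask, dtype=bool)
    tic = intensity.sum(axis=1)                                  # all signals
    feature_chrom = np.where(mask, intensity, 0.0).sum(axis=1)   # features only
    residual_chrom = tic - feature_chrom                         # non-feature signal
    return tic, feature_chrom, residual_chrom


tic, feat, resid = split_chromatograms(
    [[1, 2, 3], [4, 5, 6]],
    [[False, True, False], [True, False, True]],
)
print(tic, feat, resid)
```

If detection works well, the residual trace should contain no peak-shaped structure, which is what Figure 6 is reported to show.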

Example 3: Analysis of complex components in Shuangdan granules

A. Preparation of the Shuangdan granule sample

Accurately weigh 0.05 g of finely ground Shuangdan granules (Shandong Kongshengtang Pharmaceutical Co., Ltd., batch nos. 040201 and 031001), add 1 mL of Wahaha purified water, extract ultrasonically for 20 min, and centrifuge at 10000 rpm for 10 min. Take 0.5 mL of the supernatant and dilute it one-fold with methanol-water-formic acid (50:50:1).

B. Chromatographic and mass spectrometric conditions for LC-MS analysis

An Agilent 1100 liquid chromatography system comprising a binary high-pressure pump, autosampler, column oven, and DAD detector was used. Column: Agilent SB-C18 (2.1 × 250 mm, 3.5 μm). Mobile phase: 0.1% formic acid in acetonitrile (A) and 0.1% formic acid in water (B); phase A rose linearly from 10% to 20% over 0-5 min, to 40% over 5-7 min, and to 95% over 7-20 min. Flow rate: 0.3 mL/min; column temperature: 35 °C. The injection volume for all samples was 10 μL.

A Finnigan ion trap mass spectrometer (LCQ Deca XP Plus, CA) with an ESI source was used in negative-ion mode. Sheath and auxiliary gases were both N2, at flow rates of 30 and 10 arb respectively; spray voltage: 4.5 kV; in-source fragmentation voltage: 15 V; heated capillary temperature: 350 °C; scan mode: first-stage full scan; scan range: 100-800 Da.

C. Feature detection parameters

Minimum peak width minWidth = 9; minimum signal-to-noise ratio minSN = 4.

D. Feature detection results

After the 90-minute gradient elution in Example 2, the main components were well separated, which makes feature detection relatively easy. In this example, by contrast, the Shuangdan granule sample was run with a fast 20-minute gradient that deliberately compresses the features of many components together and greatly increases the difficulty of feature detection, in order to test the algorithm under extreme conditions. As Figure 7 shows, the main components of Shuangdan granules pile up in the retention-time region from 10 to 13 minutes. With the same detection parameters as in Example 2, good feature detection results were still obtained: 510 features were detected, accounting for 98.5% of the variance of all signals. The following illustrates how the invention handles incompletely separated components in a complex system.

When compounds in a complex system have different mass-to-charge ratios, they remain distinct features on the two-dimensional LC/MS projection plane even if their retention times are identical, and the invention detects them correctly, just as if the components had been completely separated. When different compounds share the same mass-to-charge ratio, however, multiple features overlap. Figure 8 shows the feature regions of the quasi-molecular ion m/z 717 of salvianolic acid B (11.3 min) and salvianolic acid E (10.9 min) and of their isotope ions m/z 718 and 719. As Figure 8A shows, an unknown component at m/z 719 falls between the isotope ions of salvianolic acid B and salvianolic acid E, so that three features partially overlap. The invention still resolves such overlapping features correctly, as three separate features. In addition, the peak of salvianolic acid B tails severely and its signal fluctuates strongly, producing many spike-like spurious peaks between the apex and complete elution. The peak detection algorithm supplied with the mass spectrometry workstation (Avalon) splits the salvianolic acid B peak into 7 peaks, whereas the algorithm of the invention detects these features correctly with only two parameters.

Example 4: Analysis of complex components in Dengzhan Xixin (Erigeron breviscapus) injection

A. Preparation of the analytical sample

Accurately pipette 0.5 mL of Dengzhan Xixin injection and load it onto a Waters OASIS HLB solid-phase extraction cartridge that has been activated with 1 mL of methanol and 1 mL of 1% aqueous formic acid. Wash with 0.5 mL of 1% aqueous formic acid and discard the washings; then elute with 0.5 mL of methanol and collect the eluate for analysis.

B. Chromatographic and mass spectrometric conditions for LC-MS analysis

An Agilent 1100 liquid chromatography system comprising a binary high-pressure pump, autosampler, column oven, and DAD detector was used. Column: YMC-C18, 250 mm × 4.6 mm, 5 μm. Mobile phase: A, 0.1% formic acid in water; B, 0.1% formic acid in acetonitrile. Linear elution gradient: 0 min, 10% B; 20 min, 17.5% B; 40 min, 17.5% B; 80 min, 45% B; 90 min, 45% B. Split ratio: 1:3. Column temperature: 35 °C. Injection volume: 10 μL.

The mass spectrometer was a Finnigan LCQ-DECA XP Plus ion trap (Thermo, USA) with an electrospray ion source and the Xcalibur 1.3 control system, operated in ESI negative-ion mode. ESI source voltage: 4.5 kV; sheath gas (N2) flow rate: 30 arb; auxiliary gas (N2) flow rate: 10 arb; capillary temperature: 350 °C; capillary voltage: -15 V (-), 19 V (+); full-scan mode, scan range m/z 100-800.

C. Feature detection parameters

Minimum peak width minWidth = 9; minimum signal-to-noise ratio minSN = 4.

D. Feature detection results

The sample analyzed in this example is a traditional Chinese medicine injection whose main constituents are water-soluble phenolic acids. The mobile-phase additive produces a large amount of high-background chemical noise that swamps many low-intensity signals, so that low-abundance signals cannot be seen even in the base-peak chromatogram, as shown in Figures 9A and 9B. With the same feature detection parameters as in Examples 2 and 3, 571 features were detected. The chromatogram reconstructed from these features is free of high-background noise interference: high-intensity signals are detected correctly, and low-intensity signals also become visible. This shows that the invention can automatically filter out not only randomly distributed white noise but also colored noise with pronounced heteroscedasticity.

Claims (5)

1. A real-time feature extraction method for the analysis of complex components of traditional Chinese medicine, the method being based on the two-dimensional feature information of the time dimension and the mass dimension of LC-MS and being realized through two-dimensional feature chain detection, local noise and baseline estimation, and feature resolution, with the following specific steps:
(1) Mass spectrometric data acquisition: the complex traditional Chinese medicine sample is first separated by the chromatographic unit; the mass spectrometer then, at a given sampling frequency f, sequentially analyzes the fractions of the chromatographic eluate in full-scan mode. The data are acquired in bar-graph format; the data acquired at each time point, at integer multiples of 1/f, form one mass spectrum and correspond to the mass-spectral dimension, while the data acquired at different time points form the chromatographic dimension;
(2) Two-dimensional feature chain detection: each time the mass spectrometer acquires the mass spectrum of one time point, the spectrum is passed to the BNN module for analysis. The mass-to-charge ratios and intensities in the spectrum are first assigned to the mass-to-charge array MZ and the intensity array INTEN respectively; two-dimensional feature chains containing compound information are then detected in time order with the bidirectional nearest neighbor algorithm. Detected two-dimensional feature chains are stored in CC and can be retrieved at any time by the other modules;
(3) Local noise and local baseline estimation: as more data are acquired, if the length N_k of a two-dimensional feature chain CC_k in CC exceeds minWidth, its noise and baseline are estimated. The two-dimensional feature chain carries information in both the chromatographic and the mass-spectral dimension, composed of time and of MZ and INTEN respectively. The response intensities of the chain are linearly convolved with a high-pass filter, and pulse signals are filtered out using 3 times the population standard deviation; the result is the noise estimate in the chromatographic dimension. The signal consists of the true signal, white Gaussian noise, and the baseline: F(x) = B(x) + NS(x) + ε(x), where the white Gaussian noise ε(x) is estimated by linear convolution of the original signal with a three-point high-pass filter:
$$\varepsilon(x) = F(x) \otimes f$$
$$f = \left[-\tfrac{1}{\sqrt{6}},\ \tfrac{2}{\sqrt{6}},\ -\tfrac{1}{\sqrt{6}}\right];$$
(4) Feature resolution:
Once the local noise and baseline estimates for a two-dimensional feature chain CC_k are complete up to the current time, feature resolution is carried out, and detected features are saved in the feature list.
The signal-to-noise ratio at any point of CC_k is defined as:
$$SN(x) = \frac{F(x) - B(x) - \varepsilon(x)}{LSD}$$
where LSD is the standard deviation in the neighborhood of position x; the last point of CC_k is the data point currently being acquired, and its signal-to-noise ratio SN is calculated;
(5) Real-time detection: in the four steps above, each acquired mass spectrum constitutes one computation cycle; in each cycle only data that can enter a two-dimensional feature chain are processed, and all other signals are treated as noise. The maximum number of two-dimensional feature chains processed at each time point is the number of ions in the previous mass spectrum. When acquisition of all mass spectrometric data is complete, feature detection ends correspondingly, thereby achieving real-time feature detection.
2. The real-time feature extraction method for the analysis of complex components of traditional Chinese medicine according to claim 1, characterized in that, in step (3), to estimate the baseline in the chromatographic dimension, the following algorithm is designed on the basis of the difference in mass fluctuation between component regions and zero-component regions in the mass-spectral dimension of the two-dimensional feature chain:
(a) find the time point of maximum intensity in the two-dimensional feature chain CC_k, then calculate the average mass fluctuation of its neighboring region, i.e. the difference between adjacent mass-to-charge ratios, mzMin;
(b) taking 5 times mzMin as the threshold, find all positions whose mass fluctuation exceeds this threshold; these positions, together with the first point of CC_k, are defined as key points;
(c) these key points also correspond to key points in the chromatographic dimension; connecting them with straight lines in the chromatographic dimension gives the estimate of the baseline B(x). If the last key point is not the last point of CC_k, a horizontal line extended from that key point to the end serves as the baseline estimate for the corresponding region.
3. The real-time feature extraction method for the analysis of complex components of traditional Chinese medicine according to claim 1, characterized in that, in step (4), because feature detection is performed in real time, generally only part of a feature has eluted at the current time, and the purpose of feature resolution is to determine whether the current time point lies at the start or the end of the elution of a chromatographic peak.
4. The real-time feature extraction method for the analysis of complex components of traditional Chinese medicine according to claim 1, characterized in that the high-pass filter of step (3) consists of three data points whose sum is 0 and whose sum of squares is 1.
5. The real-time feature extraction method for the analysis of complex components of traditional Chinese medicine according to claim 1, characterized in that the chromatography used includes liquid chromatography and ultra-high-pressure liquid chromatography, and the mass spectrometry includes single quadrupole, triple quadrupole, ion trap, and time-of-flight mass spectrometry.
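For illustration only, the noise, baseline, and signal-to-noise steps recited in claims 1, 2, and 4 can be sketched as below. This is not the patented implementation. It assumes the three-point filter is normalized as [-1, 2, -1]/√6, which is the normalization that satisfies claim 4 (sum 0, sum of squares 1); it takes LSD as a single chain-wide standard deviation of the high-pass residual after 3-sigma pulse rejection, a simplification of the "local" standard deviation; and it takes the key-point indices of claim 2 as given. All function names are hypothetical.

```python
import numpy as np

# Three-point high-pass filter: coefficients sum to 0, squares sum to 1.
HIGH_PASS = np.array([-1.0, 2.0, -1.0]) / np.sqrt(6.0)


def estimate_noise(intensity):
    """Return (eps, lsd): high-pass residual and its pulse-rejected std."""
    f_x = np.asarray(intensity, dtype=float)
    eps = np.zeros_like(f_x)
    # "valid" avoids zero-padding artifacts; the two end points stay at 0.
    eps[1:-1] = np.convolve(f_x, HIGH_PASS, mode="valid")
    # Discard pulse-like values beyond 3 population standard deviations.
    keep = np.abs(eps) <= 3.0 * eps.std()
    return eps, eps[keep].std()


def baseline_from_key_points(intensity, key_idx):
    """Join key points with straight lines; hold the last one flat to the end."""
    f_x = np.asarray(intensity, dtype=float)
    key_idx = np.asarray(key_idx)
    x = np.arange(f_x.size)
    # np.interp holds the last key-point value for x beyond the last key index.
    return np.interp(x, key_idx, f_x[key_idx])


def signal_to_noise(intensity, baseline, eps, lsd):
    """SN(x) = (F(x) - B(x) - eps(x)) / LSD; lsd must be non-zero."""
    return (np.asarray(intensity, dtype=float) - baseline - eps) / lsd


# Synthetic chain: flat baseline of 10, one Gaussian peak, unit white noise.
x = np.arange(200)
rng = np.random.default_rng(0)
trace = 10.0 + 100.0 * np.exp(-(x - 100) ** 2 / 50.0) + rng.normal(0.0, 1.0, x.size)
eps, lsd = estimate_noise(trace)
sn = signal_to_noise(trace, np.full(x.size, 10.0), eps, lsd)
print(sn[100] > sn[10])
```

Because the filter coefficients sum to zero and are symmetric, a constant or linearly drifting baseline contributes nothing to eps, while the unit sum of squares keeps the variance of white noise unchanged; that is presumably why claim 4 imposes both conditions.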
CN2010100395440A 2010-01-05 2010-01-05 Real-time feature extraction method for analysis of complex ingredient of traditional Chinese medicine Expired - Fee Related CN101776671B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010100395440A CN101776671B (en) 2010-01-05 2010-01-05 Real-time feature extraction method for analysis of complex ingredient of traditional Chinese medicine

Publications (2)

Publication Number Publication Date
CN101776671A CN101776671A (en) 2010-07-14
CN101776671B true CN101776671B (en) 2012-06-27

Family

ID=42513183

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010100395440A Expired - Fee Related CN101776671B (en) 2010-01-05 2010-01-05 Real-time feature extraction method for analysis of complex ingredient of traditional Chinese medicine

Country Status (1)

Country Link
CN (1) CN101776671B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102818868B (en) * 2012-08-27 2013-11-20 浙江大学 Screening method of active ingredients in complex natural product and application thereof
CN109697320B (en) * 2018-12-25 2023-04-11 华电智控(北京)技术有限公司 Tailing peak processing method and device
CN110806456B (en) * 2019-11-12 2022-03-15 浙江工业大学 A method for automatic analysis of untargeted metabolic profile data in UPLC-HRMS Profile mode
CN114354819B (en) * 2022-03-15 2022-07-15 四川德成动物保健品有限公司 Method and device for detecting residual components of traditional Chinese medicine extract

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101498661A (en) * 2008-01-30 2009-08-05 香港浸会大学 Infrared spectrum feature extraction method for high-precision distinguishing variety, producing area and growth mode of traditional Chinese medicinal material

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120627

Termination date: 20220105