CN114187968A

CN114187968A - Sterility detection method based on NGS technology

Info

Publication number: CN114187968A
Application number: CN202010969961.9A
Authority: CN
Inventors: 岳建辉; 陈超; 张曦; 李波; 孙海汐; 孙长斌; 马启旺
Original assignee: BGI Shenzhen Co Ltd
Current assignee: BGI Shenzhen Co Ltd
Priority date: 2020-09-15
Filing date: 2020-09-15
Publication date: 2022-03-15

Abstract

The present invention relates to the field of bioinformatics, and in particular to a sterility detection method based on NGS technology. The method performs sterility detection by NGS without culturing the sample. Using MDA single-cell genome amplification, a very small amount of sample can be amplified and sequenced to generate a large amount of sequencing data. As a result, even when the microbial content of the sample is very low, the microorganisms can still be detected by analyzing the sequencing data. The invention can detect microbial species at levels as low as 10 CFU, with a detection period of only 24-48 hours. This solves the long turnaround and low sensitivity of the sterility test methods specified in traditional pharmacopoeias.

Description

Sterility detection method based on NGS technology

Technical Field

The invention relates to the field of bioinformatics, in particular to a sterility detection method based on next-generation sequencing (NGS) technology.

Background

Sterility testing refers to methods for checking whether drugs, medical devices, raw materials, excipients, and other products that the pharmacopoeia requires to be sterile are in fact sterile. Conventional sterility test methods mainly include the pharmacopoeial sterility test, the adenosine triphosphate (ATP) bioluminescence method, and nucleic acid amplification methods, each with its own characteristics and limitations (Song light, Von shock, Panying, Application research of flow cytometry in sterility test of medicines, Chinese Pharmacist, 2012. 21(2): p. 342-345.). In the production, distribution, and regulation of drugs, medical devices, and similar products, sterility testing is an essential means of ensuring drug quality and reducing adverse drug reactions. It is especially important for high-risk sterile preparations.

Sterility testing is of great importance to medicine, and in particular to the currently booming field of cell therapy. With the rapid development of cell therapy technology in recent years, cell therapy products have gradually entered clinical use. These products generally need to be reinfused into patients within 48 hours after preparation. However, the culture-based sterility test specified by the pharmacopoeia for cell preparations typically takes 7-14 days. This cannot meet current practical needs, and new rapid sterility testing technologies are urgently needed (Gu, W., S. Miller, and C.Y. Chiu, Clinical Metagenomic Next-Generation Sequencing for Pathogen Detection. Annu Rev Pathol, 2019. 14: p. 319-338.).

The development of next-generation sequencing (NGS) has greatly advanced basic research and disease treatment in biomedicine. Examples include NGS-based 16S/18S/internal transcribed spacer (ITS) sequencing for bacterial species identification and NGS-based metagenomic sequencing in human gut microbiology research (Janda, J.M. and S.L. Abbott, 16S rRNA gene sequencing for bacterial identification in the diagnostic laboratory: pluses, perils, and pitfalls. J Clin Microbiol, 2007. 45(9): p. 2761-4.). Although NGS has been applied to microbiological and clinical research, it has many limitations. For example, some bacteria share less than 50% genomic DNA homology yet have 99%-100% identical 16S rRNA sequences, so 16S/18S/ITS sequencing cannot distinguish such species with high accuracy. In addition, conventional sample pretreatment and DNA amplification techniques have difficulty detecting minute amounts of DNA in a sample.

At present, there is no published report of using NGS based on trace DNA amplification for pharmacopoeial sterility testing. Nor has there been any report of NGS-compatible sample pretreatment for such testing, or of bioinformatics-based criteria for judging sterility.

Disclosure of Invention

The present invention aims to solve, at least to some extent, one of the technical problems in the related art. To this end, the invention provides a method for determining whether a product to be tested is contaminated with microorganisms. The method performs sterility testing based on NGS without culturing the sample. MDA single-cell genome amplification allows a trace amount of sample to be amplified and sequenced, generating a large amount of sequencing data. Even when the microbial content of the sample is very low, the microorganisms can therefore be detected by analyzing the sequencing data.

To this end, the invention provides, in a first aspect, a method of determining whether a product to be tested is contaminated with a microorganism. According to an embodiment of the invention, the method comprises:

(1) separating a test sample from a product to be tested, and obtaining a nucleic acid sample;

(2) amplifying the nucleic acid sample to obtain an amplification product;

(3) constructing a sequencing library based on the amplification products;

(4) sequencing the sequencing library to obtain a sequencing result consisting of a plurality of sequencing reads;

(5) filtering the sequencing result using known background biological genome information to obtain a filtered sequencing result; and

(6) determining, using genome information of known microorganisms, whether the product to be tested is contaminated with the known microorganism based on the filtered sequencing result.

The method can accurately identify (at the species level) and quantify the detected microorganisms. It overcomes two problems of traditional methods (the pharmacopoeial method, conventional sequencing analysis, etc.): a high false-positive rate and the inability to identify microbial species accurately.

The method for determining whether the product to be tested is contaminated by the microorganism according to the embodiment of the invention can also have at least one of the following additional technical characteristics:

According to an embodiment of the invention, the product to be tested includes biological samples, drugs, pharmaceutical raw materials, pharmaceutical excipients, medical devices, foods, cosmetics, and health products;

optionally, the drug includes a cell-based drug, an antibody drug, or a chemical drug.

According to an embodiment of the invention, the product to be tested may also be any other product that the pharmacopoeia requires to undergo sterility testing.

According to an embodiment of the invention, the product to be tested contains a known organism or a part of the organism, and the genomic information of the known organism constitutes at least a part of the genomic information of the background organism.

According to an embodiment of the present invention, the background biological genomic information is human reference genomic information; or the product contains probiotics, and the background genome information is the genome information of the probiotics.

According to an embodiment of the invention, the version of the human reference genome is selected from hg38, hg19, and hg18, preferably hg38.

According to the embodiment of the invention, the product to be tested is an aqueous solution and is directly used as a test sample;

or, the product to be tested is a water-soluble solid, and an aqueous solution of the product to be tested is prepared to be used as a test sample;

or the product to be tested is a water-insoluble solid, and the water-insoluble solid is emulsified by using an emulsifier to obtain a test sample.

According to an embodiment of the invention, the nucleic acid sample is obtained using at least one of the following methods:

(a) performing lysozyme treatment on the test sample; and

(b) heating the test sample at 65-90 ℃.

According to the embodiment of the present invention, the amount of lysozyme used and the treatment temperature can be selected as required, and the heating time can be selected as required, so long as the microorganisms in the test sample can be completely lysed and the genomic DNA can be released.

According to an embodiment of the present invention, genome amplification is performed by one or more of MDA, MALBAC, and DOP-PCR, preferably MDA.

According to an embodiment of the invention, the sequencing is performed in at least one of DNBSEQ-T7, MGISEQ-2000, HiSeq 4000, or HiSeq X Ten sequencing systems.

According to an embodiment of the invention, the sequencing data volume is not less than 1 G (1 Gb).
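The sketch below converts a data volume into an approximate read count, which relates the 1 G requirement to numbers of reads. It assumes that "1G" means 1 Gb (10^9 bases) and that reads are 150 bp paired-end. Neither the read length nor the layout is stated in the text. The function name is hypothetical.

```python
# Hypothetical helper: convert a sequencing data volume (in bases) into an
# approximate number of reads. The 150 bp read length and paired-end layout
# are assumptions for illustration, not values from the source.

def reads_for_data_volume(total_bases: int, read_length: int = 150,
                          paired: bool = True) -> dict:
    """Return approximate fragment (read-pair) and read counts."""
    mates = 2 if paired else 1
    bases_per_fragment = read_length * mates
    fragments = total_bases // bases_per_fragment   # whole fragments only
    return {"fragments": fragments, "reads": fragments * mates}

one_gb = 1_000_000_000  # "1G" interpreted as 10^9 bases (assumption)
print(reads_for_data_volume(one_gb))
```

Under these assumptions, 1 Gb corresponds to roughly 3.3 million read pairs (about 6.7 million reads). A different read length scales these figures inversely.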

According to an embodiment of the invention, the filtering process comprises:

(i) aligning the sequencing result to the background biological genome information to obtain matched sequencing reads that align to the background biological genome; and

(ii) removing the matched sequencing reads from the sequencing result to obtain the filtered sequencing result.

According to an embodiment of the invention, the filtering process further comprises:

(iii) aligning the match sequencing reads to known endogenous biological genome information to obtain aligned endogenous biological sequencing reads; and

(iv) adding the endogenous biological sequencing reads back into the filtered sequencing result to obtain the final filtered sequencing result.
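Steps (i) to (iv) amount to set operations on read identifiers. The following minimal sketch assumes that the alignment results have already been reduced to sets of read IDs. The function name `filter_reads` and the toy IDs are hypothetical.

```python
# Minimal sketch of filtering steps (i)-(iv) as set operations on read IDs.
# Deriving these sets from real aligner output is outside this sketch.

def filter_reads(all_reads: set[str],
                 background_hits: set[str],
                 endogenous_hits: set[str]) -> set[str]:
    """Return the filtered read set used for microbial identification.

    all_reads: IDs of all (pre-processed) sequencing reads
    background_hits: IDs that aligned to the background genome
    endogenous_hits: IDs that aligned to known endogenous sequences
    """
    matched = all_reads & background_hits   # (i) reads matching the background genome
    kept = all_reads - matched              # (ii) remove the matched reads
    rescued = matched & endogenous_hits     # (iii) matched reads that are endogenous
    return kept | rescued                   # (iv) add the rescued reads back

reads = {"r1", "r2", "r3", "r4", "r5"}
human = {"r1", "r2", "r3"}  # aligned to the background (e.g., human) genome
herv = {"r3"}               # of those, r3 also aligned to an endogenous viral sequence
print(sorted(filter_reads(reads, human, herv)))
```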

According to an embodiment of the invention, the genome of the known endogenous organism has been integrated into the genome of the background organism.

In some embodiments of the present invention, the method further comprises preprocessing and base-correcting the sequencing data before the filtering process. This helps ensure the accuracy of the subsequent analysis.

In some embodiments of the invention, preprocessing and base correction of the sequencing data comprise three operations: filtering low-quality reads (sequencing reads), filtering reads containing adapter contamination, and correcting potentially erroneous bases in reads by comparing base depth differences and base qualities in the overlapping region of paired-end reads.
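The first two preprocessing operations, low-quality filtering and adapter filtering, can be sketched for a single read. The thresholds (Q20, at most 50% low-quality bases) are illustrative assumptions, and so is the example adapter string. `AGATCGGAAGAGC` is the common prefix of Illumina TruSeq adapters; for DNBSEQ/MGISEQ data the platform's own adapter would be used. The function names are hypothetical.

```python
# Sketch of read-level pre-filtering: drop low-quality reads and reads that
# contain adapter sequence. Thresholds and adapter are illustrative assumptions.

def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - offset for ch in quality]

def passes_filters(seq: str, quality: str, adapter: str = "AGATCGGAAGAGC",
                   min_q: int = 20, max_low_frac: float = 0.5) -> bool:
    """Return True if the read should be kept."""
    if not seq or len(seq) != len(quality):
        return False                           # malformed record
    if adapter in seq:
        return False                           # adapter contamination
    scores = phred_scores(quality)
    low = sum(1 for q in scores if q < min_q)  # count low-quality bases
    return low / len(scores) <= max_low_frac

good = ("ACGTACGTAC", "IIIIIIIIII")      # 'I' = Q40
poor = ("ACGTACGTAC", "##########")      # '#' = Q2
adapt = ("TTAGATCGGAAGAGCA", "I" * 16)   # contains the adapter prefix
print([passes_filters(s, q) for s, q in (good, poor, adapt)])
```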

According to an embodiment of the present invention, step (6) further comprises:

(6-1) determining the number of sequencing reads aligned to the known microbial genome information;

(6-2) determining a contamination index of the known microorganism based on the number of sequencing reads obtained in step (6-1), the contamination index being positively correlated with the number of sequencing reads obtained in step (6-1);

(6-3) determining whether the product is contaminated with the known microorganism by comparing the contamination index with a predetermined threshold.
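Steps (6-1) to (6-3) are sketched below. The inputs are hypothetical: one species label per filtered read that aligned to a known microbial genome, plus the total number of filtered reads. The index computed is reads per million, which rises with the read count as step (6-2) requires. The default threshold of 10 mirrors the embodiment in this section. The function name is hypothetical.

```python
# Minimal sketch of steps (6-1) to (6-3): count, normalize, compare.
from collections import Counter

def call_contamination(assignments: list[str], total_filtered_reads: int,
                       threshold: float = 10.0) -> dict[str, tuple[float, bool]]:
    """Return {species: (contamination_index, is_positive)}."""
    if total_filtered_reads <= 0:
        raise ValueError("total_filtered_reads must be positive")
    counts = Counter(assignments)                            # (6-1) reads per species
    calls = {}
    for species, n in counts.items():
        index = n * 1_000_000 / total_filtered_reads         # (6-2) grows with n
        calls[species] = (index, index >= threshold)         # (6-3) compare with threshold
    return calls

reads = ["Escherichia coli"] * 30 + ["Staphylococcus aureus"] * 5
print(call_contamination(reads, total_filtered_reads=2_000_000))
```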

According to an embodiment of the present invention, the contamination index is determined by the following formula:

RPM = a × 1,000,000 / b,

where RPM represents the contamination index, a is the number of sequencing reads obtained in step (6-1), and b is the total number of sequencing reads in the filtered sequencing result obtained in step (5).

According to an embodiment of the invention, the predetermined threshold is not lower than 10.
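A worked instance of the formula shows what the threshold means in read counts. Rearranging gives the minimum number of species-specific reads a needed to reach a threshold T. The value b = 5,000,000 filtered reads is a hypothetical illustration, not a value from the source.

```latex
\begin{aligned}
\mathrm{RPM} &= \frac{a \times 10^{6}}{b},
  \qquad \mathrm{RPM} \ge T \iff a \ge \frac{T\,b}{10^{6}} \\
a_{\min} &= \frac{10 \times 5\,000\,000}{10^{6}} = 50
  \quad (b = 5\,000\,000,\ T = 10) \\
a = 30 &\;\Rightarrow\; \mathrm{RPM} = \frac{30 \times 10^{6}}{5\,000\,000} = 6 < 10
\end{aligned}
```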

In a second aspect, the invention provides use of the method according to the first aspect of the invention for identifying microbial species in an article subjected to sterility testing.

Existing sterility testing or microbial detection approaches mainly include the following:

1) The pharmacopoeial sterility test. The current version is the sterility test method of the Chinese Pharmacopoeia (2015 edition). Briefly, samples are cultured in specific media under strictly aseptic conditions to determine whether they are contaminated with bacteria or fungi (Chinese Pharmacopoeia [S]. 2015 edition, Volume IV: 136-140.).

2) The adenosine triphosphate (ATP) bioluminescence method uses the reaction of ATP with a luciferin-luciferase complex to detect ATP. Because the ATP content per microbial cell is relatively constant, the ATP content of a sample correlates with the number of microorganisms in it (Decool, A., et al., Detection of bacterial adenosine triphosphate through biologics, applied to a Rapid reliability test of objective preparations. Analytica Chimica Acta, 1991. 255(2): p. 423-425.).

3) Nucleic acid amplification methods use in vitro amplification to detect trace nucleic acids in a sample, both qualitatively and quantitatively (Wang Xingchong, Zhou Kuang, Li Jun, et al. Application of real-time fluorescence quantitative PCR in rapid sterility testing of medicines [J]. Chinese Journal of Modern Applied Pharmacy, 2013, 30(12): 1333-1337.). Nucleic acid amplification is an important technique in molecular biology and bioanalysis. It plays an important role in clinical medicine, laboratory medicine, food safety, and related fields.

4) Environmental microorganism identification based on conventional NGS uses high-throughput sequencing to identify microorganisms from environments such as the intestinal tract, reproductive tract, or soil, by metagenomic sequencing or 16S/18S/ITS sequencing.

However, the above methods all have disadvantages, specifically as follows:

1) The pharmacopoeial sterility test requires culturing the microorganisms in the sample, and some microorganisms may not be culturable. Even when they can be cultured, the culture period is long (generally 7-14 days) and the detection sensitivity is lower than 100 CFU. (Note: CFU, colony-forming units, here refers to the total number of colonies that grow on ordinary nutrient agar plates after aerobic culture at 37 ℃ for 48 h, as specified by the national standard method.)

2) The ATP bioluminescence method cannot identify microbial species, has low accuracy, and its detection sensitivity is lower than 1000 CFU.

3) Nucleic acid amplification methods have a high false-positive rate because of possible amplification bias during the amplification process, and their sensitivity is between 10 and 100 CFU.

4) Environmental microorganism identification based on conventional NGS cannot accurately identify microbial species through 16S, 18S, or similar sequencing. In addition, conventional library construction and sequencing cannot meet the sensitivity required for pharmacopoeial sterility testing. The approach therefore cannot be used for the pharmacopoeia-required sterility testing of cell preparations or drugs.

In recent years, single-cell amplification techniques have been used in research to amplify very small amounts of DNA template and build sequencing libraries from them. Examples include single-cell DNA sequencing based on multiple displacement amplification (MDA) and on multiple annealing and looping-based amplification cycles (MALBAC) (Spits, C., et al., Whole-genome multiple displacement amplification from single cells. Nat Protoc, 2006. 1(4): p. 1965-70; chemistry, M., expression, composition of Multiple Displacement Amplification (MDA) and multiple annealing-loop amplification cycles (L.E.M.) 2014.9).

To overcome these shortcomings of the prior art, the inventors developed a rapid NGS-based sterility testing method through extensive experimentation. The MDA-based DNA library construction technique has low amplification bias and high amplification sensitivity. It requires only extremely small amounts of DNA in the sample, which solves the low sensitivity of traditional methods. The method achieves higher sterility-testing sensitivity than the pharmacopoeial method while markedly shortening detection time: the whole detection period takes only 24-48 hours. It thus delivers speed and accuracy together, which traditional methods cannot. The invention provides a sequencing-based pretreatment method for sterility test samples of biological products. It also provides a result judgment criterion for NGS-based sterility testing.

The invention combines NGS whole-genome sequencing, a comprehensive database of microbial marker molecules, and an independently developed bioinformatic analysis method. Together, these accurately identify (at the species level) and quantify the detected microorganisms. This overcomes the high false-positive rate and the inability to accurately identify microbial species of traditional methods (the pharmacopoeial method, conventional sequencing analysis, etc.).

The invention has the following beneficial effects:

1) The NGS-based pretreatment method for sterility test samples requires no culturing and is fast. It removes the need for microbial culture and the long turnaround of traditional methods such as the pharmacopoeial sterility test. It can also detect microorganisms that cannot be cultured by current pharmacopoeial methods.

2) The MDA-based DNA library construction technique of the invention has low amplification bias and high amplification sensitivity, and requires only extremely small amounts of DNA in the sample. It solves the low sensitivity of traditional methods such as nucleic acid amplification and ATP bioluminescence.

3) The multi-step bioinformatic algorithm, based on data filtering and correction, database alignment, and related steps, can accurately identify microbial species. It can also quantify them according to the abundance of detected reads, overcoming the limited accuracy and lack of species identification of traditional methods.

4) The rapid NGS-based sterility testing method of the invention is the only microbial detection technology that combines high sensitivity, species identification, culture-free operation, and speed. It can be used for rapid sterility testing of clinical biological products for which the pharmacopoeia requires sterility testing, particularly cell preparations.

Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 shows a schematic flow diagram of a method for determining whether a product to be tested is contaminated with microorganisms according to the present invention;

FIG. 2 is a schematic flow diagram of the bioinformatic analysis for microbial detection according to an embodiment of the present invention;

FIG. 3a shows the amounts of whole-genome amplification product obtained from Escherichia coli, Pseudomonas aeruginosa, Staphylococcus aureus, Bacillus subtilis, Clostridium sporogenes, and Candida albicans treated by different lysis methods;

FIG. 3b shows an electrophoretogram of whole-genome amplification products from model microorganisms: lanes 1-5, Escherichia coli (1, 5, 10, 50, 100 CFU); lanes 6-10, Pseudomonas aeruginosa (1, 5, 10, 50, 100 CFU); lanes 11-15, Staphylococcus aureus (1, 5, 10, 50, 100 CFU); lanes 16-20, Bacillus subtilis (1, 5, 10, 50, 100 CFU); lanes 21-25, Clostridium sporogenes (1, 5, 10, 50, 100 CFU); lanes 26-30, Candida albicans (1, 5, 10, 50, 100 CFU); lanes 31-35, Fusobacterium nucleatum (1, 5, 10, 50, 100 CFU); lane 36, sterile water negative control;

FIG. 4 shows the sequencing data volume of samples of Fusobacterium nucleatum (029), Candida albicans (B), Escherichia coli (D), Staphylococcus aureus (J), Bacillus subtilis (K), Clostridium sporogenes (S), and Pseudomonas aeruginosa (T) at inputs of 1, 5, 10, 50, and 100 CFU, and of the negative control group (sterile water);

FIG. 5 shows the Q20 ratio (data quality) of sequencing samples of Fusobacterium nucleatum (029), Candida albicans (B), Escherichia coli (D), Staphylococcus aureus (J), Bacillus subtilis (K), Clostridium sporogenes (S), and Pseudomonas aeruginosa (T) at inputs of 1, 5, 10, 50, and 100 CFU, and of the negative control (sterile water);

FIG. 6 shows the sensitivity detection results at different gradients of cell number (CFU);

FIG. 7 shows the genome coverage detected in the sensitivity tests at different gradients of cell number;

FIG. 8 shows the average genome coverage depth of the different species detected in the sensitivity tests at different gradients of cell number;

FIG. 9 shows the detection results for mixed samples of Escherichia coli (D) and Staphylococcus aureus (J) at different concentration ratios;

FIG. 10 shows the genome coverage detected in mixed samples of Escherichia coli (D) and Staphylococcus aureus (J) at different concentration ratios.

Detailed Description

The invention provides a method for determining whether a product to be tested is contaminated with microorganisms. The method performs sterility testing based on NGS without culturing the sample. MDA single-cell genome amplification allows a trace amount of sample to be amplified and sequenced, generating a large amount of sequencing data. Even when the microbial content of the sample is very low, the microorganisms can therefore be detected by analyzing the sequencing data.

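To this end, the invention provides a method for determining whether a product to be tested is contaminated with microorganisms. FIG. 1 shows the overall flow of the method according to the present invention. According to an embodiment of the invention, the method comprises:

(1) separating a test sample from a product to be tested, and obtaining a nucleic acid sample;

(2) amplifying the nucleic acid sample to obtain an amplification product;

(3) constructing a sequencing library based on the amplification product;

(4) sequencing the sequencing library to obtain a sequencing result consisting of a plurality of sequencing reads;

(5) filtering the sequencing result using known background biological genome information to obtain a filtered sequencing result; and

(6) determining, using genome information of known microorganisms, whether the product to be tested is contaminated with the known microorganism based on the filtered sequencing result.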

In some embodiments of the invention, the product to be tested includes biological samples, drugs, pharmaceutical raw materials, pharmaceutical excipients, medical devices, foods, cosmetics, and health products;

optionally, the drug includes a cell-based drug, an antibody drug, or a chemical drug.

In some embodiments of the invention, the test sample is prepared according to the form of the product:

- A product that is an aqueous solution can be used directly as the test solution.
- Water-soluble solids are dissolved or reconstituted in sterile PBS or physiological saline according to the label instructions.
- Water-insoluble samples are dispersed using an appropriate dilution of polysorbate 80 or another suitable emulsifier.

The concentrations of these reagents can be chosen according to those conventional in the art.
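The sample-preparation rules above form a small decision table. The sketch below encodes that table. The enum, the mapping, and the instruction strings are hypothetical and merely paraphrase the text.

```python
# Hypothetical encoding of the sample-preparation rules as a lookup table.
from enum import Enum

class ProductForm(Enum):
    AQUEOUS_SOLUTION = "aqueous solution"
    WATER_SOLUBLE_SOLID = "water-soluble solid"
    WATER_INSOLUBLE = "water-insoluble"

# One preparation instruction per product form (paraphrased from the text).
PREPARATION = {
    ProductForm.AQUEOUS_SOLUTION: "use directly as the test solution",
    ProductForm.WATER_SOLUBLE_SOLID: "dissolve or reconstitute in sterile PBS or saline per the label",
    ProductForm.WATER_INSOLUBLE: "disperse with diluted polysorbate 80 or another suitable emulsifier",
}

def preparation_step(form: ProductForm) -> str:
    """Return the preparation instruction for a given product form."""
    return PREPARATION[form]

for form in ProductForm:
    print(f"{form.value}: {preparation_step(form)}")
```

A lookup table keeps each rule in one place, so adding a product form means adding one enum member and one entry.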

In some embodiments of the invention, a liquid test sample may also be filtered through a membrane with a pore size no larger than 0.22 μm. The membrane is then rinsed with sterile PBS or saline and the rinse is collected. This concentrates the microorganisms and thereby increases the sensitivity of NGS-based microbial detection.

With this sample pretreatment, the method can match or exceed the detection sensitivity of the pharmacopoeial culture-based sterility test without any culture step. It can also detect microorganisms that currently cannot be cultured.

In some embodiments of the invention, the nucleic acid sample is obtained using at least one of the following methods:

(a) performing lysozyme treatment on the test sample; and

(b) heating the test sample at 65-90 ℃.

In some embodiments of the present invention, the amount of lysozyme used and the treatment temperature may be selected as desired, and the heating time may be selected as desired, so long as the test sample is completely lysed and genomic DNA is released.

Unlike animal cells, microorganisms (bacteria and fungi) have cell walls. These make it harder to release cellular DNA and thus affect genome amplification. In some embodiments of the invention, bacterial/fungal lysis and genomic DNA release are facilitated by treating the sample with lysozyme and lysing the microorganisms at high temperature (e.g., 70 ℃).

In some embodiments of the invention, genome amplification is performed by one or a combination of MDA, MALBAC, and DOP-PCR, preferably MDA.

In some embodiments of the invention, an MDA-based whole-genome amplification method is used in the genome amplification step. Phi29 polymerase, Phi29 polymerase buffer, random primers, and dNTPs are added to the microbial DNA template, which allows whole-genome amplification from as few as 1 microbial cell to be completed in 6-8 hours.

In some specific embodiments of the invention, the background organism genome is human reference genome hg38 and the known endogenous organism genome is an endogenous viral sequence in a human genome.

In some embodiments of the invention, the sequencing is performed in at least one of a DNBSEQ-T7, MGISEQ-2000, HiSeq 4000, or HiSeq X Ten sequencing system.

In some embodiments of the invention, the sequencing data volume is not less than 1 G (1 Gb).

In some embodiments of the invention, the filtering process comprises:

(i) aligning the sequencing result to the human reference genome hg38 to obtain matched sequencing reads that align to hg38; and

(ii) removing the matched sequencing reads from the sequencing result to obtain the filtered sequencing result.

The filtering process further comprises:

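(iii) aligning the matched sequencing reads to endogenous viral sequences in the human genome to obtain aligned endogenous biological sequencing reads; and

(iv) adding the aligned endogenous biological sequencing reads back into the filtered sequencing result to obtain the final filtered sequencing result.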

FIG. 2 shows the detailed bioinformatic analysis workflow for the sequencing results. First, the sequencing data are preprocessed and base-corrected. The raw data (raw reads) from a sequencing system contain a certain rate of base-calling errors caused by systematic instrument errors. The raw output must therefore be preprocessed and base-corrected to improve the accuracy of subsequent alignment and species identification. Preprocessing mainly consists of three operations:

- filtering low-quality reads;
- filtering reads containing adapter contamination;
- correcting potentially erroneous bases by comparing base depth differences and base qualities in the overlapping region of paired-end reads.
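The overlap-based correction step can be illustrated for a single read pair. The sketch below uses base quality only, not the base depth comparison mentioned in the text. It assumes four things: the overlap length is already known, the DNA fragment is longer than each read (no read-through into adapters), qualities are Phred+33, and a mismatch is corrected only when one base is at least Q30 and the other at most Q15. These cut-offs and the function names are illustrative, not the patent's implementation.

```python
# Sketch of overlap-based base correction for one paired-end read pair.
# Assumes a known overlap length and Phred+33 qualities; cut-offs are illustrative.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def correct_overlap(r1: str, q1: str, r2: str, q2: str, overlap: int,
                    high: int = 30, low: int = 15) -> tuple[str, str]:
    """Correct mismatches in the overlap; return (corrected_r1, corrected_r2)."""
    r1_bases = list(r1)
    r2_rc = list(revcomp(r2))   # R2 expressed in R1 orientation
    q2_rc = q2[::-1]            # qualities reversed to match r2_rc
    start = len(r1) - overlap   # overlap = last `overlap` bases of R1 ...
    for i in range(overlap):    # ... and the first `overlap` bases of revcomp(R2)
        a, b = r1_bases[start + i], r2_rc[i]
        if a == b:
            continue
        qa = ord(q1[start + i]) - 33
        qb = ord(q2_rc[i]) - 33
        if qa >= high and qb <= low:
            r2_rc[i] = a                # trust the high-quality R1 base
        elif qb >= high and qa <= low:
            r1_bases[start + i] = b     # trust the high-quality R2 base
    return "".join(r1_bases), revcomp("".join(r2_rc))

# Fragment ACGTTGCA: R1 = first 6 bases with a low-quality error (T->A), R2 = revcomp of last 6
print(correct_overlap("ACGATG", "III#II", "TGCAAC", "IIIIII", overlap=4))
```

Mismatches where both bases have similar quality are left unchanged, because neither read gives clear evidence for the correct base.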

The data obtained after preprocessing and base correction (called clean reads) are aligned to the human reference genome hg38 with alignment software such as Bowtie2, SOAP2, or BWA (preferably Bowtie2). Reads that align to hg38 are further aligned to endogenous viral sequences in the human genome, and the reads that align to the endogenous viral reference sequences are exported. These reads are then merged with the reads that did not align to hg38 to yield the filtered total data (Filtered data). Finally, the filtered data are aligned to the microbial molecular marker database MetaPhlAn2 to obtain a BAM file, and the microbial species present in the sequencing data are identified by statistical analysis. The alignment process and database used here take into account both microbial species and their specific marker sequences, allowing rapid and accurate identification of microbial species.
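The final statistical step turns marker-database alignments into per-species read counts. The sketch below reads SAM-format text records, the text form of a BAM file. It uses only the FLAG bits defined in the SAM specification (0x4 unmapped, 0x100 secondary, 0x800 supplementary), so each read contributes at most one primary record. The marker-to-species table is a hypothetical stand-in: MetaPhlAn2's own annotation and output formats are not reproduced, and the function and marker names are illustrative.

```python
# Sketch: count reads per species from SAM records aligned to a marker database.
from collections import Counter

# SAM FLAG bits (SAM format specification)
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800

def species_read_counts(sam_lines: list[str],
                        marker_to_species: dict[str, str]) -> Counter:
    """Count primary, mapped reads per species from SAM text lines."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):   # skip blanks and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]         # FLAG and RNAME columns
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue                                     # keep one primary record per read
        species = marker_to_species.get(rname)           # hypothetical marker annotation
        if species is not None:
            counts[species] += 1
    return counts

# Toy input: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
EXAMPLE_SAM = [
    "@HD\tVN:1.6",
    "r1\t0\tmkEc1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tmkEc2\t200\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tmkSa1\t50\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t256\tmkEc1\t70\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
MARKERS = {"mkEc1": "Escherichia coli", "mkEc2": "Escherichia coli",
           "mkSa1": "Staphylococcus aureus"}

counts = species_read_counts(EXAMPLE_SAM, MARKERS)
print(dict(counts))
```

These per-species counts are the numerators of the RPM calculation. The denominator is the size of the filtered read set.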

In some embodiments of the invention, a detection threshold criterion for sterility testing is provided. The BAM file obtained from the alignment is analyzed statistically to obtain the number of reads aligned to each species. These counts are normalized by the total number of reads to give the number of reads of a given species per million total reads (RPM, Reads Per Million), calculated as follows:

RPM (Reads Per Million) = number of reads of one microorganism × 1,000,000 / total number of all microbial reads;

Note: the total number of all microbial reads is the number of reads remaining after reads aligned to the host (hg38) sequence have been removed, with reads aligned to endogenous viral sequences retained.

Repeated experiments were performed on 7 microorganisms specified by the pharmacopoeia: Fusobacterium nucleatum, Escherichia coli, Pseudomonas aeruginosa, Staphylococcus aureus, Bacillus subtilis, Clostridium sporogenes, and Candida albicans. At a detection sensitivity of 10 CFU, all of the microorganisms were detected with RPM ≥ 10, whereas a few samples with RPM < 10 gave negative results (see Example 2 and Example 3). RPM ≥ 10 was therefore adopted as the criterion for a positive result (at a sensitivity of 10 CFU).

The method provided by the invention can perform whole genome amplification on trace DNA of microorganisms in a sample, and can detect bacteria and fungi with the number of 10 CFU. The invention provides a biological information processing method. The method is characterized in that the method can rapidly and accurately analyze the microorganism species in the sample to be detected, and count the coverage and depth of the microorganism species to perform qualitative and quantitative detection on the microorganism. The NGS-based detection method provided by the invention is free of culture, rapid, high-throughput, high in accuracy, high in sensitivity and the like, and is suitable for products or materials which are required to be subjected to sterile detection by pharmacopoeia, including but not limited to cell preparations, medicines, medical apparatuses and other products required to be subjected to sterile detection by pharmacopoeia.

The following describes embodiments of the present invention in detail. The following examples are illustrative only and are not to be construed as limiting the invention. The examples, where specific techniques or conditions are not indicated, are to be construed according to the techniques or conditions described in the literature in the art or according to the product specifications. The reagents or instruments used are not indicated by the manufacturer, and are all conventional products commercially available.

Example 1 Whole genome amplification of trace microbial DNA based on the MDA method

This example compares different microbial lysis methods. It used 4 µL aliquots, each containing 100 CFU of Escherichia coli, Pseudomonas aeruginosa, Staphylococcus aureus, Bacillus subtilis, Clostridium sporogenes or Candida albicans, with 4 µL of nuclease-free water as a negative control. The samples were treated by one of three methods. Method 1: add 3 µL of lysozyme and incubate at 65 °C for 10 minutes. Method 2: incubate at 90 °C for 10 minutes. Method 3: add lysozyme, incubate at 65 °C for 10 minutes, then incubate at 90 °C for 5 minutes. After treatment, the liquid on the tube wall was collected at the bottom of the tube by brief centrifugation. Then 40 µL of amplification reaction mix from the MGIEasy single-cell whole genome amplification kit (39 µL Amplification Reaction Buffer and 1 µL Amplification Enzyme) was added, and the mixture was vortexed to mix. The PCR tube was placed on a PCR instrument for the whole genome amplification reaction: 30 °C for 4 to 8 hours, then 65 °C for 5 minutes. After the reaction, the whole genome amplification products were checked by gel electrophoresis and their concentrations were measured, so that genome amplification could be compared across the sample processing methods. The results (FIG. 3a) show that microorganisms treated with both lysozyme and heat yielded relatively more genome amplification product without any loss of genome integrity.

This step tested the sensitivity of NGS-based detection of trace microorganisms. Samples of 4 µL were prepared, each containing 1 CFU, 5 CFU, 10 CFU, 50 CFU or 100 CFU of one of the following: Escherichia coli, Pseudomonas aeruginosa, Staphylococcus aureus, Bacillus subtilis, Clostridium sporogenes, Candida albicans or Fusobacterium nucleatum. A 4 µL sample of nuclease-free water served as the negative control. 3 µL of lysozyme was added to each PCR tube containing a test sample or the negative control. The tubes were briefly centrifuged to collect the liquid at the bottom, then placed on a PCR instrument, and the cell lysis reaction was carried out at 65 °C for 10 min. After lysis, each tube was removed and briefly centrifuged. 3 µL of neutralization buffer was added, and the tube was briefly centrifuged again to collect the liquid from the tube wall. Then 40 µL of whole genome amplification reaction mix (39 µL Amplification Reaction Buffer and 1 µL Amplification Enzyme) was added and mixed by vortexing. The PCR tube was placed on a PCR instrument for the whole genome amplification reaction: 30 °C for 4 to 8 hours, then 65 °C for 5 minutes. After the reaction, the whole genome amplification products were checked by gel electrophoresis and their concentrations were measured. The products were then stored at -20 °C until use. The electrophoresis results (FIG. 3b) show that whole genome amplification yields genomic DNA products from different types of bacteria at levels as low as 10 CFU.

Example 2 Sensitivity test for detection of microbial samples

Six bacteria and one fungus were tested: Fusobacterium nucleatum, Escherichia coli, Pseudomonas aeruginosa, Staphylococcus aureus, Bacillus subtilis, Clostridium sporogenes and Candida albicans. Each microorganism was set up at five quantity gradients (1 CFU, 5 CFU, 10 CFU, 50 CFU and 100 CFU), together with a negative control group (sterile water). The protocol is shown in Table 1. The DNA of each sample was amplified by MDA and then sequenced on the DNBSEQ-T7 sequencing system. The sequencing strategy was PE100 (paired-end sequencing with 100 bp reads), and each sample yielded no less than 1 Gb of sequencing data.
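
As a worked check on this sequencing design, the sketch below treats 1 Gb as 10^9 bases, which is an assumption. Under that assumption, PE100 at 1 Gb corresponds to about 10 million 100 bp reads (5 million read pairs). An RPM of 10 then corresponds to roughly 100 species-specific reads. In the analysis, the RPM denominator is the read count after filtering and host-read removal, so the actual numbers will be somewhat lower.

```python
# Worked arithmetic for PE100 sequencing with at least 1 Gb of data per sample.
# Treating 1 Gb as 1e9 bases is an assumption of this sketch.

READ_LENGTH_BP = 100      # PE100: each read is 100 bp
READS_PER_PAIR = 2        # paired-end: two reads per fragment
MIN_DATA_BASES = 1_000_000_000
RPM_THRESHOLD = 10


def min_read_counts(data_bases: int = MIN_DATA_BASES,
                    read_length: int = READ_LENGTH_BP) -> tuple[int, int]:
    """Return (read pairs, single reads) for a given data volume."""
    reads = data_bases // read_length
    return reads // READS_PER_PAIR, reads


def reads_needed_for_rpm(total_reads: int, rpm: float = RPM_THRESHOLD) -> float:
    """Species-specific reads needed to reach a given RPM."""
    return rpm * total_reads / 1_000_000


pairs, reads = min_read_counts()
print(pairs, reads)                  # 5000000 10000000
print(reads_needed_for_rpm(reads))   # 100.0
```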

Table 1 Detection sensitivity test of different individual strains (the quantity of each strain is given in parentheses after the strain)

The bioinformatic analysis workflow is as follows:

1) Filter and base-correct the sequencing data of each sample. Remove low-quality reads and reads containing adapter contamination to obtain high-quality clean data. The Q20 values and data volumes after filtering are shown in FIG. 4 and FIG. 5, respectively;

2) Align the filtered clean data to the human reference genome hg38, separating reads that align to hg38 from reads that do not. Save the reads that do not align to hg38 as sequence file 1;

3) Further align the reads that aligned to hg38 to human endogenous viral sequences. Save the reads that align to endogenous viral sequences as sequence file 2;

4) Combine sequence file 1 and sequence file 2 to obtain the total reads (sequence file 3). Align the total reads to the microbial marker-gene database MetaPhlAn 2;

5) Count the reads aligned to database sequences to obtain the number of aligned reads for each microbial species. Normalize these counts as reads per million (RPM), i.e. the number of reads aligned to a specific microorganism per million total reads;

6) Align sequence file 3 to the reference genomes of the identified strains to obtain aligned BAM files. Calculate the coverage and average depth of coverage of each strain with the bamdst software.

The seven microorganisms were tested at different gradient amounts. Under the 10 CFU sensitivity requirement, all microorganisms were detected only when RPM ≥ 10. When RPM < 10, the results for 2 microorganisms (Staphylococcus aureus and Candida albicans) were negative in every repeated test. The detection sensitivity and coverage are shown in FIGS. 6 to 8. The figures show that the detection limit of all type strains reaches 10 CFU if and only if RPM ≥ 10. RPM ≥ 10 can therefore be used as the detection threshold at this sensitivity.
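
Steps 2) to 4) above amount to set operations on read identifiers. The Python sketch below shows them under that simplification. The alignments to hg38 and to the endogenous viral sequences are assumed to have been run already and reduced to sets of read IDs. The function name and read IDs are hypothetical.

```python
# Sketch of steps 2) to 4): host-read removal with rescue of endogenous viral
# reads. Sets of read IDs stand in for real alignment output (an assumption).


def build_sequence_file_3(all_reads: set[str],
                          hg38_aligned: set[str],
                          endogenous_viral_aligned: set[str]) -> set[str]:
    """Return the read IDs that go on to MetaPhlAn 2 alignment."""
    # Sequence file 1: clean reads that do not align to hg38.
    file_1 = all_reads - hg38_aligned
    # Sequence file 2: hg38-aligned reads that also align to endogenous
    # viral sequences; these are kept rather than discarded.
    file_2 = hg38_aligned & endogenous_viral_aligned
    # Sequence file 3: union of files 1 and 2 (also the RPM denominator).
    return file_1 | file_2


if __name__ == "__main__":
    reads = {"r1", "r2", "r3", "r4"}
    file_3 = build_sequence_file_3(reads, {"r2", "r3"}, {"r3"})
    print(sorted(file_3))  # ['r1', 'r3', 'r4']
```

Intersecting with `hg38_aligned` in step 3 guarantees that only reads first flagged as human are rescued, mirroring the two-stage alignment described in the text.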

Example 3 Accuracy test of the NGS method on mixed microorganisms

This example tested detection sensitivity when different microorganisms are mixed. Escherichia coli and Staphylococcus aureus were mixed in different proportions. The mixed samples then underwent DNA extraction, MDA amplification and sequencing, with no less than 1 Gb of sequencing data per sample. The data analysis workflow and software parameters were the same as those used for single-microorganism sensitivity testing in Example 2. The experimental protocol is as follows:

Table 2 Detection sensitivity test for different mixed strains (the quantity of each strain is given in parentheses after the strain)

Strains	Sample	Scheme
Escherichia coli (D), Staphylococcus aureus (J)	DJ-0	D and J mixed at 100:0 CFU
Escherichia coli (D), Staphylococcus aureus (J)	DJ-0.1	D and J mixed at 999:1 CFU
Escherichia coli (D), Staphylococcus aureus (J)	DJ-0.5	D and J mixed at 995:5 CFU
Escherichia coli (D), Staphylococcus aureus (J)	DJ-1	D and J mixed at 99:1 CFU
Escherichia coli (D), Staphylococcus aureus (J)	DJ-2	D and J mixed at 98:2 CFU
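
The sample names in Table 2 encode the share of Staphylococcus aureus (J) in each mixture as a percentage of total CFU. The short Python check below recomputes those percentages from the CFU pairs in the table. It also confirms that J is present at fewer than 10 CFU in every mixture. The dictionary layout is only an illustration.

```python
# Recompute the Staphylococcus aureus (J) share of each D:J mixture in Table 2.

MIXES = {
    "DJ-0": (100, 0),
    "DJ-0.1": (999, 1),
    "DJ-0.5": (995, 5),
    "DJ-1": (99, 1),
    "DJ-2": (98, 2),
}


def j_percent(d_cfu: int, j_cfu: int) -> float:
    """Percentage of J CFU in a mixture of D and J."""
    return 100 * j_cfu / (d_cfu + j_cfu)


for sample, (d, j) in MIXES.items():
    print(sample, round(j_percent(d, j), 2), "% J,", j, "CFU J")
```

The DJ-0.1 and DJ-0.5 mixtures total 1000 CFU, while the others total 100 CFU. The absolute J count therefore stays between 1 and 5 CFU across all spiked samples.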

FIGS. 9 and 10 show the results for Staphylococcus aureus mixed into Escherichia coli at proportions of 0%, 0.1%, 0.5%, 1% and 2%. Staphylococcus aureus was detected by the NGS method in every sample except the 0% sample, which contained none. This held even though Staphylococcus aureus was present at fewer than 10 CFU. Genome coverage was detected for both Escherichia coli (D) and Staphylococcus aureus (J) at all of the concentrations tested.
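
The coverage and average depth reported for each strain can be illustrated with the Python sketch below. It assumes a per-base depth array over a reference genome, including zero-depth positions. Coverage is taken as the fraction of positions with depth of at least 1, and average depth as total depth divided by genome length. These are common definitions. They are not a reproduction of bamdst's exact output fields, and the example depths are hypothetical.

```python
# Genome coverage and mean depth from a per-base depth array (assumed input
# covering every reference position, including those with zero depth).


def coverage_stats(depths: list[int]) -> tuple[float, float]:
    """Return (fraction of positions covered at >= 1x, mean depth)."""
    if not depths:
        raise ValueError("depth list is empty")
    covered = sum(1 for d in depths if d > 0)
    return covered / len(depths), sum(depths) / len(depths)


if __name__ == "__main__":
    # Hypothetical 10 bp reference genome.
    print(coverage_stats([0, 0, 3, 5, 1, 0, 2, 0, 0, 4]))  # (0.5, 1.5)
```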

In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," "some examples," etc. mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and their features, provided they do not conflict.

Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims

1. A method for determining whether a product to be tested is contaminated by microorganisms, comprising:

(1) separating a test sample from the product to be tested and obtaining a nucleic acid sample;

(2) amplifying the nucleic acid sample to obtain an amplification product;

(3) constructing a sequencing library based on the amplified product;

(4) sequencing the sequencing library to obtain a sequencing result, the sequencing result being composed of multiple sequencing reads;

(5) filtering the sequencing result using known background organism genome information to obtain a filtered sequencing result; and

(6) determining, using genome information of known microorganisms and based on the filtered sequencing result, whether the product to be tested is contaminated with the known microorganisms.

2. The method according to claim 1, wherein the product to be tested comprises biological samples, medicines, pharmaceutical raw materials, pharmaceutical auxiliary materials, medical devices, food, cosmetics and health care products;

Optionally, the drugs include cellular drugs, antibody drugs, and chemical drugs.

3. The method according to claim 1, wherein the product to be tested contains a known organism or a part of the organism, and the genome information of the known organism constitutes at least part of the background organism genome information;

Optionally, the background organism genome information is human reference genome information; or the product contains probiotic bacteria, and the background organism genome information is the genome information of the probiotic bacteria.

4. The method according to claim 1, wherein the product to be tested is an aqueous solution and is used directly as the test sample;

Alternatively, the product to be tested is a water-soluble solid, and an aqueous solution of the product to be tested is prepared as a test sample;

Alternatively, the product to be tested is a water-insoluble solid, and the water-insoluble solid is emulsified with an emulsifier to obtain a test sample.

5. The method according to claim 1, wherein the filtering process comprises:

(i) aligning the sequencing results with the background organism genomic information to obtain matching sequencing reads that can be aligned with the background organism genomic information; and

(ii) removing the matching sequencing reads from the sequencing results to obtain the filtered sequencing results.

6. The method according to claim 5, wherein the filtering process further comprises:

(iii) aligning the matched sequencing reads with known endogenous organism genomic information to obtain aligned endogenous organism sequencing reads; and

(iv) adding the endogenous organism sequencing reads to the filtered sequencing results to obtain the filtered sequencing results;

Optionally, the genome of the known endogenous organism is genetically recombined with the genome of the background organism.

7. The method according to claim 1, wherein step (6) further comprises:

(6-1) determining the number of sequencing reads that align to the known microbial genome information;

(6-2) determining a contamination index of the known microorganisms based on the number of sequencing reads obtained in step (6-1), the contamination index being positively correlated with the number of sequencing reads obtained in step (6-1);

(6-3) determining whether the product is contaminated with the known microorganism based on the difference between the contamination index and a predetermined threshold.

8. The method according to claim 7, wherein the contamination index is determined by the following formula:

RPM=a*1000000/b,

wherein RPM represents the contamination index, a is the number of sequencing reads obtained in step (6-1), and b is the total number of sequencing reads in the filtered sequencing results obtained in step (5).

9. The method according to claim 8, wherein the predetermined threshold is not lower than 10.

10. Use of the method of any one of claims 1 to 9 for identifying microbial species in a product to be tested for sterility.