
HK40053255A - SSI cells with predictable and stable transgene expression and methods of formation - Google Patents



Publication number
HK40053255A
Authority
HK
Hong Kong
Prior art keywords
cell
gene
locus
peaks
interest
Prior art date
Application number
HK62021042075.3A
Other languages
Chinese (zh)
Inventor
P. M. O'Callaghan
S. Bevan
R. Young
P. Fraser
L. Zhang
Original Assignee
Lonza Ltd
The Babraham Institute
Pfizer Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lonza Ltd, The Babraham Institute, and Pfizer Inc.


Description

SSI cells with predictable and stable transgene expression and methods of formation
Cross Reference to Related Applications
This application claims the benefit of U.S. provisional patent application serial No. 62/739,546, filed October 1, 2018, which is incorporated herein by reference for all purposes.
Background
The integration of recombinant protein (rP) expression cassettes into host cells for the expression of heterologous polypeptides has been performed for many years. Traditionally, random integration (RI) procedures are used, which exploit existing double-strand breaks in the genome to incorporate expression cassettes. Unfortunately, due to position-effect variegation, both the integrated gene copy number and the expression characteristics at the integration site can be highly variable with RI, resulting in undesirable phenotypic heterogeneity. Thus, the RI process requires expensive screening of integration events to develop useful cell lines. In addition, gene amplification methods for increasing expression can cause genomic instability (e.g., deletions, duplications, translocations) as well as epigenetic effects that modify expression (e.g., methylation, histone modification, heterochromatin invasion). Thus, cell lines produced by RI are generally unstable and show reduced yield over time.
Recently, site-specific integration (SSI) has been developed, in which a "landing pad" is formed in the genome of a cell by the integration of a recombination target site (RTS) from a site-specific recombinase system, such as the Saccharomyces cerevisiae-derived FLP-Frt system or the bacteriophage P1-derived Cre-loxP system. The process of cassette integration in SSI cell lines is known as recombinase-mediated cassette exchange (RMCE). RMCE typically involves co-transfection of an expression vector encoding the recombinase and a targeting vector containing a gene of interest (GOI) flanked by recombinase target sequences. By using different RTS (in the donor and target DNA) at the 5' and 3' ends of the cassette to be exchanged, the SSI approach can ensure that recombination occurs in a targeted manner and that only the intended cassette regions are exchanged.
Unfortunately, cell lines produced by SSI may also have limitations. For example, the SSI system requires insertion of RTS into the genome as a prerequisite for vector targeting and generation of GOI-expressing cell lines. RTS insertion is usually performed by RI or into a limited number of specific genomic regions, and thus the resulting cell lines still suffer from instability and reduced yield over time. Furthermore, SSI often results in a low copy number of the integrated gene, which may indirectly limit rP production titers.
One method of increasing the number of integrated copies of a recombinant gene is known as cumulative or accumulative SSI (see, e.g., Kameyama et al., Biotechnol. Bioeng. 105: 1106-14 (2010); Kawabe et al., Cytotechnology 64: 267-79 (2012); and Turan et al., J. Mol. Biol. 402: 52-69 (2010)). This method may comprise repeated cycles of RMCE to load multiple copies of the rP expression cassette sequentially at a single site.
What is needed in the art is an SSI cell line that incorporates RTS at a transcriptionally active and highly stable locus in the genome of a host cell. Such a cell line would be capable of stable and long term expression of GOIs.
Publications, patents, and patent applications are cited herein, the disclosures of which are incorporated by reference in their entirety.
Disclosure of Invention
The present disclosure is based on the recognition that the transcriptional output from a transgene insertion site, and the stability of its expression, will be strongly influenced by the 3-dimensional (3D) structure of chromatin in that region. Based on this recognition, the present disclosure describes methods for determining the structure and conformation of a genome in 3 dimensions (3D mapping of the genome). The disclosed 3D mapping method can be performed by using techniques such as, for example, Hi-C and other chromosome conformation capture methods (Elzo de Wit and Wouter de Laat, Genes Dev. 26: 11-24 (2012)) and promoter capture Hi-C (Schoenfelder et al., Genome Res. 25: 582-97 (2015)), among others. Methods of using information obtained by 3D mapping protocols, and mammalian cells that can be formed by the methods, are also described. The present application teaches how to generate a multi-level 3D genomic map and then use this information to identify optimal genomic integration sites for expression of a heterologous gene. For example, by interrogating the mapped 3D genomic structure, integration sites that are likely to exhibit high performance can be identified.
In one embodiment, the disclosure relates to a mammalian cell comprising RTS at a High Integration (HI) locus. The HI locus is a high performance genomic locus that the inventors identified by analyzing the 3D hierarchy of genomic chromatin. Beneficially, the HI locus is in a stable, transcriptionally active environment of the genome and can be repeatedly targeted to deliver predictable and stable levels of GOI expression.
The HI locus may be located in the active genomic compartment of accessible chromatin, and may also be located within about 30,000 base pairs of a topologically-associated domain (TAD) boundary. Furthermore, the HI locus may overlap with a region of the genome that interacts with at least one enhancer element. The HI locus may differ depending on whether expression of the GOI is driven by an endogenous promoter in situ or by a heterologous promoter. For example, in those cell lines where expression of the GOI is driven by an endogenous promoter in situ, the HI locus may overlap a Transcription Start Site (TSS) and be located downstream of the TSS. Furthermore, in this embodiment, the HI locus may overlap with an active gene, and in some embodiments with a fully annotated gene, e.g., an active gene whose expression product, or lack thereof, is not essential to the cell. In those cell lines where expression of the GOI is driven by a heterologous promoter, the HI locus may typically be outside of the active or non-transcribed locus. For example, the HI locus in such a cell may comprise a locus that does not overlap with any associated promoter region of an active gene, or in one embodiment is not within about 1,000 base pairs of any active gene (e.g., not within about 1,000 base pairs of any active and fully annotated gene).
In some embodiments, the cell can comprise a plurality of RTS, e.g., in some embodiments, at least two RTS, at least four RTS, or even more. For example, a cell may comprise multiple RTS in a single HI locus, in different HI loci, and/or in separate loci (e.g., the Fer1L4 locus).
In some embodiments, the RTS can comprise a Frt site, a lox site, a rox site, or an att site. In some embodiments, the RTS can comprise an amino acid sequence selected from SEQ ID No.: 126 and 155.
Cell types encompassed herein may include, but are not limited to, mouse cells, human cells, Chinese Hamster Ovary (CHO) cells, CHO-K1 cells, CHO-DXB11 cells, CHO-DG44 cells, CHOK1SV™ cells including all variants, CHO glutamine synthetase knockout cells including all variants, HEK cells, HEK293 cells including both adherent and suspension-adapted variants, HeLa cells, or HT1080 cells.
In one embodiment, the cell may comprise a GOI, e.g., a chromosomally integrated GOI, such as a reporter gene, a selection gene, a gene of therapeutic interest, an auxiliary gene, or a combination of genes. The GOI may encode a difficult-to-express (DtE) protein, such as an Fc-fusion protein, an enzyme, a membrane receptor, or a monoclonal antibody (e.g., a bispecific or trispecific monoclonal antibody). In one example, a GOI can be located between two RTS within a single HI locus. In some embodiments, the cell can incorporate multiple GOIs. For example, a cell can incorporate two or more GOIs within a single HI locus, can incorporate multiple GOIs, one or more of which are located in different HI loci, and/or can incorporate multiple GOIs in any combination of HI loci and individual loci. In some embodiments, the cell can incorporate a recombinase gene, such as a site-specific recombinase gene, which in one embodiment can be chromosomally integrated.
Also disclosed are methods for producing recombinant cells. For example, a method may comprise mapping peaks in accessible chromatin of a cellular genome, and identifying within the mapped peaks in accessible chromatin a first set of peaks located within an active genomic compartment of accessible chromatin and also located within about 30,000 base pairs of a topologically related domain (TAD) boundary. In one embodiment, the first set of peaks may be in the active genomic compartment (e.g., as defined by principal component analysis method (PCA)) and may also be in open chromatin (e.g., as defined by ATAC-seq), although this is not a requirement of the method, and in other embodiments the first set of peaks may comprise those peaks in the active genomic compartment throughout the mapped accessible chromatin. The method may further comprise identifying those peaks in the first set of peaks that overlap with a region of the genome that interacts with the at least one enhancer element. The HI locus can then be defined in the peaks that meet these criteria. After the HI locus is identified, RTS can be inserted into the HI locus. Optionally, a gene encoding a site-specific recombinase may also be inserted into the cell.
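By way of non-limiting illustration, the sequential filtering described above may be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the `Peak` record, its field names, and the example values are hypothetical, and each peak is assumed to have already been annotated with its compartment status, its distance to the nearest TAD boundary, and its number of interacting enhancer elements.

```python
from dataclasses import dataclass

TAD_WINDOW_BP = 30_000  # "about 30,000 base pairs" in the text


@dataclass(frozen=True)
class Peak:
    """One accessible-chromatin peak; all field names are hypothetical."""
    name: str
    in_active_compartment: bool    # e.g., as defined by PCA
    tad_boundary_distance_bp: int  # distance to the nearest TAD boundary
    enhancer_interactions: int     # number of interacting enhancer elements


def first_set(peaks):
    """Peaks in the active compartment and near a TAD boundary."""
    return [p for p in peaks
            if p.in_active_compartment
            and p.tad_boundary_distance_bp <= TAD_WINDOW_BP]


def candidate_hi_peaks(peaks):
    """First-set peaks that also interact with at least one enhancer."""
    return [p for p in first_set(peaks) if p.enhancer_interactions >= 1]


# Made-up example data, for illustration only.
example_peaks = [
    Peak("peak_a", True, 12_000, 3),
    Peak("peak_b", True, 45_000, 2),  # too far from a TAD boundary
    Peak("peak_c", False, 5_000, 4),  # not in the active compartment
    Peak("peak_d", True, 8_000, 0),   # no enhancer interaction
]
print([p.name for p in candidate_hi_peaks(example_peaks)])
```

In this sketch, only peaks passing all three criteria remain as candidates within which a HI locus could be defined.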
In those embodiments in which expression of the gene from the HI locus will be driven by the endogenous promoter in situ, a method may further comprise identifying, among the first set of peaks overlapping the region of the genome that interacts with the at least one enhancer element, a second set of peaks that overlap with a TSS, and in particular the TSS of an active gene whose expression product, or lack thereof, is not essential to the cell. The HI locus may be defined within the second set of peaks, overlapping the active gene and located downstream of the TSS of the active gene.
In those embodiments in which expression of a gene from a HI locus is to be driven by a heterologous promoter, a method may further comprise identifying, within the first set of peaks that overlap a region of the genome that interacts with at least one enhancer element, a second set of peaks in accessible chromatin that do not overlap an active gene or its associated promoter region. A HI locus may then be defined within the second set of peaks.
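As a hedged illustration of the two branches just described, the following sketch selects a second set of peaks according to the intended promoter. The `Gene` record, the tuple representation of peaks, the `second_set` function, and the use of a 1,000 base pair upstream flank as the "associated promoter region" are all assumptions made for this example only.

```python
from dataclasses import dataclass

PROMOTER_FLANK_BP = 1_000  # assumed size of the promoter region upstream of a TSS


@dataclass(frozen=True)
class Gene:
    """Annotated gene; all field names are hypothetical."""
    name: str
    start: int       # leftmost coordinate, bp
    end: int         # rightmost coordinate, bp
    strand: str      # "+" or "-"
    active: bool
    essential: bool

    @property
    def tss(self):
        """Transcription start site: 5' end of the gene on its strand."""
        return self.start if self.strand == "+" else self.end

    def span_with_promoter(self):
        """Gene body extended upstream by the assumed promoter flank."""
        if self.strand == "+":
            return self.start - PROMOTER_FLANK_BP, self.end
        return self.start, self.end + PROMOTER_FLANK_BP


def second_set(peaks, genes, promoter):
    """Select peak names from (name, start, end) tuples for a promoter mode."""
    active = [g for g in genes if g.active]
    selected = []
    for name, start, end in peaks:
        if promoter == "endogenous":
            # Keep peaks that overlap the TSS of an active, non-essential gene.
            keep = any(not g.essential and start <= g.tss <= end for g in active)
        elif promoter == "heterologous":
            # Keep peaks that avoid every active gene and its promoter region.
            spans = [g.span_with_promoter() for g in active]
            keep = not any(start <= hi and lo <= end for lo, hi in spans)
        else:
            raise ValueError("promoter must be 'endogenous' or 'heterologous'")
        if keep:
            selected.append(name)
    return selected


# Made-up example data, for illustration only.
example_genes = [
    Gene("geneA", 10_000, 20_000, "+", active=True, essential=False),
    Gene("geneB", 50_000, 60_000, "-", active=True, essential=True),
]
example_peaks = [
    ("p1", 9_800, 10_300),   # overlaps the TSS of geneA
    ("p2", 59_900, 60_200),  # overlaps the TSS of essential geneB
    ("p3", 30_000, 30_500),  # intergenic
    ("p4", 9_200, 9_500),    # inside the assumed promoter flank of geneA
]
print(second_set(example_peaks, example_genes, "endogenous"))
print(second_set(example_peaks, example_genes, "heterologous"))
```

The sketch does not address placement of the HI locus downstream of the TSS within a selected peak, which would be a further step in the endogenous-promoter case.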
One method may further comprise transfecting the cell with a vector comprising an exchangeable cassette encoding a GOI, and integrating the exchangeable cassette into the HI locus. Then, a cell comprising the exchangeable cassette integrated into the chromosome at the HI locus can be selected as a recombinant protein producing cell.
Optionally, the method can comprise incorporating an additional RTS into the cell. For example, the additional RTS can be incorporated into the same HI locus as the first RTS, into one or more additional HI loci, and/or into one or more separate loci.
According to another embodiment, a method for producing a recombinant cell is disclosed, the method comprising mapping peaks in accessible chromatin of a genome of the cell, and identifying within the mapped peaks in accessible chromatin a first set of peaks located within an active genomic compartment of the accessible chromatin and also located within about 30,000 base pairs of a topologically related domain (TAD) boundary. In one embodiment, the first set of peaks may be in the active genomic compartment (e.g., as defined by principal component analysis method (PCA)) and may also be in open chromatin (e.g., as defined by ATAC-seq), although this is not a requirement of the method, and in other embodiments the first set of peaks may comprise those peaks in the active genomic compartment throughout the mapped accessible chromatin. The method may further comprise identifying those peaks in the first set of peaks that overlap with a region of the genome that interacts with the at least one enhancer element. Multiple HI loci can then be defined within the resulting set of mapped peaks. A method may further comprise integrating the RTS into a plurality of cells (e.g., according to RI protocols), and then selecting from the plurality of cells a cell that comprises the RTS integrated into the HI locus. Optionally, a gene encoding a site-specific recombinase may also be inserted into the selected cell.
In one example, the HI loci identified by this method can be ranked according to effectiveness. For example, the HI loci may be ranked according to one or more of the expression level of one or more genes associated with each locus, the distance from each locus to the nearest TAD boundary, and the predicted number of enhancer interactions for each locus. In one such example, where cells comprising RTS integrated into the HI locus are selected, the cells can be selected according to the ordering of the HI locus insertion sites.
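One possible way to combine the three ranking criteria named above is a simple rank sum, sketched below. The text does not specify how the criteria are weighted or combined, so the rank-sum rule, the `rank_loci` function, and the example values are assumptions made purely for illustration.

```python
def rank_loci(loci):
    """Order locus names from best to worst by a sum of per-criterion ranks.

    `loci` maps a locus name to a tuple of
    (expression level of associated genes,
     distance in bp to the nearest TAD boundary,
     predicted number of enhancer interactions).
    Higher expression, shorter distance, and more interactions rank better.
    """
    names = list(loci)

    def ranks(index, higher_is_better):
        ordered = sorted(names, key=lambda n: loci[n][index],
                         reverse=higher_is_better)
        return {name: position for position, name in enumerate(ordered)}

    by_expression = ranks(0, higher_is_better=True)
    by_distance = ranks(1, higher_is_better=False)
    by_enhancers = ranks(2, higher_is_better=True)

    score = {n: by_expression[n] + by_distance[n] + by_enhancers[n]
             for n in names}
    # Ties in the summed rank are broken alphabetically by locus name.
    return sorted(names, key=lambda n: (score[n], n))


# Made-up example data, for illustration only.
example_loci = {
    "locus_A": (120.0, 5_000, 4),
    "locus_B": (80.0, 12_000, 6),
    "locus_C": (15.0, 28_000, 1),
}
print(rank_loci(example_loci))
```

Cells carrying an RTS at a higher-ranked locus could then be prioritized during selection; other weighting schemes would be equally consistent with the description.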
In one embodiment, the method of defining the HI locus may also depend on whether the HI locus is intended for expression of a heterologous gene driven by an endogenous promoter in situ or by a heterologous promoter. For example, in those embodiments in which expression of a gene from the HI locus will be driven by an endogenous promoter in situ, a method may further comprise identifying, within the resulting set of mapped peaks as defined above, those peaks that overlap with the TSS of an active gene (such as an active gene whose expression product, or lack thereof, is not essential to the cell). A second set of peaks overlapping the identified genes and located downstream of the TSS of the identified genes can then be defined, and a HI locus can be defined within the second set of peaks.
In those embodiments in which expression of a gene from an HI locus is to be driven by a heterologous promoter, a method may further comprise identifying within the resulting set of mapped peaks as defined above a second set of peaks that do not overlap with any gene (e.g., any active gene or its associated promoter region), and an HI locus may be defined within this second set of peaks.
One method may further comprise transfecting a selected cell comprising RTS integrated into the HI locus with a vector comprising an exchangeable cassette encoding a GOI, and integrating the exchangeable cassette into the HI locus. Cells containing the exchangeable cassette integrated into the chromosome can then be selected as recombinant protein producing cells.
Optionally, the method can comprise incorporating an additional RTS into the cell. For example, additional RTS can be incorporated into the first HI locus, into one or more additional HI loci, and/or into one or more separate loci.
Drawings
A full and enabling disclosure of the present subject matter, including the best mode thereof to one of ordinary skill in the art, is set forth more particularly in the remainder of the specification, including reference to the accompanying figures wherein:
FIG. 1 presents a flow diagram illustrating one embodiment of a method for generating a 3D map of a genome and using it to define and rank candidate HI loci. This figure summarizes the sequential filtering or screening process by which the data used to generate a multi-level 3D genomic map can then be used to identify candidate HI loci.
FIG. 2A shows a portion of a genome-wide Hi-C heatmap of data mapped to the LACHESIS assembly at the resolution of individual CHO-K1SV protoscaffolds. Only cis interactions are mapped, and the smallest LACHESIS groups 7, 8 and 9 are omitted for visual clarity.
FIG. 2B shows a 100% stacked bar graph of the average percentage of near cis (< 10 kb), far cis (> 10 kb), and trans unique valid di-tags in CHO-K1SV 10E9 Hi-C replicates mapped to the individual input CHO-K1SV scaffolds and to the final LACHESIS assembly. For comparison, the distribution of near-cis, far-cis and trans di-tags, averaged over the replicates of equivalent Hi-C datasets from human embryonic stem cells and mouse fetal liver cells, is included (Nagano, T. et al. Comparison of Hi-C results using in-solution versus in-nucleus ligation. Genome Biol. 16, 175 (2015)).
FIG. 3A shows the structural features of the candidate HI locus of SEQ ID NO: 3 (position indicated by a diamond). The Hi-C PCA results indicate that the candidate locus is located in an active, euchromatin-like region (left). The position of the candidate locus relative to the TADs identified in its vicinity is shown (middle). The interaction profile of the HindIII restriction fragment containing the candidate locus is annotated with the ATAC-Seq, H3K4me3, H3K27ac and H3K4me1 signals and the locations of the baited promoter HindIII restriction fragments (right).
FIG. 3B shows the structural features of the candidate HI locus of SEQ ID NO: 2 (position indicated by a diamond). The Hi-C PCA results indicate that the candidate locus is located in an active, euchromatin-like region (left). The position of the candidate locus relative to the TADs identified in its vicinity is shown (middle). The interaction profile of the HindIII restriction fragment containing the candidate locus is annotated with the ATAC-Seq, H3K4me3, H3K27ac and H3K4me1 signals and the locations of the baited promoter HindIII restriction fragments (right).
FIG. 3C shows the structural features of the Fer1L4 landing pad currently used in industry (position indicated by a diamond). The Hi-C PCA results indicate that the locus is located in an active, euchromatin-like region (left). The position of the locus relative to the TADs identified in its vicinity is shown (middle). The interaction profile of the HindIII restriction fragment containing the locus is annotated with the ATAC-Seq, H3K4me3, H3K27ac and H3K4me1 signals and the locations of the baited promoter HindIII restriction fragments (right).
FIGS. 4A-4D show the results of screening a subset of the genomic loci from Table 1 for expression of an integrated eGFP reporter cassette under the control of a CMV promoter. Candidate loci were identified by the screening process described in FIG. 1 and tested empirically by targeting the same CMV-eGFP expression cassette to each locus using Cas9 nuclease in combination with a site-specific guide RNA (gRNA). The CMV-eGFP cassette was transfected into cells within the donor plasmid shown in FIG. 4A, which also expressed the "pseudo gRNA" sequence required for Cas9-mediated excision of the CMV-eGFP cassette from the plasmid in vivo after transfection. Once released from the plasmid, the CMV-eGFP cassette is targeted for integration into the desired genomic locus by expression of the locus-specific gRNA, which is cloned into the donor plasmid upstream of the gRNA scaffold sequence at the BbsI site. Cas9 nuclease was provided by co-transfection of a separate plasmid (not shown). FIG. 4B shows the percentage of GFP-positive cells obtained in pools of the Chinese hamster ovary SSI 10E9 cell line (Zhang et al., Biotechnol. Prog. 31(6): 1645-56 (2015)) thirteen days after transfection with the Cas9 and CMV-eGFP donor plasmids, and FIG. 4C shows the median GFP signal of the GFP+ cells of each pool. In FIG. 4C, the two bars per locus represent technical replicates of the flow cytometry analysis. To confirm targeted integration of the CMV-eGFP cassette in each pool, a PCR-based assay was performed on extracted genomic DNA (FIG. 4D). PCR products are produced only when targeted genomic integration has occurred; no PCR product is produced when only the donor plasmid ('D') is used as a template. 'Donor' refers to the donor plasmid, 'Het control' refers to a heterochromatin control integration site, and 'Fer1L4' refers to the landing pad of the 10E9 cell line discussed below.
Detailed Description
It is to be understood by one of ordinary skill in the art that the present discussion is a description of exemplary embodiments only, and is not intended as limiting the broader aspects of the present disclosure.
The present disclosure relates generally to the construction of 3D maps of the genome of cells, and in one particular embodiment to the construction of 3D maps of the genome of Chinese hamster ovary cells. The use of such a map to identify high performance integration sites (HI loci) from which recombinant transgenes can be expressed is also disclosed. In one particular embodiment, described further herein, 3D maps can be generated by using a combination of orthogonal methods such as ATAC-Seq (assay for transposase-accessible chromatin using sequencing) (Buenrostro et al., Nat. Methods 10: 1213-8 (2013)), Hi-C, and promoter capture Hi-C, in combination with RNA-Seq data on genome-wide transcriptional activity and datasets on the methylation and acetylation of nuclear histones. By these methods, a global picture of the 3D genome and its expression profile can be generated, which can inform the identification and design of HI loci.
According to one embodiment, a mammalian cell comprising RTS integrated within the HI locus is disclosed. Also disclosed are rP producer cell lines that incorporate the mammalian cells and methods for forming such mammalian cells. The HI loci described herein and the methods for identifying HI loci in the genome of a cell have been developed by understanding and mapping the 3D hierarchical structure of chromatin in mammalian cells. The HI locus is present in a transcriptionally active environment that can provide chromatin accessibility and epigenetic stability. Thus, SSI mammalian cells incorporating RTS at one or more HI loci (i.e., completely within, overlapping, or within about ±5 kb of a HI locus) can provide predictable and stable transgene production. For example, expression of a GOI in the disclosed mammalian cells can be stable for about 70, about 100, about 150, about 200, or about 300 passages. As used herein, expression can be considered "stable" if, compared to the initial expression level immediately following initiation of production, the expression level over time is reduced by about 30% or less, is maintained at the same level, or is increased (e.g., by about 30% or more). In some embodiments, expression is considered stable if the volumetric productivity varies by less than ±30% or remains at the same level. In some embodiments, the SSI host cell may produce about 1.5 g/L, about 2 g/L, about 3 g/L, about 4 g/L, or about 5 g/L or more of the GOI expression product. In some embodiments, SSI cells (e.g., SSI cell lines) may be maintained in culture without further selection. Thus, the disclosed cell lines may be more readily accepted by regulatory agencies.
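The two stability criteria stated above can be expressed as simple comparisons, sketched below for illustration only. The function names, the treatment of "about 30%" as an exact 0.30 threshold, and the example titers are assumptions of this sketch and are not taken from the disclosure.

```python
MAX_CHANGE = 0.30  # "about 30%", treated here as an exact threshold


def is_expression_stable(initial, later):
    """First criterion: a drop of 30% or less, or any maintained/increased level."""
    if initial <= 0:
        raise ValueError("initial expression level must be positive")
    return later >= initial * (1 - MAX_CHANGE)


def is_volumetric_productivity_stable(initial, later):
    """Second criterion: volumetric productivity varies by less than +/-30%."""
    if initial <= 0:
        raise ValueError("initial productivity must be positive")
    return abs(later - initial) / initial < MAX_CHANGE


# Made-up titers in g/L: an initial value and values after extended passaging.
initial_titer = 2.0
for later_titer in (1.5, 1.3, 2.8):
    print(later_titer,
          is_expression_stable(initial_titer, later_titer),
          is_volumetric_productivity_stable(initial_titer, later_titer))
```

Note that the two criteria differ for increases: a rise of more than 30% satisfies the first definition but not the second.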
As used herein, the term "about" is used to indicate that the value includes inherent variations in error of the method/apparatus used to determine the value, or variations that exist between study subjects. Generally, the term is meant to encompass variability of about or less than 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, or 20%, as the case may be.
In one embodiment, the mammalian cell may be derived from a Chinese Hamster Ovary (CHO) cell. While much of the discussion relates to CHO cells and cell lines, it should be understood that the present disclosure is in no way limited to any particular cell type, and that the term "mammalian cell" as described herein encompasses cells from any member of the class Mammalia. Mammalian cells encompassed herein may include, but are not limited to, human cells, mouse cells, rat cells, monkey cells, hamster cells, bovine cells, and the like. In some embodiments, the mammalian cell is a mouse cell (e.g., a mouse myeloma cell such as an NS0 or SP2/0 cell line), a human cell, a Chinese Hamster Ovary (CHO) cell, a CHO-K1 cell, a CHO-DXB11 cell, a CHO-DG44 cell, a CHOK1SV™ cell including all variants (e.g., CHOK1SV™, Lonza, Slough, UK), a CHO glutamine synthetase knockout cell including all variants (e.g., GS-KO™, Xceed™), a DG44 CHO cell, a DUXB11 CHO cell, a CHOS cell, a CHO FUT8 GS knockout cell, a CHOZN cell, or any CHO-derived cell.
According to one embodiment, the HI loci naturally occurring in the genome can be identified, and using this identification, mammalian cells can be developed that incorporate a heterologous nucleic acid molecule chromosomally integrated at one or more HI loci. For example, the heterologous nucleic acid molecule can comprise an exogenous cassette designed to express a GOI in the formation of a cell line for the production of recombinant proteins.
As used herein, the terms "nucleic acid," "nucleic acid molecule," and "oligonucleotide" are interchangeable and refer to a polymeric compound that includes covalently linked nucleotides. The term encompasses poly (ribonucleic acid) (RNA) and poly (deoxyribonucleic acid) (DNA), both of which may be single-stranded or double-stranded. DNA includes, but is not limited to, complementary DNA (cDNA), genomic DNA, plasmid or vector DNA, and synthetic DNA. RNA includes, but is not limited to, mRNA, tRNA, rRNA, snRNA, microRNA, miRNA or MIRNA.
As used herein, the terms "peptide," "polypeptide," and "protein" are interchangeable and refer to polymeric forms of amino acids of any length, which may include coded and non-coded amino acids, chemically or biochemically modified or derivatized amino acids, and polypeptides having modified peptide backbones. The terms "chain" and "polypeptide chain" are used interchangeably herein and refer to a polymeric form of amino acids having a single peptide backbone. The term "amino acid" refers to both natural and unnatural (i.e., synthetic) amino acids.
As used herein, the term "recombinant" when used in reference to a nucleic acid molecule, peptide, polypeptide, or protein refers to a new combination of genetic material not known to exist in nature or result therefrom. Recombinant molecules can be produced by any well-known technique available in the art of recombinant technology, including, but not limited to, Polymerase Chain Reaction (PCR), gene cleavage (e.g., using restriction enzymes), DNA ligation (e.g., using DNA ligase), RI, RMCE, CRISPR-mediated techniques, solid state synthesis of nucleic acid molecules, peptides or proteins, and combinations of techniques. In some embodiments, "recombinant" refers to a viral vector or virus that is not known to exist in nature, e.g., a viral vector or virus having one or more mutations, nucleic acid insertions, or heterologous genes in the viral vector or virus. In some embodiments, "recombinant" refers to a cell or host cell that is not known to exist in nature, e.g., a cell or host cell that has one or more mutations, nucleic acid insertions, or heterologous genes in the cell or host cell.
As used herein, the term "gene" refers to a collection of nucleotides encoding polypeptides, and includes cDNA and genomic DNA nucleic acid molecules. "Gene" also refers to a nucleic acid fragment that can serve as a regulatory element before (5 'non-coding sequence) and after (3' non-coding sequence) a coding sequence. The heterologous gene may be integrated into the host cell genome in a single copy, multiple copies, and/or at a predetermined copy number.
As used herein, the term "regulatory element" refers to a genetic element that controls some aspect of the expression of a nucleic acid sequence.
As used herein, the terms "promoter", "promoter sequence" or "promoter region" are interchangeable and refer to a DNA regulatory region/sequence capable of binding RNA polymerase and participating in initiating transcription of downstream coding or non-coding sequences. In some examples of the disclosure, the promoter sequence comprises a transcription start site (TSS) and extends upstream to comprise the minimum number of elements necessary to initiate transcription at detectable levels above background. In some embodiments, the promoter sequence comprises a TSS and a protein binding domain responsible for binding RNA polymerase. Eukaryotic promoters will typically (but not always) contain "TATA" boxes and "CAT" boxes. Various promoters (including inducible promoters, leaky promoters, synthetic promoters, etc.) may be used to drive gene expression in the host cells and/or vectors of the present disclosure.
As used herein, the term "heterologous" refers to a nucleic acid sequence, such as a promoter optionally operably linked to a GOI, that is derived from a different species than the host cell in which it is located or derived from the same species, but naturally occurs at a different location in that species (or host cell). The heterologous nucleic acid sequence may be from a prokaryotic system or a eukaryotic system. The coding or non-coding sequence associated with the heterologous regulatory sequence (e.g., a sequence that is downstream of and transcribed by the initiation of a heterologous promoter) can be endogenous to the heterologous regulatory sequence (e.g., the heterologous promoter is operably linked to the sequence in its natural environment), or heterologous to the heterologous regulatory sequence (e.g., the heterologous promoter is not operably linked to the sequence in its natural environment).
As used herein, the term "endogenous" refers to a nucleic acid sequence that is naturally present in a host cell. For example, an endogenous promoter may be operably linked to initiate transcription of a downstream coding or non-coding sequence heterologous to the host cell.
As used herein, the terms "in operable combination," "in operable order," and "operably linked" are interchangeable and refer to the joining of nucleic acid sequences in a manner that results in a nucleic acid molecule capable of directing the transcription of a given gene and/or the synthesis of a desired protein molecule. The term also refers to the linkage of amino acid sequences in such a way as to produce a functional protein. For example, a GOI, helper gene, recombinase-encoding gene, or non-coding sequence can be operably linked to a promoter, and the nucleic acid sequence can be chromosomally integrated into a host cell.
As used herein, the term "chromosomally-integrated" or "chromosomal integration" refers to a nucleic acid sequence that stably incorporates the nucleic acid sequence into the chromosome of a host cell, such as a mammalian cell, i.e., chromosomally integrates into the genomic dna (gdna) of the host cell (e.g., a mammalian cell).
As used herein, the terms "chromosomal locus" and "locus" (plural "loci") are used interchangeably and refer to a defined location of a nucleic acid on a chromosome of a cell. In some embodiments, a locus can include at least one gene. For example, a chromosomal locus may comprise about 500 base pairs to about 100,000 base pairs; about 5,000 base pairs to about 75,000 base pairs; about 5,000 base pairs to about 60,000 base pairs; about 20,000 base pairs to about 50,000 base pairs; about 30,000 base pairs to about 50,000 base pairs; or about 45,000 base pairs to about 49,000 base pairs. In some embodiments, a chromosomal locus can extend to about 100 base pairs, about 250 base pairs, of the 5 'and/or 3' end of a defined nucleic acid sequence; about 500 base pairs; about 750 base pairs; about 1,000 base pairs; or about 5,000 base pairs.
In one embodiment, a method can comprise identifying a HI locus in a genome. The HI locus may be located in the active genomic compartment of accessible chromatin and may be located within about 30,000 base pairs in the 5' or 3' direction of a topologically associating domain (TAD) boundary. In one embodiment, the first set of peaks may be in the active genomic compartment (e.g., as defined by principal component analysis (PCA)) and may also be in open chromatin (e.g., as defined by ATAC-seq), although this is not a requirement of the method, and in other embodiments, the first set of peaks may comprise those peaks in the active genomic compartment throughout the mapped accessible chromatin. The HI locus may also overlap with a region that interacts with at least one enhancer element. Thus, the identification of the HI locus may comprise 3D mapping of the genome to identify a set of peaks that meet these criteria.
As used herein, the terms "topologically associating domain," "TAD," and "contact domain" are used interchangeably and refer to highly conserved genomic regions containing nucleic acid sequences that preferentially interact physically with each other. Thus, nucleic acid sequences within a TAD will physically interact with each other more frequently than with sequences outside the TAD boundaries. A TAD can extend from thousands to millions of base pairs. TADs may be separated by a boundary region ("TAD boundary"), which may be rich in factors associated with active transcription. For example, TAD boundary regions may exhibit relatively high levels of CTCF binding. TAD boundary regions can also be identified by the presence of relatively large numbers of tRNA genes and housekeeping genes (e.g., actin, GAPDH, ubiquitin, etc.).
As used herein, the terms "enhancer," "enhancer element," "putative active enhancer element," and "predicted active enhancer element" are used interchangeably and refer to DNA regulatory regions/sequences capable of increasing the transcription rate of a target gene that do not overlap with the 2 kb region upstream or downstream of an annotated transcription start site, but are enriched in ATAC-Seq signal (indicating open, accessible chromatin) and in H3K4me1 and H3K27ac histone marks (Shlyueva et al., 2014, Nat Rev Genet. 15: 272-86) as shown by ChromHMM analysis (see, e.g., Ernst and Kellis, Nat Protoc. 12: 2478-2492 (2017)).
The term "enhancer element" may also comprise an "interacting putative active enhancer restriction fragment," which refers to a HindIII restriction fragment that does not itself contain an annotated transcription start site (TSS) and/or overlap with a genomic region enriched in H3K27me3 or H3K9me3 histone marks (as shown by ChromHMM analysis), but does overlap with a putative active enhancer (as defined above), and does interact in cis, and in multiple PCHi-C (promoter capture Hi-C) replicates, with a HindIII restriction fragment that contains an annotated TSS.
Enhancer elements can be linked to the promoter of a coding or non-coding sequence, and can be located upstream or downstream of the promoter and associated gene. Enhancer elements can generally exhibit activity when placed in either orientation, and enhancers can be active when located at a substantial distance from a promoter. For example, the enhancer element may be located about 1,000,000 base pairs upstream or downstream of the TSS, and may or may not be adjacent to the TSS. Methods for detecting enhancer activity are known in the art; see, for example, Molecular Cloning: A Laboratory Manual, Second Edition (Sambrook, Fritsch, Maniatis, eds., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989). The activity associated with such enhancer elements, first described for viral sequences (Banerji et al., 1981; Moreau et al., 1981) and subsequently for sequences derived from the epigenetic locus (Banerji et al., 1983; Gillies et al., 1983), involves activation of transcription regardless of the position or orientation of the element in the plasmid construct relative to the promoter.
As shown in fig. 1, one method may comprise identifying peaks in accessible chromatin. As used herein, the term "peak" refers to a region of the genome that exhibits an increase in the number of DNA sequencing reads (i.e., the sequencing read depth). For example, an increase in sequencing read depth over a genomic region above a normalized background model, as revealed by ATAC-Seq, may indicate open chromatin, while an increase in the number of sequencing reads between two HindIII restriction fragments from a PCHi-C experiment above a set threshold (e.g., a normalized CHiCAGO score of 5 or more; Cairns J, et al., Genome Biology. 2016. 17: 127) would indicate a statistically significant cis-interaction between the two genomic regions. The term "peak" may also refer to an increase in the frequency of contacts between two points in a genome above a predetermined threshold, as revealed by techniques such as Hi-C and PCHi-C.
In some embodiments, peak identification may be performed on the results of a sequencing protocol, for example, a ChIP-sequencing or MeDIP-seq (methylated DNA immunoprecipitation sequencing) protocol. Any peak calling tool known in the art may be used to identify peaks as defined herein. Many known peak calling tools are optimized for only a certain type of assay, such as only for transcription factor ChIP-seq or only for DNase-seq. However, the peak identification methods contemplated herein are not limited to such tools, and any peak calling methods and software may be used, including, but not limited to, DFilter, GEM, MACS2 (Zhang et al., Model-based Analysis of ChIP-Seq (MACS)), and ZINBA. The peak calling method may include a method based on generalized optimal detection theory and a method capable of utilizing different types of sequencing data.
The data set selected for mapping and identifying peaks in the sequence of interest can be optimized according to the type of peak identified. Furthermore, peaks can be identified by using multiple data sets as reference sequences. For example, peaks can be identified by using simulated ChIP-seq datasets, real datasets, and combinations thereof in conjunction with mathematical analysis (e.g., ranking candidate peaks using a Poisson test). The data set may include, but is not limited to, ChIP-Seq, ATAC-Seq (see, e.g., U.S. patent application publication No. 2016/0060691 to Giresi et al.; Buenrostro et al., 2015, "ATAC-seq: A Method for Assaying Chromatin Accessibility Genome-Wide," Curr Protoc Mol Biol 109: 21.29.1-21.29.9), Hi-C, promoter capture Hi-C (PCHi-C) (see, e.g., U.S. patent application publication No. 2016/0194713 to Fraser et al.), RNA-Seq, and any combination thereof. Other datasets can be used as known in the art, for example, the Feichtinger ChIP-Seq dataset (accession number PRJEB9291) (see, e.g., Feichtinger et al., Biotechnol Bioeng. 113(10): 2241-53 (2016)). In one example, multiple datasets (e.g., multiple Hi-C datasets) can be used to assemble chromosome-scale de novo reference genomic data that can be used to identify HI loci in a sequence of interest, the assembly being performed using, for example, SALSA or LACHESIS software (see, e.g., Burton et al., 2013, "Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions," Nat Biotechnol 31: 1119-).
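As an illustration of ranking candidate peaks with a Poisson test, the following minimal sketch scores each candidate region by the probability of observing at least its read count under a uniform background rate. The function names, the uniform background model, and the example numbers are assumptions made for illustration only; they do not reproduce the model used by any particular peak caller.

```python
import math

def poisson_sf(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    if k <= lam:
        # Bulk of the distribution: subtract the lower tail P(X < k) from 1.
        term = math.exp(-lam)
        lower = term
        for i in range(1, k):
            term *= lam / i
            lower += term
        return max(0.0, 1.0 - lower)
    # Upper tail: sum P(X = k), P(X = k + 1), ... directly so that very
    # small p-values are not lost to floating-point cancellation.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while term > 0.0 and (total == 0.0 or term > total * 1e-16):
        total += term
        i += 1
        term *= lam / i
    return min(total, 1.0)

def rank_candidate_peaks(candidates, background_rate):
    """Rank candidates by Poisson p-value, most significant first.

    candidates: iterable of (name, read_count, length_bp) tuples (hypothetical format).
    background_rate: expected reads per base pair under the background model.
    """
    scored = []
    for name, reads, length_bp in candidates:
        expected = background_rate * length_bp
        scored.append((name, poisson_sf(reads, expected)))
    return sorted(scored, key=lambda item: item[1])

# Illustrative values only.
candidates = [("p1", 50, 1000), ("p2", 12, 1000), ("p3", 30, 500)]
ranking = rank_candidate_peaks(candidates, background_rate=0.01)
```

Real peak callers typically estimate a local rather than a uniform background and correct for multiple testing; both are omitted from this sketch.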
As shown in fig. 1, the HI locus may be located in the active genomic compartment of accessible chromatin (see also fig. 3). Thus, identification of HI loci on the genome can comprise initial identification of peaks in accessible chromatin (e.g., by using a peak calling algorithm that utilizes ATAC-seq data), followed by analysis to determine which of those peaks are present in the active genomic compartment, as shown in fig. 1. It should be understood that the particular order of identification steps shown in fig. 1 is merely representative, and the disclosed methods are not limited to any particular order in which various aspects of the genome are mapped. For example, in the example shown in fig. 1, the step of identifying all peaks in accessible chromatin in the active genomic compartment is performed before identifying peaks located within 30 kb of a TAD boundary, but the specific order of these and other steps in this example may be varied.
According to one embodiment, peaks of accessible chromatin in the active genomic compartment of the sequence of interest can be identified by comparing the genomic sequence of interest to a reference sequence. The reference sequence may be a single known sequence or may be assembled by compiling known sequences (e.g., by using LACHESIS software and multiple Hi-C and/or PCHi-C datasets). In one embodiment, the reference sequence can be examined to identify all peaks of interest, e.g., all ATAC-Seq peaks of the reference sequence. Comparison between peaks in accessible chromatin and peaks in the active genomic compartment can provide a set of peaks present in the active genomic compartment of accessible chromatin of the reference sequence. After mapping the sequence of interest to the reference sequence, a filtering protocol can be performed to identify peaks in the sequence of interest that are located both in accessible chromatin and in active genomic compartments.
The HI locus may also be located within about 30,000 base pairs of the TAD boundary region. Thus, in one embodiment as shown in fig. 1, after identifying a set of peaks present in the sequence of interest in the active genomic compartment of accessible chromatin, the set of peaks can be further analyzed to determine which of these peaks are also within about 30,000 base pairs (upstream or downstream) of the TAD boundary region. This can be done by mapping the sequence of interest with the same or different reference sequences. If necessary, TAD boundary regions can be identified in the reference sequence prior to mapping. In one embodiment, the TAD boundary regions may be identified according to methods that use a "directionality index" (see, e.g., Dixon et al., 2012, "Topological domains in mammalian genomes identified by analysis of chromatin interactions," Nature 485(7398): 376-80). Of course, other methods and tools for identifying TAD boundary regions may be equally employed.
In one embodiment (further described in the examples section below), identification of active genomic compartment and TAD boundary positions can be performed by comparing a reference sequence (e.g., a genomic assembly, a compilation of one or more Hi-C datasets, etc.) to a sequence of interest, e.g., by applying an algorithm to a genomic assembly obtained by using LACHESIS software mapped to the sequence of interest. After the TAD boundaries are identified, and by using one or more reference genomic sequences intact over at least the active genomic compartment of the accessible chromatin portion of the genome, peaks within about 30,000 base pairs of each TAD boundary can be identified.
As in the example shown in fig. 1, the set of peaks identified as within about 30,000 base pairs of the TAD boundary and also within the active genomic compartment of accessible chromatin can be further examined to determine which of these peaks also overlap with regions in the genome that interact with at least one enhancer element (typically cis-interactions, although trans-interactions are also encompassed herein). For example, a method may comprise identifying a region of a genome that interacts with at least one enhancer element using a data set, such as, but not limited to, PCHi-C, ATAC-Seq, ChIP-Seq, ChromHMM, or a combination thereof. In one embodiment, statistically significant predictions of enhancer interaction can be identified by PCHi-C and ChromHMM analysis of reference sequences mapped against the sequence of interest. The peaks previously identified in the sequence of interest may then be further filtered to contain only peaks that interact with enhancer elements. This further filtering can narrow the set of peaks to those that fall within these regions. The resulting set of filtered peaks can be used to identify HI loci of a genome, i.e., each of these peaks can define a potential HI locus of a genome.
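The successive filters described above (active compartment, proximity to a TAD boundary, and overlap with an enhancer-interacting region) can be sketched as simple interval operations. This is a minimal sketch under stated assumptions: the `Interval` type, the function names, the half-open coordinate convention, and the toy coordinates are all hypothetical, and real analyses would normally work from BED-style files with a dedicated interval library.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # inclusive
    end: int    # exclusive (half-open convention, an assumption of this sketch)

def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end

def within(a: Interval, b: Interval, max_gap: int) -> bool:
    """True if a and b are on the same chromosome and at most max_gap bases apart."""
    if a.chrom != b.chrom:
        return False
    gap = max(0, max(a.start, b.start) - min(a.end, b.end))
    return gap <= max_gap

def candidate_hi_loci(peaks, active_compartments, tad_boundaries,
                      enhancer_interacting, max_tad_distance=30_000):
    """Keep peaks that satisfy all three criteria described in the text."""
    selected = []
    for peak in peaks:
        in_active = any(overlaps(peak, c) for c in active_compartments)
        near_tad = any(within(peak, b, max_tad_distance) for b in tad_boundaries)
        has_enhancer = any(overlaps(peak, r) for r in enhancer_interacting)
        if in_active and near_tad and has_enhancer:
            selected.append(peak)
    return selected

# Toy coordinates for illustration only.
peaks = [Interval("chr1", 100_000, 100_500),
         Interval("chr1", 500_000, 500_400),
         Interval("chr2", 10_000, 10_300)]
active = [Interval("chr1", 0, 300_000), Interval("chr2", 0, 50_000)]
boundaries = [Interval("chr1", 120_000, 125_000), Interval("chr2", 200_000, 205_000)]
enhancer_regions = [Interval("chr1", 100_200, 101_000), Interval("chr2", 10_000, 11_000)]

hits = candidate_hi_loci(peaks, active, boundaries, enhancer_regions)
```

Because the three tests are independent of one another, they can be applied in any order, consistent with the statement that the order of steps shown in fig. 1 is merely representative.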
Further modification of the HI locus may be made according to the type of promoter intended to drive transcription of the heterologous gene to be inserted into the genome.
In those embodiments in which a heterologous promoter is used for GOI transcription, the HI locus preferably does not overlap with any gene of the genome. In one example, the HI loci can comprise those loci that do not overlap with any active genes of the genome, but embodiments incorporating heterologous promoters are not limited to lack of overlap with active genes. In one embodiment, the HI locus will not overlap with any promoter of any gene or any promoter of any active gene of the genome. In one embodiment, the HI locus will not fall within about 1,000 base pairs on either side of any such promoter. Thus, in one embodiment, a method can further comprise filtering potential HI loci previously obtained by remapping the reference sequence to the sequence of interest to identify peaks outside of these regions of the sequence of interest (e.g., the active gene and its associated promoter region (about 1,000 base pairs of the promoter)). These peaks can then be identified as the desired HI locus.
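For the heterologous-promoter case, the additional exclusion step can be sketched as follows. This is a hypothetical illustration: regions are plain `(chrom, start, end)` tuples with half-open coordinates, and the function names and the 1,000 base pair default padding simply mirror the approximate figure given in the text.

```python
def padded(region, pad):
    """Extend a (chrom, start, end) region by pad bases on each side."""
    chrom, start, end = region
    return (chrom, max(0, start - pad), end + pad)

def regions_overlap(a, b):
    """True if two half-open (chrom, start, end) regions share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def exclude_gene_overlaps(peaks, genes, promoters, promoter_pad=1_000):
    """Drop peaks that overlap a gene or fall within promoter_pad of a promoter."""
    blocked = list(genes) + [padded(p, promoter_pad) for p in promoters]
    return [pk for pk in peaks
            if not any(regions_overlap(pk, b) for b in blocked)]

# Toy coordinates for illustration only.
peaks = [("chr1", 5_000, 5_400), ("chr1", 20_000, 20_300), ("chr1", 40_000, 40_200)]
genes = [("chr1", 21_000, 30_000)]
promoters = [("chr1", 20_500, 21_000)]
kept = exclude_gene_overlaps(peaks, genes, promoters)
```

Whether `genes` holds all annotated genes or only active genes is a choice left to the caller, matching the two alternatives described above.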
The HI locus used in those embodiments in which an endogenous promoter in situ is used for GOI transcription may overlap with the endogenous TSS in situ of the active gene whose expression or lack of expression is not important to the cell, i.e., the recombinant cell may survive without the active gene. Thus, as shown in the scheme on the right side of fig. 1, a method may further comprise filtering potential HI loci previously obtained by remapping the reference sequence to the sequence of interest to identify non-essential active genes and their associated TSSs in the active compartments of accessible chromatin. For example, it may also be examined whether the gene of interest has other characteristics that may affect the use of the gene promoter in the expression of the inserted RTS, such as lethality. Those peaks that overlap with these regions of the appropriate gene can then be identified as the desired HI locus.
The resulting set of peaks that fit into all desired classes for a particular application may provide the HI loci of the genome. For example, a HI locus for use in applications involving the use of a heterologous promoter may comprise a peak located within about 30,000 base pairs (upstream or downstream) of the TAD boundary in an active genomic compartment of accessible chromatin. In addition, these HI loci may overlap with regions in the genome that interact with enhancer elements, and typically do not overlap with genes or their associated promoter regions.
The HI loci used in applications involving the use of endogenous promoters in situ may also comprise peaks located within about 30,000 base pairs (upstream or downstream) of the TAD border in the active genomic compartment of accessible chromatin, and these HI loci may also overlap regions in the genome that interact with enhancer elements. In addition, these HI loci will overlap with endogenous TSS of active genes that are restricted to active genomic compartments of accessible chromatin and have functions that have been classified as unimportant to the cell.
In one embodiment, a method may comprise ranking HI loci after their identification. For example, the HI loci may be ranked based on one or more of: the expression levels of one or more genes associated with the locus, the distance from the locus to the nearest TAD boundary, the number of predicted enhancer interactions, and the steady state mRNA levels of the one or more genes associated with the locus. For example, in one embodiment, each identified HI locus may be ranked separately according to each single parameter, and these multiple rankings for all HI loci may then be analyzed to determine an overall ranking. The combined analysis may be weighted or unweighted as desired. For example, according to a non-weighted combination method, a simple additive score of the individual rankings for each locus can be used to determine the overall ranking. Highly ranked loci (e.g., loci associated with highly expressed genes) that are close to the nearest TAD boundary and are predicted to have substantial enhancer interactions may be highly desirable loci for insertion of an RTS.
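The per-parameter ranking followed by an additive (optionally weighted) combination can be sketched as below. The field names, example values, and tie handling (tied loci share the better rank) are assumptions chosen for illustration; the text does not prescribe them.

```python
def rank_by(loci, key, higher_is_better):
    """Return {locus_name: rank} for one parameter, with 1 as the best rank."""
    ordered = sorted(loci, key=lambda name: loci[name][key], reverse=higher_is_better)
    ranks = {}
    previous = None
    for position, name in enumerate(ordered, start=1):
        if previous is not None and loci[previous][key] == loci[name][key]:
            ranks[name] = ranks[previous]  # tied values share a rank
        else:
            ranks[name] = position
        previous = name
    return ranks

def overall_ranking(loci, criteria, weights=None):
    """Sum the per-parameter ranks (lower total is better) and sort the loci.

    criteria: {parameter_name: higher_is_better}
    weights: optional {parameter_name: weight}; defaults to an unweighted sum.
    """
    if weights is None:
        weights = {key: 1.0 for key in criteria}
    totals = {name: 0.0 for name in loci}
    for key, higher_is_better in criteria.items():
        for name, rank in rank_by(loci, key, higher_is_better).items():
            totals[name] += weights[key] * rank
    return sorted(totals, key=lambda name: totals[name]), totals

# Hypothetical loci and values.
loci = {
    "locus_A": {"expression": 120.0, "tad_distance": 2_000, "enhancer_interactions": 5},
    "locus_B": {"expression": 80.0, "tad_distance": 25_000, "enhancer_interactions": 2},
    "locus_C": {"expression": 200.0, "tad_distance": 10_000, "enhancer_interactions": 3},
}
criteria = {"expression": True, "tad_distance": False, "enhancer_interactions": True}
order, totals = overall_ranking(loci, criteria)
```

With these values, `locus_A` ranks first overall (rank sum 4) even though `locus_C` has the highest expression, which shows how the additive score balances the individual parameters.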
By using the method, HI loci can be identified in any mammalian cell. For example, table 1 below provides examples of CHO genomic HI loci identified according to the disclosed methods. However, it should be understood that the CHO genomic HI loci are in no way limited to the loci of table 1, and that equivalents of SEQ ID No.: 1-125 are encompassed herein. In other embodiments, the CHO genomic HI locus may be within about 5,000 base pairs, about 1,000 base pairs, about 750 base pairs, about 500 base pairs, about 250 base pairs, or about 100 base pairs of the 5' and/or 3' end of a locus as identified in table 1 below.
The HI locus may have a small number of mismatches or gaps compared to the sequences of table 1. For example, a CHO genomic HI locus encompassed herein may have about 10 or fewer mismatches with the sequences described below. For example, a CHO HI locus encompassed herein may have 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 mismatches to a sequence as set forth in table 1, and/or may have 5 or fewer gaps as compared to a sequence as set forth in table 1.
The HI locus as defined herein may also comprise a portion of any of SEQ ID No.: 1-125, and is not limited to the full length of SEQ ID No.: 1-125. For example, the HI locus may comprise a genomic sequence that is equivalent or homologous to only a portion of any one of SEQ ID No.: 1-125, e.g., to a region of about 5 bp to about 98% or less of any one of SEQ ID No.: 1-125. For example, a HI locus encompassed herein may comprise a sequence that is equivalent or homologous to a region of about 5 bp to about 95%, 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, or 5% of the total length of any one of SEQ ID No.: 1-125.
As used herein, the term "homolog" or "homologous sequence" refers to a nucleotide sequence having sequence homology to a specifically given comparative sequence, e.g., a sequence having sequence homology to any one of SEQ ID No.: 1-125 or to a portion of any one of SEQ ID No.: 1-125. As used herein, the term "sequence homology" refers to a measure of the degree of identity or similarity of two sequences based on a sequence alignment that maximizes the similarity between the aligned nucleotides, and is a function of the number of identical nucleotides, the number of total nucleotides, and the presence and length of gaps in the sequence alignment. Various algorithms and computer programs are available for determining sequence similarity using standard parameters. In one example, sequence homology can be determined using the BLASTn program for nucleic acid sequences, which is available through the National Center for Biotechnology Information (www.ncbi.nlm.nih.gov/), and is described, for example, in Altschul et al. (1990), J. Mol. Biol. 215: 403-; Gish and States (1993), Nature Genet. 3: 266-272; Madden et al. (1996), Meth. Enzymol. 266: 131-141; Altschul et al. (1997), Nucleic Acids Res. 25: 3389-; Zhang et al. (2000), J. Comput. Biol. 7(1-2): 203-14. In one embodiment, sequence homology between two nucleotide sequences can be determined by scoring based on the following parameters of the BLASTn algorithm: a word size of 11; a gap opening penalty of -5; a gap extension penalty of -2; a match reward of 1; and a mismatch penalty of -3.
The sequences of table 1 below refer to the publicly available BGI CHO database and the publicly available NCBI genetic sequence database, GenBank. The GenBank assembly accession number for the sequences of table 1 is GCA_000223135.1, and the BGI CHO RefSeq assembly accession number for the sequences of table 1 is GCF_000223135.1, submitted by the Beijing Genomics Institute on August 23, 2011. The "start" and "end" numbers referred to in table 1 refer to the start and end nucleotides of each HI locus in the complete sequence that is publicly available.
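To make the scoring parameters concrete, the following minimal sketch computes the raw score of an alignment that has already been constructed, using a match reward of 1, a mismatch penalty of -3, a gap opening penalty of -5, and a gap extension penalty of -2. It assumes the affine convention in which a gap of length k costs the opening penalty plus k times the extension penalty; the function name is hypothetical, and the sketch does not perform the seeding, extension, or statistics of the BLASTn program itself.

```python
def score_alignment(aligned_query, aligned_subject,
                    match=1, mismatch=-3, gap_open=-5, gap_extend=-2):
    """Raw score of a pre-computed pairwise alignment ('-' marks a gap)."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    gap_side = None  # which sequence currently carries an open gap, if any
    for q, s in zip(aligned_query.upper(), aligned_subject.upper()):
        if q == "-" and s == "-":
            raise ValueError("a column cannot be a gap in both sequences")
        if q == "-" or s == "-":
            side = "query" if q == "-" else "subject"
            if side != gap_side:
                score += gap_open  # charged once per gap
            score += gap_extend    # charged for every gapped position
            gap_side = side
        else:
            score += match if q == s else mismatch
            gap_side = None
    return score

identical = score_alignment("ACGTACGT", "ACGTACGT")
one_mismatch = score_alignment("ACGTACGT", "ACGAACGT")
one_gap = score_alignment("ACGTACGT", "ACG--CGT")
```

Here the identical pair scores 8, the single mismatch lowers the score to 4 (seven matches minus 3), and the two-base gap gives -3 (six matches, one gap opening, two extensions).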
TABLE 1
According to one embodiment, upon identifying the HI locus of the genome, the mammalian cell can be modified to comprise a landing pad at the HI locus of the genome. For example, in one embodiment, a particular HI locus may be selected (e.g., by ordering the identified HI loci), and an RTS may be inserted at that locus to form a site-specific integration site (e.g., within or overlapping with any one of SEQ ID No.: 1-125, or within or overlapping with about 5,000 base pairs, about 1,000 base pairs, about 750 base pairs, about 500 base pairs, about 250 base pairs, or about 100 base pairs of the 5 'end or 3' end of any one of SEQ ID No.: 1-125).
In one embodiment, an integration protocol can be performed to integrate expression cassettes into the genome of multiple cells randomly. For example, in one embodiment, a random integration protocol can be performed and an expression cassette carrying a detectable label can be integrated into a cell. Subsequently, the cells can be examined to determine the integration site of the cassette, and cells comprising the integration site at the HI locus (e.g., in one embodiment, a high-ranked HI locus) can be selected. The selected cells can then be used to establish landing pads at the HI locus (e.g., within or overlapping any one of SEQ ID No.: 1-125, or about 5,000 base pairs, about 1,000 base pairs, about 750 base pairs, about 500 base pairs, about 250 base pairs, or about 100 base pairs within the 5 'end or the 3' end of any one of SEQ ID No.: 1-125).
As used herein, the term "landing pad" refers to a nucleic acid sequence that includes an RTS chromosomally integrated into a host cell. In some embodiments, the landing pad comprises two or more RTS chromosomally integrated into the host cell. The landing pad can be integrated into one or more different chromosomal loci. For example, different landing pads can be integrated into 1, 2, 3, 4, 5, 6, 7, or 8 different chromosomal loci, and one or more of the different chromosomal loci can be a HI locus.
As used herein, the terms "site-specific integration site," "recombination target site," "RTS," and "site-specific recombinase target site" are used interchangeably and refer to a short, e.g., less than about 60 base pairs, nucleic acid site or sequence that is recognized by a site-specific recombinase and can be a crossover region in a site-specific recombination event. In some embodiments, a recombination target site may be less than about 60 base pairs, less than about 55 base pairs, less than about 50 base pairs, less than about 45 base pairs, less than about 40 base pairs, less than about 35 base pairs, or less than about 30 base pairs. In some embodiments, the recombination target site may be about 30 to about 60 base pairs, about 30 to about 55 base pairs, about 32 to about 52 base pairs, about 34 to about 44 base pairs, about 32 base pairs, about 34 base pairs, or about 52 base pairs. Examples of site-specific recombinase target sites include, but are not limited to, lox sites, rox sites, frt sites, att sites, and dif sites. In some embodiments, the recombination target site is a nucleic acid having a sequence identical to SEQ ID No.: 126 and 155.
In some embodiments, the RTS is a lox site selected from table 2. As used herein, the term "lox site" refers to a nucleotide sequence at which Cre recombinase can catalyze site-specific recombination. A variety of non-identical lox sites are known in the art. The sequences of the various lox sites are similar in that they all contain identical 13-base pair inverted repeats flanking the 8-base pair asymmetric core region in which recombination occurs. It is the asymmetric core region that determines the directionality of the sites and the differences between different lox sites. Illustrative (non-limiting) examples of these include naturally occurring loxP (a sequence in the genome of P1), loxB, loxL, and loxR (which are found in the Escherichia coli chromosome), as well as some mutant or variant lox sites such as loxP 511, loxΔ86, loxΔ117, loxC2, loxP 3, and loxP 23. In some embodiments, the lox recombination target site is a nucleic acid having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to a sequence in table 2.
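The arm/core/arm architecture described above can be checked programmatically. The sketch below splits a 34-base pair site into its two 13-base pair arms and 8-base pair core, then tests whether the arms are inverted repeats and whether the core is asymmetric (i.e., not equal to its own reverse complement). The sequence used is the commonly published wild-type loxP sequence, written here from general knowledge rather than copied from table 2, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def describe_site(site: str, arm_len: int = 13, core_len: int = 8) -> dict:
    """Split a recombination site into arms and core and report its symmetry."""
    site = site.upper()
    if len(site) != 2 * arm_len + core_len:
        raise ValueError("site length does not match arm/core lengths")
    left = site[:arm_len]
    core = site[arm_len:arm_len + core_len]
    right = site[arm_len + core_len:]
    return {
        "left_arm": left,
        "core": core,
        "right_arm": right,
        "arms_are_inverted_repeats": reverse_complement(left) == right,
        "core_is_asymmetric": reverse_complement(core) != core,
    }

# Commonly published wild-type loxP (assumption: not taken from table 2).
LOXP = "ATAACTTCGTATA" + "ATGTATGC" + "TATACGAAGTTAT"
info = describe_site(LOXP)
```

Because the arms are symmetric, only the core distinguishes the two orientations of the site, which is why the core determines directionality.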
TABLE 2
As used herein, the term "sequence identity" or "% identity" in the context of nucleic acid sequences or amino acid sequences refers to the percentage of residues in the sequences compared that are identical when the sequences are aligned over a specified comparison window. The comparison window may be a fragment of at least 10 to more than 1,000 residues over which sequences may be aligned and compared. Alignment methods for determining sequence identity are well known in the art and can be performed using publicly available tools such as BLAST (blast.ncbi.nlm.nih.gov/Blast.cgi).
In some embodiments, the RTS is a lox site selected from loxΔ86, loxΔ117, loxC2, loxP 2, loxP 3, and loxP 23.
In some embodiments, the RTS is a Frt site selected from table 3. As used herein, the term "Frt site" refers to a nucleotide sequence at which the product of the FLP gene of the yeast 2 μm plasmid, FLP recombinase, can catalyze site-specific recombination. A variety of non-identical Frt sites are known in the art. The sequences of the various Frt sites are similar in that they all contain identical 13-base pair inverted repeats flanking the 8-base pair asymmetric core region in which recombination occurs. It is the asymmetric core region that determines the directionality of the site and the differences between different Frt sites. Illustrative (non-limiting) examples of these include naturally occurring Frt (F) and several mutant or variant Frt sites, such as Frt F1 and Frt F2. In some embodiments, the Frt recombination target site is a nucleic acid having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to a sequence in table 3.
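As a small worked example of the definition above, the sketch below computes percent identity over the columns of a pre-aligned pair and converts an identity threshold into a maximum number of non-identical positions for a site of given length. Taking the alignment length as the denominator is an assumption of this sketch (tools differ on this point), and the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns with identical residues ('-' marks a gap)."""
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = sum(
        1 for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * identical / len(aligned_a)

def max_nonidentical_positions(window_length: int, min_identity_percent: int) -> int:
    """Largest number of non-identical positions that still meets the threshold."""
    # Integer ceiling of window_length * percent / 100.
    required_identical = -(-window_length * min_identity_percent // 100)
    return window_length - required_identical

identity = percent_identity("ACGT", "ACGA")
allowed_34bp_at_90 = max_nonidentical_positions(34, 90)
```

For a 34-base pair recombination target site, a threshold of at least 90% identity therefore tolerates at most 3 non-identical positions (31/34 is about 91.2%, whereas 30/34 is about 88.2%).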
TABLE 3
In some embodiments, the RTS is a rox site selected from table 4. As used herein, the term "rox site" refers to a nucleotide sequence at which Dre recombinase can catalyze site-specific recombination. A variety of non-identical rox sites are known in the art. Illustrative (non-limiting) examples of these sites include roxR and roxF. In some embodiments, the rox recombination target site is a nucleic acid having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to a sequence in table 4.
TABLE 4
Name    Identifier         Sequence
roxF    SEQ ID NO.: 147    TAACTTTAAATAATGCCAATTATTTAAAGTTA
roxR    SEQ ID NO.: 148    TAACTTTAAATAATTGGCATTATTTAAAGTTA
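A short check on the two sequences listed in table 4 shows how they relate to one another. The sketch below is illustrative only: it compares roxR with the reverse complement of roxF. That the two are reverse complements is an observation computed from the sequences as listed, not a statement made elsewhere in the text.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

ROX_F = "TAACTTTAAATAATGCCAATTATTTAAAGTTA"  # SEQ ID NO.: 147, as listed in table 4
ROX_R = "TAACTTTAAATAATTGGCATTATTTAAAGTTA"  # SEQ ID NO.: 148, as listed in table 4

rox_r_is_revcomp_of_rox_f = reverse_complement(ROX_F) == ROX_R
# Positions at which the two listed sequences differ (0-based).
differing_positions = [i for i, (a, b) in enumerate(zip(ROX_F, ROX_R)) if a != b]
```

The two sites differ only at four central positions, so they can be read as the same 32-base pair site in opposite orientations.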
In some embodiments, the RTS is an att site selected from table 5. As used herein, the term "att site" refers to a nucleotide sequence at which a lambda integrase or φC31 integrase can catalyze site-specific recombination. A variety of non-identical att sites are known in the art. Illustrative (non-limiting) examples of such sites include attP, attB, proB, trpC, galT, thrA, and rrnB. In some embodiments, the att recombination target sites are nucleic acids having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to sequences in table 5.
TABLE 5
In some embodiments, a cell can comprise multiple (e.g., at least four) RTS, e.g., multiple different RTS, and any useful combination of RTS can be used. As used herein, the term "different recombination target site" or "different RTS" refers to recombination target sites that are non-identical or heterospecific. For example, there are several variant Frt sites, but recombination can usually only occur between two identical Frt sites. In some embodiments, different recombination target sites refer to non-identical recombination target sites from the same recombination system (e.g., LoxP and LoxR). In some embodiments, different recombination target sites refer to non-identical recombination target sites from different recombination systems (e.g., LoxP and Frt). In some embodiments, different recombination target sites refer to a combination of recombination target sites from the same recombination system and recombination target sites from different recombination systems (e.g., LoxP, LoxR, Frt, and Frt1). For example, in some embodiments, the mammalian cell can comprise at least two different RTS, wherein at least one RTS is chromosomally integrated into the HI locus and at least one RTS is chromosomally integrated into a chromosomal locus selected from Fer1L4 (see, e.g., U.S. patent application No. 14/409,283), ROSA26, HGPRT, DHFR, COSMC, LDHA, or MGAT1.
Cells incorporating an RTS at the HI locus can be further processed to produce recombinant protein producing cells. In addition to the RTS, recombinant protein producing cells can also contain genes encoding site-specific recombinases. Recombinase enzymes, also referred to as recombinases, are enzymes that catalyze recombination in site-specific recombination. In one embodiment, the recombinase that can be used for site-specific recombination can be from a non-mammalian system. For example, the recombinase may be from a bacterium, phage, or yeast.
In some embodiments, the nucleic acid sequence encoding the recombinase can be integrated into the host cell. For example, nucleic acid sequences encoding the recombinase can be delivered to the host cell by methods known in molecular biology. In some embodiments, the recombinase polypeptide sequence can be delivered directly to the cell.
Examples of recombinases that can be used include, but are not limited to, Cre recombinase, FLP recombinase, Dre recombinase, KD recombinase, B2B3 recombinase, Hin recombinase, Tre recombinase, lambda integrase, HK022 integrase, HP1 integrase, γδ resolvase/invertase, ParA resolvase/invertase, Tn3 resolvase/invertase, Gin resolvase/invertase, φC31 integrase, BxB1 integrase, R4 integrase, or another functional recombinase.
In one embodiment, FLP recombinase may be used. FLP recombinase catalyzes a site-specific recombination reaction that is involved in amplifying the copy number of the 2 μm plasmid of Saccharomyces cerevisiae during DNA replication. The FLP recombinase may be derived from a species of Saccharomyces, and in one embodiment may be derived from Saccharomyces cerevisiae. In some embodiments, the FLP recombinase is derived from a strain of Saccharomyces cerevisiae. The FLP recombinase may be a thermostable mutant FLP recombinase, such as FLP1 or FLPe. In some embodiments, the nucleic acid sequence encoding FLP recombinase comprises human-optimized codons.
Cre recombinase is a member of the Int family of recombinases (Argos et al (1986) EMBO J.5: 433) and has been demonstrated to efficiently recombine lox sites (X-ing over sites) not only in bacteria but also in eukaryotic cells (Sauer (1987) mol.cell.biol.7: 2087; Sauer and Henderson (1988) Proc.Natl Acad.Sci.85: 5166). In one embodiment, the Cre recombinase may be from a bacteriophage, such as from the P1 bacteriophage.
In one example, a mammalian cell may comprise RTS chromosomally integrated within the HI locus, and the cell may be transfected with a vector comprising an exchangeable cassette encoding a gene of interest according to the SSI integration protocol. When the exchangeable cassette is integrated into the HI locus, a recombinant protein producing cell comprising the exchangeable cassette integrated into the chromosome may be selected. The selection may be, for example, by detecting the presence of the label, or may be by detecting the absence of the label using methods known to those skilled in the art.
The SSI protocol may be used to introduce one or more genes into the host cell chromosome. As used herein, "site-specific integration" may refer to the integration of a nucleic acid sequence into a chromosome at a particular site, and may also refer to "site-specific recombination," which refers to the rearrangement of two partner DNA molecules by a specific enzyme that performs recombination at its cognate pair of sequences or target sites. Unlike homologous recombination, site-specific recombination does not require DNA homology between partner DNA molecules, is independent of RecA, and does not involve DNA replication at any stage. In some embodiments, site-specific recombination uses a site-specific recombinase system to achieve site-specific integration of a nucleic acid in a host cell, e.g., a mammalian cell. Recombinase systems generally consist of three elements: two matching DNA sequences (recombination target sites) and a specific enzyme (recombinase). The recombinase catalyzes a recombination reaction between matching recombination sites.
The term "match" in reference to two RTS sequences means that the two sequences are capable of being bound by a recombinase and effecting site-specific recombination between the two sequences. In some embodiments, the RTS of an exchangeable cassette that matches the RTS of a cell refers to the RTS of a cassette that has substantially the same sequence as the RTS of the cell. In some embodiments, the exchangeable cassette contains sequences that are substantially identical to one or both RTS chromosomally integrated into the genome of the host cell.
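The notion of "matching" RTS described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that sequence identity is an adequate proxy for the ability of a recombinase to act on both sites; the class `RTS`, the functions `sequence_identity` and `rts_match`, the threshold `min_identity`, and the placeholder sequences are all hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RTS:
    """A recombination target site (hypothetical representation)."""
    name: str       # illustrative label only
    sequence: str   # DNA sequence written 5'->3'


def sequence_identity(a: str, b: str) -> float:
    """Fraction of identical positions; 0.0 if empty or lengths differ."""
    if not a or len(a) != len(b):
        return 0.0
    same = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return same / len(a)


def rts_match(cell_rts: RTS, cassette_rts: RTS, min_identity: float = 1.0) -> bool:
    """Assumption: 'substantially the same sequence' is modelled as identity >= threshold."""
    return sequence_identity(cell_rts.sequence, cassette_rts.sequence) >= min_identity


# Placeholder sequences, not real recombinase target sites.
landing_pad = RTS("pad_5prime", "ACGTACGTAC")
cassette_ok = RTS("cassette_5prime", "acgtacgtac")
cassette_off = RTS("cassette_other", "ACGTACGTAA")

print(rts_match(landing_pad, cassette_ok))    # identical apart from letter case
print(rts_match(landing_pad, cassette_off))   # one mismatch, default threshold
```

A real compatibility check would depend on the recombinase and on which positions of the site tolerate variation, which this sketch does not attempt to model.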
As used herein, "transfection" refers to the introduction of an exogenous nucleic acid molecule (comprising a vector) into a cell. A "transfected" cell includes an exogenous nucleic acid molecule within the cell, and a "transformed" cell is one in which the exogenous nucleic acid molecule within the cell induces a phenotypic change in the cell. The transfected nucleic acid molecule may be integrated into the genomic DNA of the host cell and/or may be maintained extrachromosomally for a temporary or long period of time by the cell. Host cells or organisms expressing exogenous nucleic acid molecules or fragments are referred to as "recombinant" organisms, "transformed" organisms, or "transgenic" organisms.
The vector (also referred to as an expression vector) may be any suitable replicon, such as a plasmid, phage, virus or cosmid, to which another DNA segment may be attached in order to effect replication and/or expression of the attached DNA segment in a cell. The vector may comprise episomal (e.g., plasmid) and non-episomal vectors. For example, in one embodiment, episomal vectors that are removed/lost from a population of cells after a number of cell generations, e.g., by asymmetric partitioning, can be used. The vector may be a viral or non-viral vector, and the nucleic acid molecule may be introduced into the cell in vitro, in vivo or ex vivo. Synthetic vectors are also included herein. The vector may be introduced into the desired host cell by well-known methods including, but not limited to, transfection, transduction, cell fusion, and lipofection. The vector may include various regulatory elements, including a promoter.
As used herein, the terms "exchangeable cassette", "expression cassette" and "cassette" are used interchangeably and refer to a mobile genetic element that contains a gene and may comprise RTS. In some embodiments, the exchangeable cassette may comprise multiple RTS and/or multiple genes. For example, the exchangeable cassette may comprise a GOI in combination with a reporter gene or a selection gene.
The GOI can include, but is not limited to, a reporter gene, a selection gene, a gene of therapeutic interest, an auxiliary gene, or a combination thereof.
As used herein, the term "reporter gene" refers to a gene whose expression confers on a cell a phenotype that can be readily identified and measured. For example, the reporter gene may comprise a fluorescent protein gene or a selection gene. In one example, the selection gene may encode a product that confers upon the cells the ability to survive in a medium lacking essential nutrients. In some embodiments, the selection gene may confer resistance to an antibiotic or drug to the cell. Selection genes can be used to confer a particular phenotype on a host cell. When a host cell must express a selection gene in order to survive in a selective medium, the gene is considered a positive selection gene. The selection gene may also be used to select against host cells containing the particular gene; a selection gene used in this way is referred to as a negative selection gene.
As used herein, the term "therapeutic gene of interest" refers to any functionally related nucleotide sequence. Thus, a gene of therapeutic interest may comprise any gene encoding a protein whose expression is required for the preparation of a therapeutic recombinant protein. Representative (non-limiting) examples of suitable therapeutic genes of interest include monoclonal antibodies, bispecific monoclonal antibodies, and antibody drug conjugates (including coagulation factors, well-expressed mabs in which protein expression is restricted to transcription, hormones such as EPO, immune fusion proteins (Fc-fusions), trispecific mabs, and the like).
As used herein, the terms "accessory gene", "auxiliary gene" and "helper gene" are used interchangeably and refer to a first gene that contributes to the expression of a second gene, to the stabilization, folding, or post-translational modification of the product of a second gene, or to the production of a cellular environment that promotes the production of the product of a second gene. In some embodiments, the second gene encodes a difficult-to-express (DtE) protein (or a portion thereof). The helper gene can encode, for example, an RNA (e.g., mRNA, tRNA, or miRNA), a transcription factor, a chaperone, a synthetase, an oxidase, a reductase, a glycosyltransferase, a protease, a kinase, a phosphatase, an acetyltransferase, a lipase, or an alkalase.
The GOI may comprise a gene encoding a well-expressed therapeutic protein in a desired copy number. For example, the copy number of the gene encoding a well-expressed therapeutic protein may be 2 copies, 3 copies, 4 copies, 5 copies, 6 copies, 7 copies, 8 copies, 9 copies, or 10 copies.
As used herein, the term "difficult-to-express protein" (DtE protein) refers to a protein that is difficult to produce. For example, a DtE protein can be difficult to produce because the expression of the protein must be highly regulated, the protein is difficult to recover from the host cell, the protein is prone to misfolding, the protein is prone to cleavage, the protein is prone to degradation, the protein is prone to aggregation, the protein is poorly soluble, the protein is a membrane bound protein, the protein is difficult to purify, the protein is cytotoxic, the protein comprises multiple polypeptide chains, e.g., 2, 3, or 4 polypeptide chains, or any combination thereof. For example, the DtE protein can comprise multiple polypeptide chains that form homo- or hetero-oligomers to produce the DtE protein. In such embodiments, the chains of the DtE protein may be encoded on one or more genes of interest, which may be associated with the same or different RTS of the recombinant cell. Homo-oligomers or hetero-oligomers may be formed by covalent interactions, non-covalent interactions, or combinations thereof. A DtE protein may also be a protein that requires the expression of auxiliary genes for its production, or a protein that requires post-translational modification for its production.
A DtE protein may be a monoclonal antibody, such as a bispecific monoclonal antibody or a trispecific monoclonal antibody. Further examples of DtE proteins include an Fc-fusion protein, which is a fusion protein in which the Fc domain of an immunoglobulin is operably linked to a second peptide. A DtE protein can also be an enzyme, a membrane receptor, or a bispecific T cell engager (Micromet AG, Munich, Germany).
In one example, the GOI can be located between two RTS, i.e., one RTS located 5' of the gene and a different RTS located 3' of the gene. In some embodiments, the RTS are directly adjacent to the gene located between them. In some embodiments, the RTS are located at a defined distance from the gene located between them. In some embodiments, an RTS is a directional sequence. In some embodiments, the RTS located 5' and 3' of the gene between them are directly oriented (i.e., they are oriented in the same direction). In some embodiments, the RTS located 5' and 3' of the gene between them are inversely oriented (i.e., they are oriented in opposite directions).
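The arrangement described above, a gene flanked by two different directional RTS that are either directly or inversely oriented, can be sketched as a small data structure. This is an illustrative model only; the names `FlankingRTS`, `Cassette` and `relative_orientation`, the strand notation, and the labels `rtsA`/`rtsB` are hypothetical.

```python
from dataclasses import dataclass
from typing import Literal

Strand = Literal["+", "-"]  # direction of a directional RTS along the chromosome


@dataclass(frozen=True)
class FlankingRTS:
    name: str      # hypothetical site label
    strand: Strand


@dataclass(frozen=True)
class Cassette:
    rts_5prime: FlankingRTS  # RTS located 5' of the gene
    gene: str                # gene of interest (GOI)
    rts_3prime: FlankingRTS  # RTS located 3' of the gene

    def __post_init__(self) -> None:
        # The text describes "a different RTS" on the 3' side.
        if self.rts_5prime.name == self.rts_3prime.name:
            raise ValueError("5' and 3' RTS are expected to differ")


def relative_orientation(cassette: Cassette) -> str:
    """Return 'direct' if both RTS point the same way, otherwise 'inverted'."""
    same = cassette.rts_5prime.strand == cassette.rts_3prime.strand
    return "direct" if same else "inverted"


direct = Cassette(FlankingRTS("rtsA", "+"), "GOI", FlankingRTS("rtsB", "+"))
inverted = Cassette(FlankingRTS("rtsA", "+"), "GOI", FlankingRTS("rtsB", "-"))
print(relative_orientation(direct), relative_orientation(inverted))
```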
In some embodiments, the cell may comprise one or more additional GOIs, and the one or more additional GOIs may be chromosomally integrated. The second gene of interest can be, for example, a reporter gene, a selection gene, a therapeutic gene of interest (e.g., a gene encoding DtE protein), an accessory gene, or a combination thereof. The additional GOI may be located within the same HI as the first GOI, within a second HI site, or within a separate locus.
The second GOI can be integrated into the cell by using the same or a different vector as that used to transfect the cell with the first GOI. For example, a cell can be transfected with a first vector comprising a first exchangeable cassette encoding a first gene of interest and a second vector comprising a second exchangeable cassette encoding a second gene of interest. The first cassette may be integrated into the HI locus and the second cassette may be integrated into the same HI locus, into a second HI locus, or into a separate locus. For example, the second cassette may be integrated into the Fer1L4 locus. A recombinant protein producing cell can then be selected that comprises the first exchangeable cassette and the second exchangeable cassette integrated into the chromosome at the desired location.
Advantageously, using SSI at a landing pad located in the HI locus to prepare rP expressing cells ensures that the genetic composition of the pool of rP expressing cells is homogeneous. Furthermore, it may ensure that the efficiency of the pool of rP expressing cells is homogeneous. For example, the production cell pool may be homogeneous with respect to the ratio of the first helper gene to the second helper gene, and/or with respect to the ratio of the helper gene to the gene of therapeutic interest. Therefore, using SSI at a landing pad located in the HI locus to prepare rP expressing cells can ensure more consistent rP product quality.
The cell lines described herein (including prokaryotic and/or eukaryotic cell lines) can be cultured using any suitable apparatus, facilities, and methods. Furthermore, in embodiments, the devices, facilities and methods are suitable for culturing suspension cells or anchorage-dependent (adherent) cells, and for production operations configured for the production of pharmaceutical and biopharmaceutical products, such as polypeptide products, nucleic acid products (e.g. DNA or RNA) or mammalian or microbial cells and/or viruses, such as those used for cell and/or virus and microbiota therapy.
The cell may express or produce a product, such as a recombinant therapeutic or diagnostic product. Examples of products produced by a cell may include, but are not limited to, antibody molecules (e.g., monoclonal antibodies, bispecific antibodies), antibody mimetics (polypeptide molecules that specifically bind to an antigen but are not structurally related to an antibody, such as, for example, DARPins, affibodies, mimobody proteins, or ignars), fusion proteins (e.g., Fc fusion proteins, chimeric cytokines), other recombinant proteins (e.g., glycosylated proteins, enzymes, hormones), viral therapeutic agents (e.g., anti-cancer oncolytic viruses, viral vectors for gene therapy and viral immunotherapy), cellular therapeutic agents (e.g., pluripotent stem cells, mesenchymal stem cells, and adult stem cells), vaccines or lipid encapsulated particles (e.g., exosomes, virus-like particles), RNA (such as, for example, siRNA), or DNA (such as, for example, plasmid DNA), antibiotics, or amino acids. In embodiments, the devices, apparatus and methods may be used to produce biosimilar drugs.
The disclosed methods may allow for the large-scale production of eukaryotic cells, e.g., mammalian cells or lower eukaryotic cells such as, for example, yeast cells or filamentous fungal cells, or of prokaryotic cells such as gram-positive or gram-negative cells, and/or of products synthesized by the eukaryotic or prokaryotic cells, e.g., proteins, peptides, antibiotics, amino acids, or nucleic acids (such as DNA or RNA). In some embodiments, the use of microbial organisms and spores thereof for microbiota therapeutics is also disclosed. Unless otherwise indicated herein, the apparatus, facilities, and methods may comprise any desired volume or production capacity, including but not limited to small-scale, pilot-scale, and full-scale capacities.
Further, unless otherwise specified herein, the apparatus, facilities, and methods may comprise any suitable reactor or bioreactor, including but not limited to stirred tank, airlift, fiber, microfiber, hollow fiber, ceramic matrix, fluidized bed, fixed bed, and/or spouted bed bioreactors. As used herein, a "reactor" or "bioreactor" may comprise a fermentor or a fermentation unit, or any other reaction vessel, and the term "reactor" is used interchangeably with "fermentor". The term fermentor or fermentation refers to both microbial and mammalian cultures. For example, in some aspects, an exemplary bioreactor unit may perform one or more or all of the following: feeding of nutrients and/or carbon sources, injection of suitable gases (e.g., oxygen), inlet and outlet flow of fermentation or cell culture medium, separation of gas and liquid phases, maintenance of temperature, maintenance of oxygen and CO2 levels, maintenance of pH level, agitation (e.g., stirring), and/or cleaning/sterilization. Exemplary reactor units such as fermentation units may contain multiple reactors within a unit, for example the unit may have from 1 to about 100 or more bioreactors in each unit, for example from about 10 to about 90 or from about 20 to about 80 bioreactors in each unit, and/or a facility may contain multiple units with single or multiple reactors within the facility. The bioreactor may be adapted for batch, semi fed-batch, fed-batch, perfusion and/or continuous fermentation processes. Any suitable reactor diameter may be used. For example, the bioreactor may have a volume of about 100 mL to about 50,000 L. Non-limiting examples include volumes of about 250 mL to about 10 L, about 10 L to about 500 L, about 20 L to about 200 L, about 500 L to about 5,000 L, or in some embodiments, about 5,000 L to about 50,000 L.
Further, suitable reactors may be multi-purpose, single-purpose, disposable, or non-disposable, and may be formed of any suitable material, including metal alloys, such as stainless steel (e.g., 316L or any other suitable stainless steel) and inconel, plastic, and/or glass.
In embodiments, and unless otherwise indicated herein, the apparatuses, facilities, and methods described herein may further comprise any suitable unit operations and/or equipment not otherwise mentioned, such as operations and/or equipment for separating, purifying, and isolating such products. Any suitable facility and environment may be used, such as conventional fixed facilities, modular, mobile, and temporary facilities, or any other suitable structure, facility, and/or arrangement. For example, in some embodiments, a modular clean room may be used. Further, unless otherwise specified, the devices, systems, and methods described herein may be housed and/or executed in a single location or facility, or alternatively, may be housed and/or executed in separate or multiple locations and/or facilities.
By way of non-limiting example, U.S. Publication Nos. 2013/0280797; 2012/0077429; 2011/0280797; and 2009/0305626; and U.S. Patent Nos. 8,298,054; 7,629,167; and 5,656,491 (which are hereby incorporated by reference in their entirety) describe exemplary facilities, devices, and/or systems that may be suitable.
The recombinant cell can be a mammalian cell as previously described, and in one particular embodiment can be a CHO cell (e.g., CHO-K1 cells, CHO-DXB11 cells, CHO-DG44 cells, CHOK1SV™ cells including all variants, CHO glutamine synthetase knockout cells including all variants, etc.), but the disclosure is not limited to these cells. Other examples of cells in which an RTS can be integrated in the HI locus may include HEK293 cells, including adherent and suspension adapted variants, HeLa, HT1080, H9, HepG2, MCF7, MDBK, Jurkat, NIH3T3, PC12, BHK (baby hamster kidney cells), VERO, YB2/0, Y0, C127, L, COS (e.g., COS1 and COS7), QC1-3, HEK-293, VERO, PER.C6, EB1, EB2, EB3, oncolytic or hybridoma cell lines. The eukaryotic cell may also be an avian cell, cell line or cell strain, such as, for example, EB14, EB24, EB26, EB66 or EBv13 cells.
In some embodiments, eukaryotic stem cells can be utilized. The stem cells can be, for example, pluripotent stem cells, including Embryonic Stem Cells (ESCs), adult stem cells, induced pluripotent stem cells (ipscs), tissue-specific stem cells (e.g., hematopoietic stem cells), and Mesenchymal Stem Cells (MSCs). Differentiated forms of any of the cells described herein are contemplated herein.
The eukaryotic cell can be a lower eukaryotic cell, such as, for example, a yeast cell, e.g., of the genus Pichia (e.g., Pichia pastoris, Pichia methanolica, Pichia kluyveri, and Pichia angusta), the genus Komagataella (e.g., Komagataella pastoris, Komagataella pseudopastoris, or Komagataella phaffii), the genus Saccharomyces (e.g., Saccharomyces cerevisiae, Saccharomyces kluyveri, Saccharomyces uvarum), the genus Kluyveromyces (e.g., Kluyveromyces lactis, Kluyveromyces marxianus), the genus Candida (e.g., Candida utilis, Candida cacaoi, Candida boidinii), the genus Geotrichum (e.g., Geotrichum fermentans), Hansenula polymorpha, Yarrowia lipolytica, or Schizosaccharomyces pombe.
The eukaryotic cell can be a fungal cell (e.g., Aspergillus (such as Aspergillus niger, Aspergillus fumigatus, Aspergillus oryzae, Aspergillus nidulans), Acremonium (such as Thermoanaerobacter), Chaetomium (such as Thermomyces Chaetomium), Chrysosporium (such as Chrysosporium thermophilum), Cordyceps (such as Cordyceps militaris), Cladosporium, Chlamydia, Fusarium (such as Fusarium oxysporum), chaetomium (such as chaetomium graminum), hypocrea (such as hypocrea jecorina), chaetomium (such as pyretomium oryzae), myceliophthora (such as myceliophthora thermophila), chaetomium (such as gibberella erythraea), neurospora (such as neurospora crassa), penicillium, sporothrix (such as sporothrix thermophilus), thielavia (such as thielavia tairei, thielavia iso), trichoderma (such as trichoderma reesei), or verticillium (such as verticillium dahliae).
The eukaryotic cell can be an insect cell (e.g., Sf9, Mimic™ Sf9, Sf21, High Five™ (BT1-TN-5B1-4) or BT1-Ea88 cells), an algal cell (e.g., of the genus Geotrichum, class Diatomae, genus Dunaliella, genus Chlorella, genus Chlamydomonas, phylum Cyanophyta (cyanobacteria), Nannochloropsis, Spirulina or genus Ochrostis) or a plant cell (e.g., a cell from a monocotyledonous plant (e.g., maize, rice, wheat or green bristlegrass), or from a dicotyledonous plant (e.g., cassava, potato, soybean, tomato, tobacco, alfalfa, Physcomitrella or Arabidopsis thaliana)).
The cell may be a bacterium or other prokaryotic cell. For example, gram-positive cells such as Bacillus, Streptomyces, Staphylococcus, or Lactobacillus may be used. The Bacillus that may be used may comprise, for example, Bacillus subtilis, Bacillus amyloliquefaciens, Bacillus licheniformis, Bacillus natto, or Bacillus megaterium. In embodiments, the cell is Bacillus subtilis, such as Bacillus subtilis 3NA and Bacillus subtilis 168. Bacillus bacteria can be obtained, for example, from the Bacillus Genetic Stock Center, Biological Sciences 556, 484 West 12th Avenue, Columbus OH 43210-.
Gram-negative cells such as Salmonella or E. coli, such as, for example, TG1, TG2, W3110, DH1, DHB4, DH5α, HMS174(DE3), NM533, C600, HB101, JM109, MC4100, XL1-Blue and Origami, and those derived from E. coli B-strains, such as, for example, BL-21 or BL21(DE3), can be used, all of which are commercially available. Suitable host cells are commercially available, for example, from culture collections such as DSMZ (Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH, Braunschweig, Germany) or the American Type Culture Collection (ATCC). In some embodiments, the cells comprise other microbiota for use as therapeutic agents. These microbial groups include those present in the human microbial flora and belong to the phyla Firmicutes, Bacteroidetes, Proteobacteria, Verrucomicrobia, Actinobacteria, Clostridia and Cyanobacteria. The microbial population may comprise aerobic, strictly anaerobic, or facultatively anaerobic organisms, and may comprise cells or spores. The therapeutic microbiota may also comprise genetically manipulated organisms and the vectors used in their modification. Other microbial community-related therapeutic organisms may include archaea, fungi and viruses. See, for example, The Human Microbiome Project Consortium, Nature 486, 207-; Weinstock, Nature 489 (7415): 250-256 (2012); Lloyd-Price, Genome Medicine 8: 51 (2016).
The rP producing cells can be cultured to produce peptides, amino acids, fatty acids, or other useful biochemical intermediates or metabolites. For example, molecules having a molecular weight of about 4000 daltons to greater than about 140,000 daltons may be produced. Molecules produced by a cell may have a range of complexities and may contain post-translational modifications (including glycosylation).
Proteins that can be produced can include, for example, BOTOX, Myobloc, Neurobloc, Dysport (or other serotype of botulinum neurotoxin), arabinosidase alpha, daptomycin, YH-16, chorionic gonadotropin alpha, filgrastim, cetrorelix, interleukin-2, aldesleukin, texileukin, dinil-toxin linker, interferon alpha-n 3 (injected), interferon alpha-nl, DL-8234, interferon, Suntory (gamma-1 a), interferon gamma, thymosin alpha 1, tamarind, DigiFab, ViperaTAb, Echitab, CroFab, nesiritide, abacteri, alfa, libi, etotetam alpha, teriparatide (osteoporosis), injectable calcitonin (calcitonin), bone disease (calcitonin), nasal calcitonin, etanercept, glutelin (250), bovine hemoglobin alpha (250), tagatosol, and alpha, Collagenase, capecitabine, recombinant human epidermal growth factor (topical gel, wound healing), DWP401, dabbepotin alpha, epoetin omega, epoetin beta, epoetin alpha, deshellidine, lepirudin, bivalirudin, nonaprothrombin alpha, Mononine, eptapigenin alpha (activated), recombinant factor VIII + VWF, recombinant factor VIII, factor VIII (recombinant), Alphnomate, octreotide alpha, factor VIII, palivimine, Inikinase, tenenaproxase, alteplase, pamiprep, reteplase, natenaproxase, neterproxase, hydependroxase, neterpsin alpha, rFSH, hpFSH, micafungin, pefilgrastimsin, letrothecin, sertraline, glucagon, exenatide, pramipetide, muipine, gallosin, valtrelin, leuprolide, leuclindamrelin, leuprolide, and subcutaneous tissue-droperide (subcutaneous), subcutaneous tissue-free, Histrelin, nafarelin, leuprorelin sustained release depot (ATRIGEL), leuprorelin implant (DUROS), goserelin, you trypan, KP-102 program, growth hormone, mecamylamine (growth deficiency), Enfuvirtide, Org-33408, insulin glargine, insulin (inhalation), insulin lispro, insulin detemir, insulin (buccal, RapidMist), mecamylamine-Rifampride, anakinra, simons, 99 mTc-asipeptide injection, surface antigenic peptide, beta-feron, glatiramer acetate, Gepon, samostin, Omepleren, human leukocyte derived alpha 
interferon, Billef, insulin (recombinant), recombinant human insulin, insulin aspart, semenlin, Roselain-A, interferon-alpha 2, interferon alfafenone, interferon alpha 1 alfafafar, interferon alpha 1 alfafar, Avonex recombinant human luteinizing hormone, deoxyribonuclease alpha, travermine, ziconotide, taltirelin, dibotmin alpha, atosiban, becaplamine, eptifibatide, Zemaira, CTC-111, Shanvac-B, HPV vaccine (tetravalent), octreotide, lanreotide, ancetin, galactosidase beta, galactosidase alpha, laronidase, cuprein (topical gel), labyrintitase, ranibizumab, Actimmune, PEG-intron, Tricomin, recombinant dermatophagoides pteronyssinus desensitizing injection, recombinant human parathyroid hormone (PTH)1-84 (subcutaneous, osteoporosis), epoetin delta, transgenic antithrombin III, Granditropine, Vitrase, recombinant insulin, interferon-alpha (oral lozenge), GEM-21S, globin, Epstein, Imidakojima, omatricin, recombinant serum, albumin, carboxypeptidase, and the like, Human recombinant C1 esterase inhibitor (angioedema), lanoteplase, recombinant human growth hormone, enfuvirtide (needleless injection, Biojector 2000), VGV-1, interferon (alpha), lucitin, aviptadil (inhaled, pulmonary), icatibant, icaritin, Omeganan, Aurograb, Pecesagan acetate, ADI-PEG-20, LDI-200, degarelix, Becininterleukin, Favld, MDX-1379, ISAtx-247, liraglutide, teriparatide (osteoporosis), tifaco, AA4500, T4N5 liposome emulsion, captovab, DWP413, ART-123, chrysin, desmoprase, andersoprase, mersalon, TH-9507, tenutide, Diamyld, DWP-412, CSF (injectable, sustained release insulin G, and inhaled G (AIR), sustained release insulin G (AIR), Insulin (inhalant, Technosphere), insulin (inhalant, AERx), RGN-303, DiaPep277, interferon beta (hepatitis C Virus infection (HCV)), interferon alpha-n 3 (oral), belicep, transdermal insulin patch, AMG-531, MBP-8298, Xerecept, Opababan, AIDSAX, GV-1001, LymphoScan, ranpirnase, Lipoxyysan, russulpan, MP52 (beta-calcium phosphate carrier, bone regeneration), melanoma 
vaccine, sipuleucel-T, CTP-37, Insega, Vetesbone, human thrombin (frozen, surgical hemorrhage), thrombin, TransMID, snake venom, PrekayKa, terlipressin (intravenous, hepatorenal syndrome), EUR-1008M, recombinant FGF-I (injectable, vascular disease), BDM-E, rotigat peptide, MBAN-216, ETC-113P, ETC-113-I-594, and AMG-531, Duramycin (inhaled, cystic fibrosis), SCV-07, OPI-45, endostatin, angiostatin, ABT-510, Bowman Birk inhibitor concentrate, XMP-629, 99 mTc-Hynic-annexin V, Carhalalide F, CTCE-9908, teverrelix (delayed release), ozarelix, romidepeptide, BAY-504798, interleukin 4, PRX-321, peptide scan, ibocadekin, rhactoferin, TRU-015, IL-21, ATN-161, Cijilen peptide, Albuferon, Biphasix, IRX-2, omega interferon, PCK-3145, CAP-232, paregoride, huN901-DMI, immunotherapeutic vaccines for ovarian cancer, SB-249553, Oncovax-CL, Vacovax-P, BLP-25, Cerx-16, multiple Vax-16 epitope peptide (Mary 1-100) vaccines, tyrosinase-100, Martin, Marx-100, Oncork-K-2, Oncork-A vaccine, Oncork-2, Oncork-A vaccine, and Oncork-3, Non-peptidic nanopeptide, rAAT (inhaled), rAAT (skin), CGRP (inhaled, asthma), pinacepin, thymosin beta 4, plipidepsin, GTP-200, ramoplanin, GRASPA, OBI-1, AC-100, salmon calcitonin (oral, eligen), calcitonin (oral, osteoporosis), esarelin, caprorelin, Cardova, velaferin, 131I-TM-601, KK-220, T-10, uracride, desloratadine, hematide, Chrysalin (topical), rNAPC2, recombinant factor V111 (pegylated liposomes), bFGF, pegylated recombinant staphylokinase variants, V-10153, SonoLysPro, NeuroVax, CN-ZErZErN, islet cell regeneration therapy, GLP-1, GLP-77, BIM-548806, GCSaorab (controlled release tablet), aR 0010, AVGA-960, AVGA-96B, Linaclotide acetate, CETi-1, Hemospan, VAL (injectable), fast acting insulin (injectable), Viadel, intranasal insulin, insulin (inhaled), insulin (oral, eligen), recombinant methionyl human leptin, petuniaxin (subcutaneous, eczema), petuniaxin (inhaled dry powder, asthma), Multikine, RG-1068, MM-093, NBI-6024, AT-001, PI-0824, Org-39141, 
Cpn10 (autoimmune/inflammatory), talferricin (topical), rEV-131 (eye), rEV-131 (respiratory tract disease), oral recombinant human insulin (diabetes), RPI-78M, Omepleren (oral), CYT-99007CTLA4-Ig, DTY-001, Varrat, interferon alpha-n 3 (topical), and, IRX-3, RDP-58, Tauferon, bile salt-stimulated lipase, meriispase, alkaline phosphatase, EP-2104R, Menanotan-II, Breleman's red, ATL-104, recombinant human microplasmin, AX-200, SEMAX, ACV-1, Xen-2174, CJC-1008, dynorphin A, SI-6603, LAB, AER-002, BGC-728, malaria vaccine (virosome, PevipOR), ALTU-135, parvovirus B19 vaccine, influenza vaccine (recombinant neuraminidase), malaria/HBV vaccine, anthrax vaccine, Vacc-5q, Vacc-4x, HIV vaccine (oral), HPV cream, Tat Toxoid, YSPSL, CHS-40, PTH (1-34) liposome (Novasome), Ostabolin-C, PTH analogue (topical psoriasis), MTBR-93.02, MTB 72F.72 vaccine (pulmonary tuberculosis vaccine), and vaccine (oral administration), MVA-Ag85A vaccine (tuberculosis), FARA04, BA-210, recombinant pestilence FIV vaccine, AG-702, OxSODrol, rBetV1, Der-P1/Der-P2/Der-P7 allergen-targeted vaccine (dust mite allergy), PR1 peptide antigen (leukemia), mutant ras vaccine, HPV-16E7 lipopeptide vaccine, labyrinthin vaccine (adenocarcinoma), CML vaccine, WT 1-peptide vaccine (cancer), IDD-5, CDX-110, Pentrys, Norelin, CytoFab, P-9808, VT-111, Artocam peptide, tipalmin (skin, diabetic foot ulcer), Lupingwei, reticose, rGRF, HA, alpha-galactosidase A, ACE-011, ALTU-140, CGX-1160, angiotensin-treating vaccine, D-4F, ETC-018, SCL-APP (oral administration), Sc-07, oral administration of tuberculosis, DRF-7295, ABT-828, ErbB 2-specific immunotoxin (anti-cancer), DT3SSIL-3, TST-10088, PRO-1762, Combotox, cholecystokinin-B/gastrin-receptor binding peptide, 111In-hEGF, AE-37, trastuzumab-DM 1, antagonist G, IL-12 (recombinant), PM-02734, IMP-321, rhIGF-BP3, BLX-883, CUV-1647 (topical), L-19-based radioimmunotherapeutic (cancer), Re-188-P-2045, AMG-386, DC/1540/KLH vaccine (cancer), VX-001, AVE-9633, AC-9301, NY-ESO-1 
vaccine (peptide), 17.A2 peptide, melanoma (pulse antigen therapy), prostate cancer vaccine, and vaccine, CBP-501, recombinant human lactoferrin (dry eye), FX-06, AP-214, WAP-8294A (injectable), ACP-HIP, SUN-11031, peptide YY [3-36] (obesity, intranasal), FGLL, asecept, BR3-Fc, BN-003, BA-058, human parathyroid hormone 1-34 (nose, osteoporosis), F-18-CCR1, AT-1100 (celiac disease/diabetes), JPD-003, PTH (7-34) liposome cream (Novasome), duramycin (eye, dry eye), CAB-2, CTCE-0214, glycosylated polydiethanolated erythropoietin, EPO-Fc, CNTO-528, AMG-114, JR-013, factor XIII, aminoconmidine, PN-951, 716155, SUN-E7001, 7001, TH-0318, BAY-73-7977, teverrelix (immediate release), EP-51216, hGH (controlled release, Biosphere), OGP-I, Cifuwei peptide, TV4710, ALG-889, Org-41259, rhCC10, F-991, thymopentin (lung disease), r (m) CRP, liver-selective insulin, subalin, L19-IL-2 fusion protein, elafin, NMK-150, ALTU-139, EN-122004, rhTPO, thrombopoietin receptor agonist (thrombocytopenia), AL-108, AL-208, nerve growth factor antagonist (pain), SLV-317, CGX-1007, INNO-105, oral teriparatide (eligen), GEM-OS1, AC-162352, PRX-302, LFn-p24 (vaccine for pneumonia, EP-3, 104S vaccine, malaria vaccine, pediatric vaccine, and vaccine, Neisseria meningitidis group B vaccine, group B Streptococcus neonatus vaccine, anthrax vaccine, HCV vaccine (gpE1+ gpE2+ MF-59), otitis media therapy, HCV vaccine (core antigen + ISCOMATRIX), hPTH (1-34) (transdermal, ViaDerm), 768974, SYN-101, PGN-0052, aviscumnine, BIM-23190, tuberculosis vaccine, multiple epitope tyrosinase peptide, cancer vaccine, Enkasti, APC-8024, GI-5005, TTS-001, TTS-CD3, TNF (solid tumor) targeting blood vessels, desmopressin (buccal controlled release), onacept, and TP-9201.
Other examples of polypeptides that may be produced include, but are not limited to, adalimumab (HUMIRA™), infliximab (REMICADE™), rituximab (RITUXAN™/MABTHERA™), etanercept (ENBREL™), bevacizumab (AVASTIN™), trastuzumab (HERCEPTIN™), pegfilgrastim (NEULASTA™), or any other suitable polypeptide, including biosimilars and modified biosimilars.
Other suitable polypeptides are those listed in table 6 below and in US 2016/0097074. One skilled in the art will appreciate that the present disclosure also encompasses combinations of products and/or conjugates as described herein (i.e., multi-proteins, modified proteins (conjugated to PEG, toxins, other active ingredients)).
TABLE 6
In embodiments, the polypeptide may be a hormone, blood clotting/clotting factor, cytokine/growth factor, antibody molecule, fusion protein, protein vaccine, or peptide as shown in table 7.
TABLE 7
In embodiments, the protein is a multispecific protein, e.g., a bispecific antibody as shown in table 8.
TABLE 8
Example 1
Described is an example of a process of generating a multi-dimensional map of a genome by an orthogonal approach, and then using the map or maps to generate a list of candidate HI loci for targeted integration of transgenes with predicted high expression and stability. The filtering process or algorithm for obtaining a list of candidate loci using a multidimensional map is summarized in fig. 1 and described below.
First, a reference genome assembly is constructed, to which multi-level genetic and epigenetic data are subsequently appended.
Hi-C data derived from the Chinese Hamster Ovary (CHO) cell line CHO-K1SV 10E9 (Zhang et al., Biotechnol Prog. 2015: 31(6) 1645-56) were used to inform de novo assembly of the CHO-K1SV (progenitor of cell line 10E9) sequence scaffolds originally constructed from Illumina short-read sequences. As a result of proximity-based ligation, Hi-C data are characterized by increased contact density between regions that are close to each other in the linear sequence and/or regions within the same chromosome. Thus, Hi-C can be used to determine the linkage between previously separate sequence scaffolds within a fragmented reference assembly. Alignments of more than 3.1 million unique, valid Hi-C read pairs from three biological replicates were used to cluster, order and orient the CHO-K1SV sequence scaffolds with the published LACHESIS algorithm (Burton, J. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119-1125 (2013)). The LACHESIS assembly included 1146 input sequence scaffolds and contained 90.52% of the original CHO-K1SV sequence. The final assembly clustered the input sequence scaffolds into 13 high-confidence groups, with lengths ranging from 12 Mb to 455 Mb.
Alignment of Hi-C data from the 10E9 cell line with the LACHESIS assembly produced a genome-wide contact map similar to that associated with the more mature human and mouse reference assemblies (fig. 2A) and with a cis/trans ratio of valid read pairs consistent with equivalent Hi-C datasets derived from human embryonic stem cells and mouse fetal liver cells (fig. 2B).
Three replicates each of paired-end Hi-C sequence data and promoter capture Hi-C (PCHi-C) sequence data from the Chinese hamster ovary SSI 10E9 cell line (Zhang et al., Biotechnol Prog. 2015: 31(6) 1645-56) were processed separately with HiCUP version 0.5.9.dev under default parameters (Wingett S, et al., F1000Research 2015, 4: 1310). Uniquely aligned, valid read pairs were mapped to the sequence of interest using Bowtie version 1.1.0 (Langmead B, et al., Genome Biol. 2009; 10(3): R25) as part of the HiCUP pipeline.
Three replicates of paired-end ATAC-Seq sequence data were generated according to the protocol described in Buenrostro et al. 2013 (Nat Methods 10, 1213-). All generated FASTQ files were trimmed in paired-end mode to remove sequencing adapter sequences and then mapped to the sequence of interest in paired-end mode using Bowtie2 with a maximum fragment length of 2,000 base pairs (Langmead B, Salzberg S. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012, 9: 357-). The resulting BAM files corresponding to the same sample were then merged using custom Perl scripts, and alignments with mapping quality scores of less than 20 were removed from the sample-merged BAM file using the Samtools view function (Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. and 1000 Genome Project Data Processing Subgroup (2009) The Sequence Alignment/Map (SAM) format and SAMtools. Bioinformatics, 25, 2078-9).
Published histone modification ChIP-Seq sequence data sets derived from a suspension-adapted CHO-K1 cell line (Feichtinger J, et al. Biotechnol Bioeng. 113(10): 2241-53 (2016); accession code PRJEB9291) were downloaded, and each FASTQ file was trimmed in single-end mode to remove sequencing adapter sequences. The trimmed FASTQ files were then mapped to the sequence of interest using Bowtie2 in single-end mode with a maximum fragment length of 1,000 base pairs. BAM files from different time points corresponding to the same histone modification were merged using custom Perl scripts, and alignments with mapping quality scores of less than 20 were again removed from the sample-merged BAM file using the Samtools view function.
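By way of illustration only, the mapping quality filter described above may be sketched as follows. The sketch operates on plain-text SAM records rather than BAM files and is not the Samtools implementation used in this example; the function name and the example records are hypothetical. It relies only on the fact that the fifth tab-separated field of a SAM alignment record is the mapping quality (MAPQ).

```python
def filter_sam_by_mapq(sam_lines, min_mapq=20):
    """Yield header lines and alignments whose MAPQ (SAM column 5) is >= min_mapq."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[4]) >= min_mapq:
            yield line


# Hypothetical records: one header line, one well-mapped read, one poorly mapped read.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t99\tscaffold_1\t100\t42\t50M\t=\t300\t250\t*\t*",
    "read2\t99\tscaffold_1\t500\t7\t50M\t=\t700\t250\t*\t*",
]
kept = list(filter_sam_by_mapq(example))
```

In this sketch the header and read1 are retained, while read2 (MAPQ 7) is discarded.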
FASTQ files from three replicates of paired-end total RNA-Seq data from the Chinese hamster ovary SSI 10E9 cell line (Zhang L, et al. 2015) were trimmed in paired-end mode to remove sequencing adapter sequences. The trimmed FASTQ files were then mapped to the sequence of interest in paired-end mode using HISAT2 (Kim D, Langmead B and Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nature Methods. 2015, 12: 357-360) under default parameters. Alignments with mapping quality scores below 40 were removed, and the replicate datasets were merged within SeqMonk. RNA-Seq quantification (RPKM values) was performed using the RNA-Seq quantitation pipeline in SeqMonk (Babraham Bioinformatics - SeqMonk Mapped Sequence Analysis Tool by Simon Andrews), specifying that the library was non-strand-specific and paired-end and that only reads overlapping annotated exons should be quantified. The resulting quantification was corrected for transcript length and log-transformed. For downstream analysis, loci with negative log-RPKM values were assigned a value of zero.
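The quantification described above may be illustrated, under stated assumptions, by the following sketch. It is not the SeqMonk pipeline; it applies the conventional RPKM definition (reads per kilobase of exon per million mapped reads), assumes a base-2 logarithm (the base is not stated above), and sets negative log values to zero. All function names and numbers are hypothetical.

```python
import math


def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon per million mapped reads."""
    return read_count / ((exon_length_bp / 1_000) * (total_mapped_reads / 1_000_000))


def floored_log2_rpkm(value):
    """Log2-transform an RPKM value, assigning zero to negative (or undefined) results."""
    if value <= 0:
        return 0.0
    return max(0.0, math.log2(value))


# Hypothetical gene: 500 reads over 2,000 bp of exon in a library of 10 million reads.
expressed = rpkm(500, 2_000, 10_000_000)
low = rpkm(1, 2_000, 10_000_000)
```

A gene with an RPKM below 1 has a negative logarithm and is therefore treated as zero in downstream analysis.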
Hi-C analysis
The filtered, mapped Hi-C BAM files from the three replicates were merged using custom Perl scripts. A Hi-C summary file was created from the merged BAM file using a custom Python script before the HOMER (Heinz S., et al., Mol Cell 2010 May 28; 38(4): 576-589. PMID: 20513432) Hi-C tag directory was created.
Topologically associating domains (TADs) were identified by passing the above Hi-C tag directory to the HOMER 'findHiCDomains.pl' script with a resolution of 5 Kb, a super-resolution of 25 Kb and a maximum interaction distance of 1 Mb. The TAD boundaries used in the algorithm are the base-pair end coordinates of the domains defined in the output file.
Principal component analysis was performed by passing the above Hi-C tag directory to the HOMER 'runHiCpca.pl' script with a resolution of 50 Kb and a super-resolution of 100 Kb to inform the identification of active genomic compartments. The first two principal components were determined using 152 'actively expressed' loci (determined by quantifying steady-state RNA-Seq data from the Chinese hamster ovary 10E9 cell line) as seed regions. Where the first principal component represented a separation of chromosome arms, data from the second principal component were used; for all other 'chromosomes', data from the first principal component were used. The 'active' domains used in the algorithm were identified by passing the principal component analysis data discussed above to the HOMER 'findHiCCompartments.pl' script.
The data input to the algorithm after this analysis comprises the TAD boundary positions identified within the sequence of interest and the coordinates of the active compartments identified within the sequence of interest.
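As a non-limiting sketch of how these two inputs may be applied, the following code retains accessible-chromatin peaks that lie in an active compartment and within about 30,000 base pairs of a TAD boundary, as recited in the claims. The use of the peak midpoint, the restriction to a single chromosome, and all names and coordinates are assumptions made for illustration.

```python
from bisect import bisect_left


def distance_to_nearest_boundary(position, sorted_boundaries):
    """Distance from a position to the closest boundary in a sorted, non-empty list."""
    i = bisect_left(sorted_boundaries, position)
    candidates = sorted_boundaries[max(0, i - 1): i + 1]
    return min(abs(position - b) for b in candidates)


def in_any_interval(position, intervals):
    """True if the position falls inside any (start, end) interval."""
    return any(start <= position <= end for start, end in intervals)


def filter_peaks(peaks, tad_boundaries, active_compartments, max_distance=30_000):
    """Keep peaks in an active compartment and near a TAD boundary."""
    boundaries = sorted(tad_boundaries)
    kept = []
    for start, end in peaks:
        midpoint = (start + end) // 2
        if not in_any_interval(midpoint, active_compartments):
            continue
        if distance_to_nearest_boundary(midpoint, boundaries) <= max_distance:
            kept.append((start, end))
    return kept


# Hypothetical coordinates on a single scaffold group.
peaks = [(100_000, 100_500), (400_000, 400_600), (900_000, 900_400)]
candidates = filter_peaks(
    peaks,
    tad_boundaries=[120_000, 800_000],
    active_compartments=[(0, 500_000)],
)
```

Only the first peak passes both criteria: the second is too far from any boundary and the third lies outside the active compartment.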
ATAC-Seq analysis
Peaks in accessible chromatin were identified in each of the three replicate ATAC-Seq filtered, merged BAM files mapped to the sequence of interest using the MACS2 'callpeak' function with the following parameters: -q 0.01 --nolambda --nomodel --call-summits. The union of the peaks overlapping in all three replicates, defined using the GenomicRanges Bioconductor software package (Lawrence M, Huber W, Pagès H, Aboyoun P, Carlson M, Gentleman R, Morgan M, Carey V (2013), "Software for Computing and Annotating Genomic Ranges," PLoS Computational Biology, 9), was then used in the algorithm.
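One possible reading of the replicate-combining step is sketched below in plain Python rather than with the GenomicRanges package: overlapping peaks from all replicates are merged into single regions, and only regions supported by every replicate are kept. This interpretation, the function name and the coordinates are assumptions.

```python
def merge_replicate_peaks(replicates):
    """Merge overlapping peaks across replicates; keep regions seen in every replicate."""
    tagged = sorted(
        (start, end, rep_index)
        for rep_index, peaks in enumerate(replicates)
        for start, end in peaks
    )
    merged = []  # each entry: [start, end, set of supporting replicate indices]
    for start, end, rep_index in tagged:
        if merged and start <= merged[-1][1]:
            # Overlaps the current region: extend it and record the replicate.
            merged[-1][1] = max(merged[-1][1], end)
            merged[-1][2].add(rep_index)
        else:
            merged.append([start, end, {rep_index}])
    n = len(replicates)
    return [(s, e) for s, e, reps in merged if len(reps) == n]


# Hypothetical peak calls from three replicates.
rep1 = [(100, 200), (500, 600)]
rep2 = [(150, 250), (900, 950)]
rep3 = [(180, 220), (550, 650)]
shared = merge_replicate_peaks([rep1, rep2, rep3])
```

Here only the region spanning 100 to 250 is supported by all three replicates.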
PCHi-C analysis
Significant promoter interactions were identified from the promoter capture Hi-C dataset using CHiCAGO version 1.1.3 (Cairns J, et al., Genome Biology. 2016. 17: 127) under default parameters. A promoter capture RNA bait library was designed against the sequence of interest, and a list of baited promoter-containing HindIII restriction fragments was created. Before running CHiCAGO, the aligned PCHi-C BAM files were filtered using custom Perl scripts to remove read pairs that did not overlap one of these baited promoter-containing HindIII restriction fragments. CHiCAGO was then run under default parameters on the filtered BAM file of each individual replicate. Cis-interactions classified as statistically significant in at least two of the three replicates were extracted for further use.
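The reproducibility criterion (significance in at least two of three replicates) may be sketched as follows. The sketch does not call CHiCAGO; it assumes that each replicate has already been reduced to a mapping from (bait fragment, other-end fragment) pairs to an interaction score, and it assumes a significance cutoff of 5, which is not stated above. All identifiers are hypothetical.

```python
from collections import Counter


def reproducible_interactions(replicate_scores, score_cutoff=5.0, min_replicates=2):
    """Return interactions that reach the cutoff in at least min_replicates replicates."""
    support = Counter()
    for scores in replicate_scores:
        for pair, score in scores.items():
            if score >= score_cutoff:
                support[pair] += 1
    return {pair for pair, n in support.items() if n >= min_replicates}


# Hypothetical per-replicate scores keyed by (bait fragment, other-end fragment).
rep1 = {("bait_1", "frag_10"): 7.2, ("bait_2", "frag_30"): 5.5}
rep2 = {("bait_1", "frag_10"): 6.1, ("bait_2", "frag_30"): 2.0}
rep3 = {("bait_1", "frag_10"): 1.0, ("bait_3", "frag_44"): 9.0}
reproducible = reproducible_interactions([rep1, rep2, rep3])
```

Only the bait_1/frag_10 interaction is significant in two replicates and is therefore retained.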
ChromHMM analysis
The filtered, merged ATAC-Seq and published ChIP-Seq BAM files aligned to the sequence of interest were used to inform generation of a 17-state ChromHMM model (Ernst and Kellis M. Nat Protoc. 12: 2478-). States 2 and 3 were considered potential active enhancer regions, while states 11, 12, 14, 15 and 16 were designated as regions with potential repressive properties.
The list of potential active-enhancer HindIII restriction fragments was first defined as those restriction fragments overlapping at least one ChromHMM state 2 or 3 region located more than 2 Kb from an annotated TSS. These candidate restriction fragments were then filtered to remove those that also overlapped any 'repressive' ChromHMM state region (states 11, 12, 14, 15 and 16) and/or the baited promoter-containing HindIII restriction fragments listed in the PCHi-C analysis section.
For the purposes of the algorithm, the list of cis PCHi-C interactions classified as statistically significant in at least two PCHi-C replicates was filtered against the list of potential active-enhancer HindIII restriction fragments to obtain the set of reproducible, statistically significant promoter:predicted-enhancer cis-interactions used in the algorithm.
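The fragment classification and interaction filtering described in this section may be sketched as follows. The state numbers and the 2 Kb exclusion come from the description above; the reading of "more than 2 Kb from an annotated TSS" as a window on either side of the TSS, the data layout and all coordinates are assumptions.

```python
ENHANCER_STATES = {2, 3}
REPRESSIVE_STATES = {11, 12, 14, 15, 16}
TSS_EXCLUSION_BP = 2_000


def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def is_enhancer_fragment(fragment, state_segments, tss_positions, baited_fragments):
    """Classify a restriction fragment as a potential active-enhancer fragment."""
    f_start, f_end = fragment
    if fragment in baited_fragments:
        return False
    has_enhancer = False
    for s_start, s_end, state in state_segments:
        if not overlaps(f_start, f_end, s_start, s_end):
            continue
        if state in REPRESSIVE_STATES:
            return False  # any repressive overlap disqualifies the fragment
        if state in ENHANCER_STATES:
            distal = all(
                s_end < tss - TSS_EXCLUSION_BP or s_start > tss + TSS_EXCLUSION_BP
                for tss in tss_positions
            )
            has_enhancer = has_enhancer or distal
    return has_enhancer


def promoter_enhancer_interactions(interactions, enhancer_fragments):
    """Keep interactions whose other end is a potential active-enhancer fragment."""
    return [(bait, other) for bait, other in interactions if other in enhancer_fragments]


# Hypothetical fragments, ChromHMM segments (start, end, state) and one annotated TSS.
fragments = [(1_000, 5_000), (20_000, 24_000), (40_000, 44_000)]
segments = [(1_500, 2_500, 2), (21_000, 22_000, 3), (23_000, 23_500, 12), (41_000, 42_000, 2)]
enhancers = {f for f in fragments if is_enhancer_fragment(f, segments, [42_500], set())}
bait = (60_000, 64_000)
kept = promoter_enhancer_interactions([(bait, (1_000, 5_000)), (bait, (20_000, 24_000))], enhancers)
```

In this sketch the second fragment is removed because it also overlaps a repressive state, and the third because its enhancer state lies within 2 Kb of the TSS.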
The resulting potential HI loci found by this version of the algorithm are described in table 1, where each HI locus comprises the specifically identified site plus about 5,000 base pairs of flanking sequence on either side. The sites in table 1 have been ranked by predicted performance, based on the unweighted sum of the rankings of each site's proximity to the nearest TAD boundary, its number of reproducible predicted-enhancer cis-interactions and the steady-state mRNA level of the 'associated' gene.
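A minimal sketch of such an unweighted rank-sum ordering is given below. It assumes that each criterion is converted to a rank (1 being best), that a shorter distance to a TAD boundary, a larger number of enhancer interactions and a higher mRNA level are each favourable, and that the lowest rank sum is placed first. The handling of ties is not specified in this example, so the hypothetical values below are chosen to avoid them.

```python
def rank_positions(values, reverse=False):
    """Return the 1-based rank of each value (smallest first unless reverse=True)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=reverse)
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def rank_loci(loci):
    """Order locus names by the unweighted sum of their three criterion ranks."""
    names = list(loci)
    dist = rank_positions([loci[n]["tad_distance_bp"] for n in names])
    inter = rank_positions([loci[n]["enhancer_interactions"] for n in names], reverse=True)
    expr = rank_positions([loci[n]["log_rpkm"] for n in names], reverse=True)
    totals = {n: dist[i] + inter[i] + expr[i] for i, n in enumerate(names)}
    return sorted(names, key=lambda n: totals[n])


# Hypothetical candidate loci.
loci = {
    "locus_A": {"tad_distance_bp": 2_000, "enhancer_interactions": 6, "log_rpkm": 5.1},
    "locus_B": {"tad_distance_bp": 15_000, "enhancer_interactions": 9, "log_rpkm": 3.0},
    "locus_C": {"tad_distance_bp": 28_000, "enhancer_interactions": 2, "log_rpkm": 1.2},
}
ordered = rank_loci(loci)
```

With these values the rank sums are 4, 5 and 9 for locus_A, locus_B and locus_C respectively.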
Fig. 3A (candidate HI locus SEQ ID NO: 3) and fig. 3B (candidate HI locus SEQ ID NO: 2) provide examples of where candidate HI loci are located within the 3D genomic map, compared with the currently industry-relevant Fer1L4 landing pad in fig. 3C. Of particular note are the spatial positions relative to 1) the TAD boundaries, 2) the mapped peaks in open chromatin as determined by ATAC-Seq, 3) the promoter capture Hi-C interactions mapped to the region, and 4) the mapped epigenetic signatures.
Example 2
To demonstrate the ability of the method to identify HI loci using the procedure outlined in fig. 1 and described in example 1, five of the top-ranked candidate loci and five of the bottom-ranked loci were selected for empirical evaluation. This was achieved by measuring expression of a reporter cassette integrated into the genome at each identified locus. The target loci were evaluated alongside two controls: a heterochromatin region and the 5' flanking sequence of the Fer1L4 landing pad of the Chinese hamster ovary SSI 10E9 cell line (Zhang et al., Biotechnol Prog. 2015: 31(6) 1645-56). The heterochromatin control region represents a peak in accessible chromatin that does not overlap a HindIII restriction fragment involved in any reproducible, significant PCHi-C interaction. This peak is also located approximately 14 kb upstream of the 'non-transcribed' Fbxl2 gene (RefSeq ID NW_003613997.1, GenBank ID JH000418.1), in an inactive genomic compartment, and overlaps a region populated with the constitutive heterochromatin histone mark H3K9me3. Inclusion of these controls provides a direct reference point for evaluation of the candidate loci.
To test the candidate loci, a custom-designed GFP donor template plasmid was constructed, consisting of an eGFP expression cassette under the control of a constitutive CMV promoter and flanked by custom-designed 'pseudo gRNA' recognition sites (fig. 4A). The premise of using a custom-designed pseudo gRNA sequence to mediate in vivo excision of the cassette following transfection was taken from a published generic gene-tagging technique (Lackner et al., 2015; Nat Commun. 6: 10237). In addition to the reporter gene, the donor plasmid also contains a pseudo gRNA sequence and a locus-specific gRNA sequence (to target the CMV-eGFP cassette to the locus of interest), both under the control of a U6 promoter and both containing the gRNA scaffold sequence specified in Ran et al., 2013 (Nat Protoc. 8(11): 2281-2308). Furthermore, the locus-specific gRNA cassette backbone contains two BbsI restriction sites upstream of the gRNA scaffold sequence, allowing incorporation of locus-specific crRNA sequences using the cloning strategy also outlined in Ran et al., 2013. In all experiments, the pseudo gRNA remained unchanged, while the site-specific gRNA was varied to allow site-specific targeting of the CMV-eGFP cassette.
After co-transfection of the donor and Cas9 plasmids, Cas9 nuclease, guided by the pseudo gRNA, excises the CMV-eGFP cassette from the donor plasmid by cleaving the recognition sites flanking the cassette. Then, after Cas9, guided by the site-specific gRNA, cleaves the target genomic DNA, the cassette should be integrated into the target genomic site by the cell's endogenous NHEJ (non-homologous end joining) machinery.
For each candidate locus, crRNA target sequences were identified using an in-house CRISPR gRNA design tool that takes into account the propensity to mediate off-target genomic cleavage. The top three ranked crRNA target sequences were selected, each specific for a different region of the relevant candidate locus. Each of these sequences was then cloned separately into the donor plasmid at the BbsI sites, downstream of the U6 promoter and upstream of the gRNA scaffold sequence, to produce the final expressed gRNA for the target locus, as outlined in Ran et al., 2013. Thus, for each target locus, three independent donor plasmids were constructed, each containing a different crRNA sequence. A sterile 5 μg donor plasmid pool was created for each candidate locus by mixing the three constructed donor plasmids in equimolar ratios. These pools were then transfected into Chinese hamster ovary SSI 10E9 cells together with 5 μg of sterile Cas9-Puro plasmid (Dharmacon U-005100-120), giving a total of 10 μg of plasmid DNA per transfection.
Chinese hamster ovary SSI 10E9 cells on day 2 or day 3 of subculture were transfected with the donor and Cas9 plasmids by electroporation using the Bio-Rad Gene Pulser Xcell electroporation system, at a cell-to-DNA transfection ratio of 1×10^7 viable cells in 0.7 mL of CD-CHO medium to 10 μg of plasmid DNA in 100 μL of TE buffer. Three transfections were then pooled into 30 mL of pre-warmed CD-CHO medium and left to recover. Cultures were left to recover for a total of 13 days before analysis. During this period, the medium was changed on day 4, and the cultures were subcultured on days 7 and 10 at a cell density of 1×10^6 viable cells/mL.
On the day of analysis, GFP production per cell was analyzed by flow cytometry from replicate injections of 20,000 cells from each cell pool, using a Guava easyCyte 12HT benchtop flow cytometer. The average percentage of GFP+ cells in each transfection pool targeting a specific genomic locus is shown in fig. 4B. To account for GFP expression arising from random, homology-independent genomic integration of the donor plasmid and/or from transient plasmid remaining after pool outgrowth, a donor plasmid lacking any site-specific gRNA was included as a negative control ('plasmid control'). The median GFP signal of the GFP+ cells in each pool is shown in fig. 4C. From this sample of loci, it was observed that HI loci could be identified with expression performance roughly equivalent to that of the Fer1L4 locus, which has previously been identified as a high-performance genomic locus by large-scale, random, empirical screening (Zhang et al., Biotechnol Prog. 2015: 31(6) 1645-56).
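The two summary statistics reported for each pool (percentage of GFP+ cells and median GFP signal of the GFP+ cells) may be computed as in the following sketch. The gating threshold, the function name and the intensity values are hypothetical; the gate actually applied on the flow cytometer is not described in this example.

```python
from statistics import median


def summarise_gfp(intensities, gate):
    """Return (percent GFP+ cells, median signal of GFP+ cells) for one injection."""
    positive = [x for x in intensities if x > gate]
    percent_positive = 100.0 * len(positive) / len(intensities)
    median_positive = median(positive) if positive else None
    return percent_positive, median_positive


# Hypothetical per-cell GFP intensities (arbitrary units) and gate.
intensities = [5, 8, 12, 150, 300, 450, 9, 700]
percent_gfp, median_gfp = summarise_gfp(intensities, gate=100)
```

The percentage obtained from each replicate injection may then be averaged to give the per-pool value.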
To demonstrate that on-target integration of the CMV-eGFP cassette had occurred in the pools assayed above, genomic DNA was extracted from each cell pool using the GeneJET genomic DNA purification kit according to the manufacturer's instructions. Targeted integration of the GFP expression cassette was determined by PCR using GFP-specific primers together with primers specific for the sequences upstream and downstream of each candidate integration locus. With the exception of locus SEQ ID NO: 4, targeted integration was confirmed at all candidate loci (fig. 4D). Using the primer combinations in this study, no interpretable amplicon was observed for the Fer1L4 locus.
These and other modifications and variations to the present invention may be practiced by those of ordinary skill in the art, without departing from the spirit and scope of the present invention, which is more particularly set forth in the appended claims. Further, it should be understood that aspects of the various embodiments may be interchanged both in whole or in part. Furthermore, those of ordinary skill in the art will appreciate that the foregoing description is by way of example only, and is not intended to limit the invention so further described in such appended claims.

Claims (62)

1. A mammalian cell comprising a first recombination target site (RTS) chromosomally integrated at a first High Integration (HI) locus located within an active genomic compartment of accessible chromatin and within about 30,000 base pairs of a topologically associating domain (TAD) boundary, the first HI locus overlapping a region in the genome of the cell that interacts with at least one enhancer element.
2. The cell of claim 1, wherein the first HI locus comprises any one of SEQ ID NOs: 1-125, or is within about 5,000 base pairs of, or overlaps, the 5' end or the 3' end of any one of SEQ ID NOs: 1-125.
3. The cell of claim 1, wherein the first HI locus overlaps with a Transcription Start Site (TSS) in the active genomic compartment.
4. The cell of claim 3, wherein the TSS is operably linked to an active gene, the expression or lack of expression of which is not essential to the mammalian cell.
5. The cell of claim 1, wherein the first HI locus does not overlap with a gene locus.
6. The cell of claim 1, wherein the first HI locus does not overlap with a promoter endogenous to the locus.
7. The cell of claim 6, wherein the first HI locus is not within about 1,000 base pairs of the promoter.
8. The cell of claim 1, comprising a second, different RTS.
9. The cell of claim 8, wherein the first RTS and the second, different RTS are chromosomally integrated within the first HI locus.
10. The cell of claim 8, wherein the second, different RTS is chromosomally integrated within a second HI locus.
11. The cell of claim 8, wherein the second, different RTS is chromosomally integrated at a separate locus.
12. The cell of claim 11, wherein the separate locus is the Fer1L4 locus.
13. The cell of claim 1, comprising a plurality of additional different RTSs.
14. The cell of any one of claims 1-13, wherein at least one of the RTS is a frt site, a lox site, a rox site, or an att site.
15. The cell of any one of claims 1-14, wherein at least one of the RTS comprises a nucleic acid sequence selected from SEQ ID NOs: 126-155.
16. The cell of any one of claims 1-15, wherein the mammalian cell is a mouse cell, a human cell, a Chinese Hamster Ovary (CHO) cell, a CHO-K1 cell, a CHO-DXB11 cell, a CHO-DG44 cell, a CHOK1SV™ cell or a variant thereof, a CHO glutamine synthetase knockout cell or a variant thereof, a HEK cell, a HEK293 cell or an adherent or suspension-adapted variant thereof, a HeLa cell or an HT1080 cell.
17. The cell of any one of claims 1-16, further comprising a first gene of interest, wherein the first gene of interest is chromosomally integrated.
18. The cell of claim 17, wherein the first gene of interest comprises a reporter gene, a selection gene, a therapeutic gene of interest, an accessory gene, or a combination thereof.
19. The cell of claim 18, wherein the gene of therapeutic interest comprises a gene encoding a protein that is difficult to express.
20. The cell of claim 19, wherein the difficult-to-express protein is selected from the group consisting of an Fc-fusion protein, an enzyme, a membrane receptor, and a monoclonal antibody.
21. The cell of any one of claims 17-20, wherein the first gene of interest is located between two of the RTSs.
22. The cell of any one of claims 17-21, wherein the first gene of interest is located within the first HI locus.
23. The cell of any one of claims 1-22, further comprising a second gene of interest, wherein the second gene of interest is chromosomally integrated.
24. The cell of claim 23, wherein the second gene of interest is located within the first HI locus.
25. The cell of claim 23, wherein the first gene of interest is located within the first HI locus and the second gene of interest is located within the second HI locus or within a separate locus.
26. The cell of any one of claims 23-25, further comprising a third gene of interest, wherein the third gene of interest is chromosomally integrated.
27. The cell of claim 26, wherein the third gene of interest is located within the first HI locus or within the second HI locus or within the separate locus.
28. The cell of claim 27, wherein
a. at least one of the first, second, and third genes of interest is located within the first HI locus, and
b. at least one of the first, second, and third genes of interest is located within the second HI locus.
29. The cell of any one of claims 1-28, further comprising a site-specific recombinase gene.
30. The cell of claim 29, wherein the site-specific recombinase gene is chromosomally integrated.
31. A method for producing a recombinant cell, comprising:
a. mapping peaks in accessible chromatin of a cellular genome;
b. identifying, among the mapped peaks, a first set of peaks located in an active genomic compartment of the accessible chromatin and within about 30,000 base pairs of a topologically associating domain (TAD) boundary;
c. defining a first High Integration (HI) locus within the first set of peaks, the first HI locus overlapping a region in the genome that interacts with at least one enhancer element; and
d. inserting a first Recombination Target Site (RTS) within the first HI locus.
32. The method of claim 31, wherein the first HI locus comprises any one of SEQ ID NOs: 1-125, or is within about 5,000 base pairs of, or overlaps, the 5' end or the 3' end of any one of SEQ ID NOs: 1-125.
33. The method of claim 31, further comprising inserting a gene encoding a site-specific recombinase into the cell.
34. The method of claim 31, further comprising identifying those peaks in the first set of peaks that overlap with a transcription start site (TSS) of a gene, the expression or lack of expression of which is not essential to the cell, and defining a second set of peaks that overlap with the gene and are located downstream of the TSS, wherein the first HI locus is defined in the second set of peaks.
35. The method of claim 31, further comprising identifying a third set of peaks in the first set of peaks that do not overlap with any gene, wherein the first HI locus is defined in the third set of peaks.
36. The method of claim 31, further comprising transfecting the cell with a first vector comprising a first exchangeable cassette encoding a first gene of interest and integrating the first exchangeable cassette within the first HI locus.
37. The method of claim 36, further comprising selecting a recombinant protein producing cell comprising the first exchangeable cassette integrated into the chromosome.
38. The method of claim 36, wherein the first gene of interest comprises a reporter gene, a selection gene, a therapeutic gene of interest, an accessory gene, or a combination thereof.
39. The method of claim 38, wherein the gene of therapeutic interest comprises a gene encoding a protein that is difficult to express.
40. The method of claim 39, wherein the difficult-to-express protein is selected from the group consisting of an Fc-fusion protein, an enzyme, a membrane receptor, and a monoclonal antibody.
41. The method of claim 31, further comprising identifying a second HI locus within the first set of peaks.
42. The method of any one of claims 31-41, further comprising inserting one or more additional RTS within the cell.
43. The method of claim 42, wherein the first gene of interest is located between two of the RTSs.
44. The method of any one of claims 31-43, further comprising transfecting the cell with a second vector comprising an exchangeable cassette encoding a second gene of interest and integrating the second exchangeable cassette within the cell.
45. The method of claim 44, wherein the second exchangeable cassette is integrated within the first HI locus.
46. The method of claim 44, wherein the second exchangeable cassette is integrated within the second HI locus.
47. A method for producing a recombinant cell, comprising:
a. mapping peaks in accessible chromatin of a cellular genome;
b. identifying, among the mapped peaks, a first set of peaks located in an active genomic compartment of the accessible chromatin and within about 30,000 base pairs of a topologically associating domain (TAD) boundary;
c. identifying a region of the genome within the accessible chromatin that interacts with at least one enhancer element;
d. defining a plurality of High Integration (HI) loci within the first set of peaks, each HI locus in the plurality overlapping with the identified region;
e. integrating a recombination target site (RTS) into a plurality of cells; and
f. selecting a cell from the plurality of cells that comprises the RTS integrated at the HI locus.
48. The method of claim 47, wherein the HI locus comprises any one of SEQ ID NOs: 1-125, or is within about 5,000 base pairs of, or overlaps, the 5' end or the 3' end of any one of SEQ ID NOs: 1-125.
49. The method of claim 47, further comprising inserting a gene encoding a site-specific recombinase into the selected cell.
50. The method of claim 47, further comprising identifying those peaks in the first set of peaks that overlap with a transcription start site (TSS) of an active gene, the expression or lack of expression of which is not essential to the cell, and defining a second set of peaks that overlap with the active gene and are downstream of the TSS of the active gene, wherein the HI locus is defined in the second set of peaks.
51. The method of claim 47, further comprising identifying a third set of peaks in the first set of peaks that do not overlap with any gene, wherein the HI locus is defined in the third set of peaks.
52. The method of claim 47, further comprising transfecting a plurality of selected cells with a vector comprising an exchangeable cassette encoding a gene of interest and integrating the exchangeable cassette within the HI locus.
53. The method of claim 52, further comprising selecting a recombinant protein producing cell comprising the exchangeable cassette integrated into the chromosome.
54. The method of claim 52, wherein the gene of interest comprises a reporter gene, a selection gene, a therapeutic gene of interest, an accessory gene, or a combination thereof.
55. The method of claim 54, wherein the gene of therapeutic interest comprises a gene encoding a protein that is difficult to express.
56. The method of claim 55, wherein the difficult-to-express protein is selected from the group consisting of an Fc-fusion protein, an enzyme, a membrane receptor, and a monoclonal antibody.
57. The method of claim 56, wherein the monoclonal antibody is a bispecific monoclonal antibody or a trispecific monoclonal antibody.
58. The method of any one of claims 47-57, further comprising inserting one or more additional RTS within the cell.
59. The method of claim 58, wherein the gene of interest is located between two of the RTSs.
60. The method of claim 47, wherein the RTS is integrated into the plurality of cells according to a random integration protocol.
61. The method of any one of claims 47 to 60, further comprising ranking the HI loci.
62. The method of claim 61, wherein the HI loci are ranked according to one or more of the expression level of one or more genes associated with each locus, the distance from each locus to the nearest TAD boundary, the number of enhancer interactions predicted at each locus, and the mRNA expression level of one or more genes associated with each locus.
HK62021042075.3A 2018-10-01 2019-10-01 Ssi cells with predictable and stable transgene expression and methods of formation HK40053255A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US62/739,546 2018-10-01

Publications (1)

Publication Number Publication Date
HK40053255A (en) 2022-02-04
