WO1997015588A9

WO1997015588A9 - Protective protein/cathepsin a and precursor: crystallization, x-ray diffraction, three-dimensional structure determination and rational drug design

Info

Publication number: WO1997015588A9
Application number: PCT/US1996/017325
Authority: WO
Filing date: 1996-10-25
Publication date: 1997-09-18

Abstract

The present invention provides crystallized protective protein/cathepsin A (PPCA), a precursor thereof (pPPCA) or at least one subdomain thereof; methods for x-ray diffraction analysis to provide x-ray diffraction patterns of sufficiently high resolution for three-dimensional structure determination of the protein, as well as methods for rational drug design (RDD), based on using amino acid sequence data and/or x-ray crystallography data provided on computer readable media, as analyzed on a computer system having suitable computer algorithms.

Description

Protective Protein/Cathepsin A and Precursor: Crystallization, X-Ray Diffraction, Three-Dimensional Structure Determination and Rational Drug Design

Background of the Invention

Statement as to Rights to Inventions Made Under Federally-Sponsored Research and Development

Part of the work performed during development of this invention utilized U.S. Government funds. The U.S. Government has certain rights in this invention.

Field of the Invention

The present invention is in the fields of molecular biology, protein purification, protein crystallization, x-ray diffraction analysis, three-dimensional structure determination and rational drug design (RDD). The present invention provides crystallized protective protein/cathepsin A (PPCA) and its precursor (pPPCA). The crystallized PPCA or pPPCA is analyzed by x-ray diffraction techniques. The resulting x-ray diffraction patterns are of sufficiently high resolution to be useful for determining the three-dimensional structure of the PPCA or pPPCA protein, and for RDD.

Related Background Art

The human protective protein/cathepsin A (PPCA, also known as human protective protein or HPP) has been identified as the primary genetic defect underlying galactosialidosis (d'Azzo et al., Proc. Natl. Acad. Sci. USA 79:4535-4539 (1982)), a lysosomal storage disease inherited as an autosomal recessive trait. Patients with this disorder are diagnosed as having drastically reduced β-galactosidase and neuraminidase activities in their cell lysosomes. Examples of lysosomal storage diseases are presented in Table 316-1 of Braunwald et al., eds., Harrison's Principles of Internal Medicine, 11th Ed., pp. 1661-1671, McGraw-Hill Book Co., New York (1987), as well as Wenger et al., Biochem. Biophys. Res. Commun. 82:589-595 (1978); Tettamanti et al., eds., Sialidases and Sialidosis: Perspectives in Inherited Metabolic Diseases, Vol. 4, Edi. Ermes, Milano (1981), pp. 261-279 and 379-395; and van Diggelen et al., Lancet 2:804 (1987), which references are entirely incorporated herein by reference.

Researchers have proposed that one of PPCA's functions is to stabilize β-galactosidase and neuraminidase in a multi-enzyme complex, which complex is deficient in galactosialidosis patients (d'Azzo et al. (1982), infra; Hoogeveen et al. (1983), infra). Evidence for this protective function comes from studies showing that PPCA is taken up from the culture medium by galactosialidosis fibroblasts and that PPCA restores both β-galactosidase and neuraminidase activities to these fibroblasts (d'Azzo et al. (1982), infra). The cDNA for PPCA directs the synthesis of a 452 amino acid precursor PPCA (pPPCA) (Figure 13) with a molecular weight of 54 kDa (Galjart et al., Cell 54:755-764 (1988)). The amino acid sequences of PPCA (Figure 14) and pPPCA (Figure 13) contain two glycosylation sites (Asn 117 and Asn 305), both of which are glycosylated in cultured fibroblasts and in cells over-expressing PPCA or pPPCA. pPPCA dimerizes soon after synthesis in the endoplasmic reticulum (ER) (Zhou et al., EMBO J. 10:4041-4048 (1991)).

Lysosomal PPCA has cathepsin A/deamidase/esterase activities which are exerted in vitro on a specific subset of bioactive peptides. Non-limiting examples of those hydrolyzed by PPCA are substance P and substance P-free acid, oxytocin and oxytocin-free acid, neurokinin A, angiotensin I and bradykinin (Jackman (1990), infra). Furthermore, the enzyme inactivates endothelin I activity in rat smooth muscle cells and normal human tissues. This activity was deficient in liver from a galactosialidosis patient (Itoh (1995), infra; Jackman et al., J. Biol. Chem. 267:2872-2875 (1992)). Endothelins (ET-1, ET-2 and ET-3) are potent vasoconstrictors and elevate blood pressure in mammals. They also influence cell proliferation and hormone production and have been implicated in cardiovascular disorders ranging from hypertension to stroke to ischemic heart disease (Rubanyi and Polokoff, Pharmacol. Rev. 46:325-415 (1994)).

The three-dimensional structure of a PPCA or a pPPCA has not previously been published, which structure could delineate specific biological activities and ligands as therapeutics for PPCA-related pathologies. Accordingly, there is a need to provide three-dimensional structures of at least one PPCA, pPPCA or ligands for diagnosis or therapy of PPCA-related pathologies.

Summary of the Invention

The present invention provides methods of expressing, purifying and crystallizing a human protective protein/cathepsin A (PPCA) and its precursor, precursor protective protein/cathepsin A (pPPCA). The present invention also provides methods for obtaining crystallized PPCA or pPPCA that can be analyzed to obtain x-ray diffraction patterns of sufficiently high resolution to be useful for three-dimensional structure determination of the protein.

The x-ray diffraction patterns can either be analyzed directly to provide the three-dimensional structure (if of sufficiently high resolution), or atomic coordinates for the crystallized PPCA or pPPCA, as provided herein, can be used for structure determination. The x-ray diffraction patterns obtained by methods of the present invention, and provided on computer readable media, are used to provide electron density maps. The amino acid sequence is also useful for three-dimensional structure determination. The data is then used in combination with phase determination (e.g., using multiple isomorphous replacement (MIR) or molecular replacement techniques) to generate electron density maps of a PPCA or a pPPCA, using a suitable computer system.

The electron density maps, provided by analysis of either the x-ray diffraction patterns or working backwards from the atomic coordinates provided herein, are then fitted using suitable computer algorithms to generate secondary, tertiary and/or quaternary domains of a PPCA or a pPPCA, which domains are then used to provide an overall three-dimensional structure, as well as expected binding and active sites of the PPCA or pPPCA. pPPCA has some of the active and binding sites of PPCA, except for changes in structure due to the presence of the portion of the pPPCA which is deleted during maturation to PPCA (e.g., residues 285-298 of Figure 13).

Structure determination methods and computer systems are also provided by the present invention for rational drug design (RDD). These RDD methods use computer modeling programs to find potential ligands that are calculated to associate with, or bind to, sites or domains of a PPCA or a pPPCA. Potential ligands are then screened for modulating or binding activity. Such screening methods can be selected from assays for at least one PPCA-specific structural feature or biological activity, preferably as associated with a PPCA- or pPPCA-related pathology, e.g., protective activity (e.g., modulation of β-galactosidase activity and neuraminidase (NA) activity), and peptide or enzyme modulating activity (e.g., of endothelin I (serine carboxypeptidase), neuropeptides, cathepsin A, and the like), according to known assays. The resulting ligands provided by methods of the present invention are synthesized and are useful for treating, inhibiting or preventing at least one PPCA-related pathology in a mammal.

Other objects of the invention will be apparent to one of ordinary skill in the art from the following detailed description and examples relating to the present invention.

Brief Description of the Figures

Figure 1 is a schematic ribbon diagram of the PPCA monomer (monomer 1), where secondary structure assignments are according to DSSP (Kabsch and Sander, Biopolymers 22:2577-2637 (1983)). The 'core' domain is shown in yellow. The 'cap' domain consists of a 'helical' subdomain, in red, and a 'maturation' subdomain, in orange. The catalytic triad Ser 150, His 429 and Asp 372 (from right to left) is shown by small green spheres. (Figure generated using MOLSCRIPT (Kraulis, J. Appl. Cryst. 24:946-950 (1991)).)

Figure 2 is a stereo diagram of the Cα trace of PPCA monomer 1 with numbering of selected residues. The residues forming the α-helices and β-strands, according to DSSP, are as follows.

Core domain: Cβ1 (21-27), Cβ2 (32-39), Cβ3 (50-54), Cα1 (63-67), Cβ4 (73-75), Cβ5 (82-84), Cβ6 (94-98), Cα2 (118-135), Cβ7 (144-149), Cα3 (152-163), Cβ8 (171-177), Cα4 (307-313), Cα5 (316-321), Cα6 (336-341), Cα7 (350-359), Cβ9 (363-369), Cα8 (377-386), Cβ10 (391-401), Cβ11 (407-416), Cβ12 (419-424), Cα9 (431-434), Cα10 (436-447).

Cap domain: Hα1 (183-196), Hα2 (202-212), Hα3 (226-240), Mβ1 (261-264), Mβ2 (267-270), Mα1 (290-293), Mβ3 (296-299).

Note that for monomer 2 the secondary structure assignments in the cap domain are slightly different from those in monomer 1: residues in Hβ1 are in a region of poor density and Mα1 is an extended coil. (Figure generated using MOLSCRIPT (Kraulis (1991), infra).)

Figure 3 shows the density for the disulfide bridges Cys 212-Cys 228 and Cys 213-Cys 218 as revealed in the SigmaA weighted 2mF_o - DF_c electron density map (Read, Acta Crystallogr. A42:140-149 (1986)) calculated from the model refined to 2.2 Å. The map has been contoured at 1σ. (Figure drawn with the O computer program (Jones, Acta Crystallogr. A47:110-119 (1991)).)

Figure 4 is a stereo diagram of the superimposed Cα traces from the two crystallographically independent PPCA monomers forming the dimer. Monomer 1 is in blue; monomer 2 is in red. Residues referred to in the text are labeled. Residues 259 and 260 have not been incorporated in the model of monomer 2, since no electron density was observed for them. Note the tremendous difference in conformation of the excision peptide located in the upper right corner of the proteins. (Figure generated by MOLSCRIPT (Kraulis (1991), infra).)

Figure 5 is a schematic ribbon diagram of the PPCA dimer viewed approximately along the two-fold axis. For monomer 1, the core domain is yellow, while the cap domain consists of a helical subdomain in red and a maturation subdomain in orange. For monomer 2, the core domain is green, while the cap domain consists of a blue helical subdomain and a light blue maturation subdomain. (Figure generated using MOLSCRIPT (Kraulis (1991), infra).)

Figure 6A-B is a representation of the molecular surface of the PPCA dimer. The surface was calculated with GRASP (Nicholls, A., et al., Proteins 11:281-296 (1991)) and colored according to the electrostatic potential. Dark blue corresponds to a positive potential > +15.0 kT/e and dark red to a negative potential < -15.0 kT/e. Figure 6A: standard view, along the diad, with the dimer oriented as in Figure 4. Figure 6B: side view of the dimer, rotated ninety degrees with respect to 6A.

Figure 7A-F presents a topological comparison of 6 members of the hydrolase fold family. The arrangement of structural elements in the central core domain (in green and yellow) of the different proteins is generally similar. The cap domains (in red) vary greatly. The following structures are shown starting from the top left hand corner (references and PDB entry codes are given in brackets). Figure 7A shows the PPCA precursor: cap domain consists of two subdomains, one α-helical and the other mainly β-sheet. Figure 7B shows CPW (3SC2, Liao et al. (1992), infra): cap domain helical. Figure 7C shows CPY (1YSC, Endrizzi et al. (1994), infra): cap domain helical. Figure 7D shows dehalogenase (2HAD, Franken et al., EMBO J. 10:1297-1302 (1991)): cap domain helical but quite different from the serine carboxypeptidases. Figure 7E shows lipase from Pseudomonas glumae (1TAH, Noble et al., FEBS Lett. 331:123-128 (1993)): cap domain mixed α-helical and β-strands. Figure 7F shows acetylcholine esterase (1ACE, Sussman et al., Science 253:872-879 (1991)): cap domain large and predominantly α-helical. The secondary structure assignments were generated with the computer program O using structures provided and/or available from the Brookhaven Protein Data Bank. (This Figure was generated using MOLSCRIPT (Kraulis (1991), infra).)

Figure 8A-B shows the superposition of the Cα traces from the PPCA and CPW monomers, showing that the major differences between the two enzymes are localized in the cap domain. PPCA has a large 'maturation' subdomain, and the 'helical' subdomain is rotated with respect to its CPW counterpart. (Figure drawn with the O program (Jones (1991), infra).) Figure 8B shows the Cα traces from the PPCA and CPW dimers after the core domains from the subunits (shown on the right hand side of the two dimers) have been superimposed. Notice the remarkable difference in mutual orientation (of 15°) of the two subunits on the left hand side of the two dimers, which has been accentuated by an arrow. (Figure drawn with the O computer program (Jones (1991), supra).)

Figure 9 is a stereo view of the Cα trace of PPCA monomer 1 highlighting regions involved in the maturation event. The color scheme for the trace is as follows: core domain in light blue, helical subdomain in red, maturation subdomain in orange, with the exception of the excision peptide (residues 285-298), which is shown in blue. Orange spheres mark residues 272 and 277, the beginning and end of the blocking peptide. The catalytic triad Ser 150, His 429 and Asp 372 is shown as light blue spheres. Two cysteines, Cys 253 and Cys 303, referred to in the discussion are colored green. (This Figure was generated using MOLSCRIPT (Kraulis (1991), infra).)

Figure 10 is a close-up representation of the 'blocking' peptide (residues 272-277) bound in the active site, rendering the catalytic triad solvent inaccessible. Residues from the maturation subdomain are shown in orange, residues from the helical subdomain in magenta and residues from the core domain in cyan. The excision peptide is shown in blue. Side chains are shown for residues that make extensive contacts with the blocking peptide or that are mentioned in the text. The catalytic triad is shown in white. (Figure drawn with O (Jones (1991), infra).)

Figure 11 is a representation of elements proposed to be involved in the activation mechanism of the precursor form of PPCA as discussed in the text. The Cα trace of the core domain is shown in cyan, the helical subdomain in red, the maturation subdomain in orange, and the excision peptide in blue. Relevant side chains are depicted and labeled. Rearrangement of residues 254-302, limited by the disulfide Cys 253-Cys 303, would free up the active site cleft. A charge cluster (Arg 262, Glu 264, Arg 298 and Asp 300) occupies a strategic position within the maturation subdomain and is possibly involved in pH-dependent regulation of conformational changes. The solvent accessible surface was calculated and visualized with the atomic coordinates by BIOGRAF (BIOGRAF Construct Users Guide, Version 3.2.1, June 1993).

Figure 12 is a schematic representation of the proposed activation of PPCA. The active site cleft is formed by the core domain (indicated as 'core' in the scheme) and the helical subdomain (indicated as 'α'). The maturation subdomain (indicated as 'm') contains the residues that block the active site cleft, rendering the precursor enzymatically inactive, as shown in structure 1. In the acidic endosome/lysosome, the precursor undergoes activation. In activation pathway 2a, conformational rearrangements induced by low pH might render the excision peptide more accessible to proteases as a first step, followed by cleavage of the polypeptide chain removing the excision peptide. Alternatively, in pathway 2b, proteolytic cleavage of the excision peptide might form the trigger for the total rearrangement, removing the blocking peptide from the active site and thus generating the fully active enzyme as shown in structure 3.

Figure 13 shows the amino acid sequence of a human pPPCA. The underlined portion (residues 285-298) shows an excision peptide for conversion to the mature form, PPCA.

Figure 14 shows the amino acid sequence of a human PPCA.

Figure 15 shows a sequence alignment between pPPCA, CPW and CPY (top three sequences shown). Identical residues among all three sequences are boxed. Residue numbering is included for the pPPCA amino acid sequence. The alignment was made using the GCG program PILEUP (GCG version 8), then manually adjusted using 3D-structural knowledge from the superposition of the CPW (Liao et al., 1992) and CPY (Endrizzi et al., 1994) atomic coordinates. The alignment was later used to design a multi-Ala search probe for molecular replacement calculations, given as the fourth sequence and labeled 'model'. The structure determination of pPPCA subsequently revealed that the protein can be divided into two domains: a 'core' domain (residues 1-182 and 303-452) and a 'cap' domain (residues 183-302). The secondary structure elements for the PPCA precursor are depicted with shaded bars (for details on the assignment and nomenclature, see Rudenko et al., Structure 3:1249-1259 (1995)).

Figure 16 shows a schematic representation of a 'bootstrapping' cycle as described in Example 2.

Figure 17 is a representation of an initial molecular mask enlarged to accommodate missing areas in the model. The program MAMA (Kleywegt & Jones, 1994) was used to calculate the mask, and mask editing options in O (Jones et al., 1991) were used to extend the mask.

Figure 18 is a representation of the enlargement of the model during the bootstrapping procedure, plotted as a function of the expansion step. The number of Cα atoms incorporated in the model per monomer is given (-°-), as well as the number of correct side chains (-«-). Note that after the first round of building in the molecular replacement map (expansion step ' r"), 37 residues from the molecular replacement search probes had to be deleted from the model, reducing the number of Cα atoms to 294. Subsequent cycles allowed the model to be expanded by small increments.

Figure 19 is a representation of a comparison of the Cα trace from a monomer core model (shown in magenta) and the complete PPCA monomer (shown in yellow). The core model contained only 294 Cα atoms. The 452 residue PPCA monomer consists of a core domain and a cap domain. The helical subdomain and the maturation subdomain forming the cap domain have been shown in the figure above.

Figure 20A-D is a representation of the resolving power of the bootstrapping procedure, showing three different stages in map quality. The atomic coordinates of the refined model are visualized with the electron density in Figures 20B, 20C and 20D. Figures 20A and 20B show the initial 2m|F_obs| - D|F_calc| SigmaA weighted map calculated using phases from the molecular replacement solution. The electron density is essentially uninterpretable. Figure 20C shows the twofold averaged 2|F_obs| - |F_inv| electron density map calculated using inverted phases from cycle bmc6. The density for β-strand Mβ2 (residues 266-271) has become clearly visible. Figure 20D shows the unaveraged 2m|F_obs| - D|F_calc| SigmaA weighted map calculated using phases from the refined model. The quality of the density is very good. Density for the helix Mα1 (residues 287-293), which assumes a different conformation in the two monomers, is now also apparent.

Figure 21 shows a Ramachandran plot calculated for one monomer from a refined model of a pPPCA. Both monomers in the asymmetric unit give essentially equivalent plots.

Figure 22 shows a schematic of a computer system for PPCA or pPPCA structure determination and/or rational drug design.

Figure 23.1-52 lists the atomic coordinates for the active site of a pPPCA dimer having the amino acid sequence presented as portions of at least one of 50-76, 144-155, 173-197, 226-253, 226-288, 294-310, 327-344, 338-350, 366-381 and 423-436 of (Figure 23.1-23.26) 452 amino acids (designated 1-452) of monomer 1, as well as corresponding portions of (Figure 23.26-23.52) 452 amino acids (designated 1001-1452) of monomer 2.

Detailed Description of the Preferred Embodiments

The present invention provides methods for expressing, purifying and crystallizing a protective protein/cathepsin A (PPCA) or a precursor protective protein/cathepsin A (pPPCA), where the crystals diffract x-rays with sufficiently high resolution to allow determination of the three-dimensional structure of the PPCA or pPPCA, or a portion or subdomain thereof. The three-dimensional structure (e.g., as provided on computer readable media of the present invention) is useful for rational drug design of ligands of a PPCA or a pPPCA. Such ligands can be synthesized or recombinantly produced and are useful as diagnostic agents or drugs for diagnosing, treating, inhibiting or preventing at least one PPCA- or pPPCA-related pathology.

The determined structure is made using the PPCA or pPPCA amino acid sequences and/or atomic coordinate/x-ray diffraction data, which are analyzed to provide atomic model output data corresponding to the three-dimensional structure, e.g., as provided on computer readable media. The computer analysis of the atomic coordinate/x-ray diffraction data and/or the amino acid sequence allows the calculation of the secondary, tertiary and/or quaternary structures, domains, and/or subdomains of the protein. These domains are combined and refined by additional calculations using suitable computer subroutines to determine the most probable or actual three-dimensional structure of the PPCA or pPPCA, including potential or actual active sites, binding sites or other structural or functional domains or subdomains of the protein.

Structure determination methods are also provided by the present invention for rational drug design (RDD) of PPCA or pPPCA ligands. Such drug design uses computer modeling programs that calculate different molecules expected to interact with the determined active sites, binding sites, or other structural or functional domains or subdomains of a PPCA or a pPPCA. These ligands can then be produced and screened for activity in modulating or binding to a PPCA or pPPCA, according to methods and compositions of the present invention.

The actual PPCA- or pPPCA-ligand complexes can optionally be crystallized and analyzed using x-ray diffraction techniques. The diffraction patterns obtained are similarly used to calculate the three-dimensional interaction of the ligand and the PPCA or pPPCA, to confirm that the ligand binds to, or changes the conformation of, particular domain(s) or subdomain(s) of the PPCA or pPPCA. Such screening methods are selected from assays for at least one biological activity of a PPCA or a pPPCA. The resulting ligands, provided by methods of the present invention, modulate or bind at least one PPCA or pPPCA and are useful for diagnosing, treating or preventing PPCA- or pPPCA-related pathologies in animals, such as humans. Ligands of a particular PPCA or pPPCA can similarly modulate other PPCAs or pPPCAs from other sources, such as other eukaryotes.

A PPCA or pPPCA is also provided as a crystallized protein suitable for x-ray diffraction analysis. The x-ray diffraction patterns obtained by the x-ray analysis are of moderate, to moderately high, to high resolution, e.g., 30-10, 10-3.5 or 1.5-3.5 Å, respectively, with the higher resolutions included. These diffraction patterns are suitable and useful for three-dimensional structure determination of a PPCA or a pPPCA, or a domain or subdomain thereof. The determination of the three-dimensional structure of a PPCA or pPPCA has a broad-based utility.

Significant sequence identity and conservation of important structural elements are expected to exist among different PPCAs or pPPCAs. Therefore, the three-dimensional structure from one or a few PPCAs or pPPCAs can be used to identify ligands that have diagnostic or therapeutic value for at least one PPCA- or pPPCA-related pathology that may involve PPCAs or pPPCAs having different amino acid sequences.

Determination of Protein Structures

Different techniques give different and complementary information about protein structure. The primary structure is obtained by biochemical methods, either by direct determination of the amino acid sequence from the protein, or from the nucleotide sequence of the corresponding gene or cDNA. The quaternary structure of large proteins or aggregates can also be determined by electron microscopy. To obtain the secondary and tertiary structure, which requires detailed information about the arrangement of atoms within a protein, x-ray crystallography is preferred. See, e.g., Blundell, infra; Oxender, infra; McPherson, infra; Wyckoff, infra.

The first prerequisite for solving the three-dimensional structure of a protein by x-ray crystallography is a well-ordered crystal that will diffract x-rays strongly. The crystallographic method directs a beam of x-rays onto a regular, repeating array of many identical molecules so that the x-rays are diffracted from it in a pattern from which the structure of an individual molecule can be retrieved. Globular protein molecules are large, spherical or ellipsoidal objects with irregular surfaces, and crystals thereof contain large holes or channels that are formed between the individual molecules. These channels, which usually occupy more than half the volume of the crystal, are filled with disordered solvent molecules. The protein molecules are in contact with each other at only a few small regions. This is one reason why structures of proteins determined by x-ray crystallography are generally the same as those for the proteins in solution.

The formation of crystals is dependent on a number of different parameters, including pH, temperature, protein concentration, the nature of the solvent and precipitant, as well as the presence of added ions or ligands to the protein. Many routine crystallization experiments may be needed to screen all these parameters for the few combinations that might give crystals suitable for x-ray diffraction analysis. Crystallization robots can automate and speed up the work of reproducibly setting up large numbers of crystallization experiments.

A pure and homogeneous protein sample is important for successful crystallization. Proteins obtained from cloned genes in efficient expression vectors can be purified quickly to homogeneity in large quantities in a few purification steps. A protein to be crystallized is preferably at least 93-99% pure according to standard criteria of homogeneity. Crystals form when molecules are precipitated very slowly from supersaturated solutions. The most frequently used procedure for making protein crystals is the hanging-drop method, in which a drop of protein solution is brought very gradually to supersaturation by loss of water from the droplet to the larger reservoir that contains salt or polyethylene glycol solution.

Different crystal forms can be more or less well-ordered and hence give diffraction patterns of different quality. As a general rule, the more closely the protein molecules pack, and consequently the less water the crystals contain, the better is the diffraction pattern because the molecules are better ordered in the crystal.

X-rays are electromagnetic radiation at short wavelengths, emitted when electrons jump from a higher to a lower energy state. In conventional sources in the laboratory, x-rays are produced by high-voltage tubes in which a metal plate, the anode, is bombarded with accelerated electrons and thereby caused to emit x-rays of a specific wavelength, so-called monochromatic x-rays. The electron bombardment rapidly heats up the metal plate, which therefore has to be cooled. Efficient cooling is achieved by so-called rotating anode x-ray generators, where the metal plate revolves during the experiment so that different parts are heated up.

More powerful x-ray beams can be produced in synchrotron storage rings where electrons (or positrons) travel close to the speed of light. These particles emit very strong radiation at all wavelengths from short gamma rays to visible light. When used as an x-ray source, only radiation within a window of suitable wavelengths is channeled from the storage ring. Polychromatic x-ray beams are produced by having a broad window that allows through x-ray radiation with wavelengths of 0.2-3.5 Å.

In diffraction experiments a narrow and parallel beam of x-rays is taken out from the x-ray source and directed onto the crystal to produce diffracted beams. The incident primary beam causes damage to both protein and solvent molecules. The crystal is, therefore, usually cooled to prolong its lifetime (e.g., -220 to -50°C). The primary beam must strike the crystal from many different directions to produce all possible diffraction spots, and so the crystal is rotated in the beam during the experiment.

The diffracted spots are recorded either on a film, the classical method, or by an electronic detector. The exposed film has to be measured and digitized by a scanning device, whereas electronic detectors feed the signals they detect directly in a digitized form into a computer. Electronic area detectors (an electronic film) significantly reduce the time required to collect and measure diffraction data.

When the primary beam from an x-ray source strikes the crystal, some of the x-rays interact with the electrons on each atom and cause them to oscillate. The oscillating electrons serve as a new source of x-rays, which are emitted in almost all directions, a process referred to as scattering. When atoms (and hence their electrons) are arranged in a regular three-dimensional array, as in a crystal, the x-rays emitted from the oscillating electrons interfere with one another. In most cases, these x-rays, colliding from different directions, cancel each other out; those from certain directions, however, will add together to produce diffracted beams of radiation that can be recorded as a pattern on a photographic plate or detector.

The diffraction pattern obtained in an x-ray experiment is related to the crystal that caused the diffraction. X-rays that are reflected from adjacent planes travel different distances, and diffraction only occurs when the difference in distance is equal to the wavelength of the x-ray beam. This distance is dependent on the reflection angle, which is equal to the angle between the primary beam and the planes.

The relationship between the reflection angle (θ), the distance between the planes (d), and the wavelength (λ) is given by Bragg's law: 2d sin θ = λ. This relation can be used to determine the size of the unit cell in the crystal. Briefly, the position on the film of the diffraction data relates each spot to a specific set of planes through the crystal. By using Bragg's law, these positions can be used to determine the size of the unit cell.
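As an illustration of how Bragg's law relates a measured reflection angle to a plane spacing, the following minimal sketch solves 2d sin θ = λ in both directions. The function names, the optional `order` argument and the example wavelength (1.5418 Å, a common laboratory copper source) are assumptions chosen for illustration; they are not taken from the present disclosure.

```python
import math


def bragg_spacing(wavelength: float, theta_deg: float, order: int = 1) -> float:
    """Plane spacing d (same unit as wavelength) from 2 d sin(theta) = order * wavelength.

    `order` is an illustrative generalization; order=1 is the form quoted in the text.
    """
    if not 0.0 < theta_deg <= 90.0:
        raise ValueError("theta must lie in (0, 90] degrees")
    return order * wavelength / (2.0 * math.sin(math.radians(theta_deg)))


def bragg_angle(wavelength: float, d_spacing: float, order: int = 1) -> float:
    """Reflection angle theta (degrees) for planes separated by d_spacing."""
    s = order * wavelength / (2.0 * d_spacing)
    if s > 1.0:
        # sin(theta) cannot exceed 1: these planes cannot diffract this wavelength
        raise ValueError("no diffraction possible for this wavelength and spacing")
    return math.degrees(math.asin(s))


CU_K_ALPHA = 1.5418  # Angstrom; assumed example wavelength

d = bragg_spacing(CU_K_ALPHA, 10.0)   # spacing probed at theta = 10 degrees
theta = bragg_angle(CU_K_ALPHA, d)    # round trip back to the angle
print(f"d = {d:.3f} A at theta = {theta:.1f} deg")
```

Smaller reflection angles correspond to larger spacings, which is why the spots closest to the primary beam carry the low-resolution information.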

Each atom in a crystal scatters x-rays in all directions, and only those that positively interfere with one another, according to Bragg's law, give rise to diffracted beams that can be recorded as a distinct diffraction spot above background. Each diffraction spot is the result of interference of all x-rays with the same diffraction angle emerging from all atoms. For example, for the protein crystal of myoglobin, each of the about 20,000 diffracted beams that have been measured contains scattered x-rays from each of the around 1500 atoms in the molecule. To extract information about individual atoms from such a system requires considerable computation. The mathematical tool that is used to handle such problems is called the Fourier transform.

Each diffracted beam, which is recorded as a spot on the film, is defined by three properties: the amplitude, which can be measured from the intensity of the spot; the wavelength, which is set by the x-ray source; and the phase, which is lost in x-ray experiments. All three properties are needed for all of the diffracted beams, in order to determine the position of the atoms giving rise to the diffracted beams.

For larger molecules, protein crystallographers have determined the phases in many cases using a method called multiple isomorphous replacement (MIR) (including heavy metal scattering), which requires the introduction of new x-ray scatterers into the unit cell of the crystal. These additions are usually heavy atoms (so that they make a significant contribution to the diffraction pattern); there should not be too many of them (so that their positions can be located); and they should not change the structure of the molecule or of the crystal cell, i.e., the crystals should be isomorphous. Isomorphous replacement is usually done by diffusing different heavy-metal complexes into the channels of the preformed protein crystals. The protein molecules expose side chains (such as SH groups) into these solvent channels that are able to bind heavy metals. It is also possible to replace endogenous light metals in metalloproteins with heavier ones, e.g., zinc by mercury, or calcium by samarium.

Since such heavy metals contain many more electrons than the light atoms (H, N, C, O, and S) of the protein, they scatter x-rays more strongly. All diffracted beams would therefore increase in intensity after heavy-metal substitution if all interference were positive. In fact, however, some interference is negative; consequently, following heavy-metal substitution, some spots measurably increase in intensity, others decrease, and many show no detectable difference.

Phase differences between diffracted spots can be determined from intensity changes following heavy-metal substitution. First, the intensity differences are used to deduce the positions of the heavy atoms in the crystal unit cell. Fourier summations of these intensity differences give maps of the vectors between the heavy atoms, the so-called Patterson maps. From these vector maps the atomic arrangement of the heavy atoms is deduced. From the positions of the heavy metals in the unit cell, one can calculate the amplitudes and phases of their contribution to the diffracted beams of protein crystals containing heavy metals.

This knowledge is then used to find the phase of the contribution from the protein in the absence of the heavy-metal atoms. As both the phase and amplitude of the heavy metals and the amplitude of the protein alone are known, as well as the amplitude of the protein plus heavy metals (i.e., protein heavy-metal complex), one phase and three amplitudes are known. From this, the interference of the x-rays scattered by the heavy metals and protein can be calculated to see if it is constructive or destructive. The extent of positive or negative interference, with knowledge of the phase of the heavy metal, gives an estimate of the phase of the protein. Because two different phase angles are determined and are equally good solutions, a second heavy-metal complex can be used which also gives two possible phase angles. Only one of these will have the same value as one of the two previous phase angles; it therefore represents the correct phase angle. In practice, more than two different heavy-metal complexes are usually made in order to give a reasonably good phase determination for all reflections. Each individual phase estimate contains experimental errors arising from errors in the measured amplitudes. Furthermore, for many reflections, the intensity differences are too small to measure after one particular isomorphous replacement, and others can be tried.

The amplitudes and the phases of the diffraction data from the protein crystals are used to calculate an electron-density map of the repeating unit of the crystal. This map then has to be interpreted as a polypeptide chain with a particular amino acid sequence. The interpretation of the electron-density map is made more complex by several limitations of the data. First of all, the map itself contains errors, mainly due to errors in the phase angles. In addition, the quality of the map depends on the resolution of the diffraction data, which in turn depends on how well-ordered the crystals are. This directly influences the image that can be produced. The resolution is measured in Å units; the smaller this number is, the higher the resolution and therefore the greater the amount of detail that can be seen.
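The two-fold phase ambiguity of a single isomorphous replacement, and its resolution by a second heavy-metal derivative as described above, can be sketched numerically. The sketch below assumes error-free amplitudes and uses the standard vector relation |F_PH|² = |F_P|² + |F_H|² + 2|F_P||F_H| cos(α_P − α_H); the function names and the synthetic structure factors are hypothetical and are not taken from the specification.

```python
import cmath
import math


def sir_phase_candidates(f_p, f_ph, f_h, alpha_h_deg):
    """Return the two protein phases (degrees) consistent with one derivative.

    f_p, f_ph, f_h are the amplitudes of protein, protein plus heavy metal,
    and heavy metal alone; alpha_h_deg is the known heavy-metal phase.
    """
    cos_delta = (f_ph ** 2 - f_p ** 2 - f_h ** 2) / (2.0 * f_p * f_h)
    cos_delta = max(-1.0, min(1.0, cos_delta))  # guard against rounding error
    delta = math.degrees(math.acos(cos_delta))
    return ((alpha_h_deg + delta) % 360.0, (alpha_h_deg - delta) % 360.0)


def circular_difference(x_deg, y_deg):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(x_deg - y_deg) % 360.0
    return min(d, 360.0 - d)


def resolve_phase(cands_a, cands_b):
    """Pick the candidate from cands_a that best agrees with one in cands_b."""
    pairs = [(a, b) for a in cands_a for b in cands_b]
    best_a, _ = min(pairs, key=lambda p: circular_difference(p[0], p[1]))
    return best_a


# Synthetic, hypothetical structure factors used only to exercise the sketch.
true_fp = cmath.rect(100.0, math.radians(40.0))   # protein, phase 40 degrees
fh1 = cmath.rect(20.0, math.radians(100.0))       # first heavy-metal derivative
fh2 = cmath.rect(25.0, math.radians(200.0))       # second heavy-metal derivative

cands1 = sir_phase_candidates(abs(true_fp), abs(true_fp + fh1), abs(fh1), 100.0)
cands2 = sir_phase_candidates(abs(true_fp), abs(true_fp + fh2), abs(fh2), 200.0)
best = resolve_phase(cands1, cands2)
```

With real data the candidates from different derivatives never coincide exactly, which is why more than two derivatives are usually collected and the estimates are combined rather than matched pairwise as in this sketch.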

Building the initial model is a trial-and-error process. First, one has to decide how the polypeptide chain weaves its way through the electron-density map. The resulting chain trace constitutes a hypothesis, by which one tries to match the density of the side chains to the known sequence of the polypeptide. When a reasonable chain trace has finally been obtained, an initial model is built to give the best fit of the atoms to the electron density. Computer graphics are used both for chain tracing and for model building to present the data and manipulate the models.

The initial model will contain some errors. Provided the protein crystals diffract to high enough resolution (e.g., better than 3.5 Å), most or substantially all of the errors can be removed by crystallographic refinement of the model using computer algorithms. In this process, the model is changed to minimize the difference between the experimentally observed diffraction amplitudes and those calculated for a hypothetical crystal containing the model (instead of the real molecule). This difference is expressed as an R factor (residual disagreement), which is 0.0 for exact agreement and about 0.59 for total disagreement.

In general, the R factor is preferably between 0.15 and 0.35 (such as less than about 0.24-0.28) for a well-determined protein structure. The residual difference is a consequence of errors and imperfections in the data. These derive from various sources, including slight variations in the conformation of the protein molecules, as well as inaccurate corrections both for the presence of solvent and for differences in the orientation of the microcrystals from which the crystal is built. This means that the final model represents an average of molecules that are slightly different both in conformation and orientation.
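The R factor discussed above is conventionally computed as the sum of the absolute differences between observed and calculated amplitudes, divided by the sum of the observed amplitudes. The following Python sketch assumes that the two amplitude lists are already on a common scale and are given in the same reflection order; the function name is illustrative and is not taken from the specification.

```python
def r_factor(f_obs, f_calc):
    """Return R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|).

    f_obs and f_calc are sequences of amplitudes for the same reflections,
    in the same order. No scale factor is applied in this sketch.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("observed and calculated lists differ in length")
    denominator = sum(abs(o) for o in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return numerator / denominator
```

A model whose calculated amplitudes reproduce the observations exactly gives an R factor of 0.0, consistent with the value stated above for exact agreement.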

In refined structures at high resolution, there are usually no major errors in the orientation of individual residues, and the estimated errors in atomic positions are usually around 0.1-0.2 Å, provided the amino acid sequence is known. Hydrogen bonds, both within the protein and to bound ligands, can be identified with a high degree of confidence.

Most x-ray structures are determined to a resolution between 1.7 Å and 3.5 Å. Electron-density maps with this resolution range are preferably interpreted by fitting the known amino acid sequences into regions of electron density in which individual atoms are not resolved.

An amino acid sequence is preferred for accurate x-ray structure determination. Thus, recombinant DNA techniques have had a double impact on x-ray structural work. When a protein is cloned and overexpressed for structural studies, the amino acid sequence, necessary for the x-ray work, is also quickly obtained via the nucleotide sequence. Recombinant DNA techniques give us not only abundant supplies of rare proteins, but also their amino acid sequence as a bonus. See, e.g., Blundell, infra; Oxender, infra; McPherson, infra; Wyckoff, infra.

Isolated PPCA and pPPCA Polypeptides

A PPCA or pPPCA polypeptide can refer to any subset of a PPCA or pPPCA as a domain, subdomain, fragment, consensus sequence or repeating unit thereof. A PPCA or pPPCA polypeptide of the present invention can be prepared by, e.g.,

(a) recombinant DNA methods,

(b) proteolytic digestion of the intact molecule or a domain, subdomain or fragment thereof,

(c) chemical peptide synthesis methods well-known in the art, and/or

(d) by any other method capable of producing a PPCA or pPPCA polypeptide and having a conformation similar to a structural or functional subdomain of a PPCA or a pPPCA.

A biological activity of PPCA or pPPCA can be screened according to known screening assays. The minimum peptide sequence to have activity is based on the smallest unit containing or comprising a particular domain, subdomain, fragment, region, consensus sequence, or repeating unit thereof, having at least one biological activity of a PPCA or pPPCA, such as protecting activity, inhibiting activity or enzyme activity. Non-limiting examples of such activities are protecting activity for β-galactosidase or neuraminidase (NA); modulating activity (inhibition, stimulation or activation) as an for endothelin I (serine carboxypeptidase) or cathepsin A; and peptide hydrolyzing activity (e.g., substance P and substance P-free acid, oxytocin and oxytocin-free acid, neurokinin A, angiotensin I, and bradykinin).

According to the present invention, a PPCA or pPPCA includes an association of two or more polypeptide subdomains, such as at least one 4 amino acid portion of a core or cap domain of a PPCA or pPPCA. This can include 1-14 subdomains of the cap domain and/or 1-44 subdomains of the core domain (as monomers or dimers), or any range, value or combination thereof. Preferably 1-4 sets of each of at least one core or cap domains or subdomains are included.

The structure of a monomer or domain of at least one PPCA or pPPCA of the present invention can include one or more of the following subdomains, as described herein. Generally, a PPCA or pPPCA consists of a dimer of a core domain and a cap domain having the following subdomains with the specified residues, e.g., as presented in Figure 13 (pPPCA) or Figure 14 (PPCA).

Core domain subdomains: Cβ1, 21-27; Cβ2, 32-39; Cβ3, 50-54; Cα1, 63-67; Cβ4, 73-75; Cβ5, 82-84; Cβ6, 94-98; Cα2, 118-135; Cβ7, 144-149; Cα3, 152-163; Cβ8, 171-177; Cα4, 307-313; Cα5, 316-321; Cα6, 336-341; Cα7, 350-359; Cβ9, 363-369; Cα8, 377-386; Cβ10, 391-401; Cβ11, 407-416; Cβ12, 419-424; Cα9, 431-434; Cα10, 436-447.

Cap domain subdomains: Hα1, 183-196; Hα2, 202-212; Hα3, 226-240; Mβ1, 261-264; Mβ2, 267-270; Mα1, 290-293; Mβ3, 296-299.

Note that for monomer 2 the secondary structure assignments in the cap domain are slightly different than in monomer 1.

A PPCA or pPPCA polypeptide of the invention can have at least 80% homology, such as 80-100% overall homology or identity, with one or more corresponding PPCA or pPPCA subdomains or fragments as described herein, such as a 4-542 amino acid fragment or portion of the amino acid sequence of Figures 13, 14 or 15. As would be understood by one of ordinary skill in the art, the above configurations of subdomains are provided as part of a PPCA or pPPCA polypeptide of the invention, when expressed in a suitable host cell, or otherwise synthesized, to provide at least one structural or functional feature of a native PPCA or pPPCA, such as at least one PPCA-related biological activity. Such activities can be assayed using a suitable assay, to establish at least one PPCA biological activity of one or more PPCAs or pPPCAs of the invention. A PPCA or pPPCA polypeptide of the invention is not naturally occurring or is naturally occurring but is in a purified or isolated form which does not occur in nature. Examples of suitable PPCA activity assays include, e.g., cathepsin A activity (Galjart et al., J. Biol. Chem. 266:14754-14762 (1991)); endothelin I deamidase activity (Jackman et al., J. Biol. Chem. 267:2872-2875 (1992)); and tachykinin deamidase activity (Jackman et al., J. Biol. Chem. 265:11265-11272 (1990)).

Percent homology or identity can be determined, for example, by comparing sequence information using the GAP computer program, version 6.0, available from the University of Wisconsin Genetics Computer Group (UWGCG).

The GAP program utilizes the alignment method of Needleman and Wunsch (J. Mol. Biol. 48:443 (1970)), as revised by Smith and Waterman (Adv. Appl. Math. 2:482 (1981)). Briefly, the GAP program defines similarity as the number of aligned symbols (i.e., nucleotides or amino acids) which are similar, divided by the total number of symbols in the shorter of the two sequences. The preferred default parameters for the GAP program include: (1) a unitary comparison matrix (containing a value of 1 for identities and 0 for non-identities) and the weighted comparison matrix of Gribskov and Burgess, Nucl. Acids Res. 14:6745 (1986), as described by Schwartz and Dayhoff, eds., ATLAS OF PROTEIN SEQUENCE AND STRUCTURE, National Biomedical Research Foundation, pp. 353-358 (1979); (2) a penalty of 3.0 for each gap and an additional 0.10 penalty for each symbol in each gap; and (3) no penalty for end gaps.

Thus, one of ordinary skill in the art, given the teachings and guidance presented in the present specification, will know how to add, delete or substitute other amino acid residues in other positions of a PPCA or pPPCA to obtain substituted, deletional or additional variants thereof.
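The similarity measure described above for the GAP program (aligned identical symbols divided by the length of the shorter sequence) can be illustrated with a small Needleman-Wunsch global alignment. This is a simplified sketch and not the GAP program itself: it assumes a unitary comparison matrix, a single linear gap penalty of 3 per gap symbol, and penalized end gaps, whereas the parameters listed above add a per-symbol extension penalty and leave end gaps free. All function names are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=0, gap=-3):
    """Globally align sequences a and b; return the two gapped strings.

    Integer scores are used so that the equality tests in the traceback
    are exact.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Identical aligned symbols divided by the shorter length, as a percentage."""
    aligned_a, aligned_b = needleman_wunsch(a, b)
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * identical / min(len(a), len(b))
```

Under this measure a fragment that aligns perfectly within a longer sequence scores 100%, because the denominator is the length of the shorter sequence.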

Non-limiting examples of substitutions of a PPCA or pPPCA domain or polypeptide of the invention are those in which at least one amino acid residue in the protein molecule has been removed and a different residue added in its place according to the following Table 2. The types of substitutions which can be made in the protein or peptide molecule of the invention can be based on analysis of the frequencies of amino acid changes between a homologous protein of different species, such as those presented in Figure 15. Based on such an analysis, alternative substitutions are defined herein as exchanges within one of the following five groups:

1. Small aliphatic, nonpolar or slightly polar residues: Ala, Ser, Thr (Pro, Gly);

2. Polar, negatively charged residues and their amides: Asp, Asn, Glu, Gln;

3. Polar, positively charged residues;

4. Large aliphatic, nonpolar residues: Met, Leu, Ile, Val (Cys); and

5. Large aromatic residues: Phe, Tyr, Trp.
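A check of whether a proposed substitution falls within one of the five groups above can be sketched as a simple lookup. The names below are hypothetical. Note two assumptions: the residues for group 3 are not listed in the text above, so His, Arg and Lys are assumed here from the conventional classification of positively charged residues; and the parenthesized residues (Pro, Gly, Cys) are treated as ordinary members of their groups.

```python
# Group membership follows the five groups listed above.
SUBSTITUTION_GROUPS = {
    1: {"Ala", "Ser", "Thr", "Pro", "Gly"},   # Pro, Gly parenthesized in the text
    2: {"Asp", "Asn", "Glu", "Gln"},
    3: {"His", "Arg", "Lys"},                 # ASSUMPTION: not listed in the text
    4: {"Met", "Leu", "Ile", "Val", "Cys"},   # Cys parenthesized in the text
    5: {"Phe", "Tyr", "Trp"},
}


def group_of(residue):
    """Return the group number for a three-letter residue code."""
    for number, members in SUBSTITUTION_GROUPS.items():
        if residue in members:
            return number
    raise KeyError(f"unknown residue: {residue}")


def is_within_group(original, replacement):
    """True if both residues belong to the same substitution group."""
    return group_of(original) == group_of(replacement)
```

For example, `is_within_group("Leu", "Ile")` holds because both residues are in group 4, whereas an Asp to Phe exchange crosses groups 2 and 5.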

Most deletions and additions, and substitutions according to the invention are those which do not produce radical changes in the characteristics of the protein or peptide molecule. "Characteristics" is defined in a non-inclusive manner to define both changes in secondary structure, e.g., α-helix or β-sheet, as well as changes in physiological activity, e.g., in biological activity assays. However, when the exact effect of the substitution, deletion, or addition is to be confirmed, one skilled in the art will appreciate that the effect of at least one substitution, addition or deletion will be evaluated by at least one PPCA or pPPCA screening assay, such as, but not limited to, immunoassays or bioassays, to confirm at least one PPCA or pPPCA biological activity.

Surprisingly, a PPCA and/or a pPPCA is now discovered to have serine carboxypeptidase activity and corresponding structural features, although having only about 30% sequence identity to wheat and yeast serine carboxypeptidases. These carboxypeptidases are members of the hydrolase fold family (Liao et al., Biochemistry 31:9796-9812 (1992); Endrizzi et al., Biochemistry 33:11106-11120 (1994); Ollis et al., Protein Eng. 5:197-211 (1992)). The serine carboxypeptidases have peptidase activity at acidic pH (pH 4.5-5.5) as well as deamidase and esterase activities at pH 7 (reviewed in Breddam et al., Carlsberg Res. Commun. 51:83-128 (1986); Rawlings & Barrett, Methods in Enzymology 244:19-61 (1994)). Mutagenesis studies and enzymatic assays have revealed that only the mature form of PPCA possesses a serine carboxypeptidase activity, which is similar to that of lysosomal cathepsin A, and has a preference for hydrophobic substrates such as the dipeptide Phe-Ala (Galjart et al., J. Biol. Chem. 266:14754-14762 (1991)). On the basis of sequence alignments with members of the serine carboxypeptidase family, mutagenesis studies and the structure determination of pPPCA, the catalytic triad in PPCA has now been determined to be formed by the residues Ser 150, His 429 and Asp 372.

PPCA and pPPCA Expression for Isolation and Purification

A nucleic acid sequence encoding a PPCA or a pPPCA (Galjart et al., Cell 54:755-764 (1988)) can be recombined with vector DNA in accordance with conventional techniques, including blunt-ended or staggered-ended termini for ligation, restriction enzyme digestion to provide appropriate termini, filling in of cohesive ends as appropriate, alkaline phosphatase treatment to avoid undesirable joining, and ligation with appropriate ligases. Techniques for such manipulations are disclosed, e.g., in Sambrook et al., Molecular Cloning: A Laboratory Manual, Second edition, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1989), and Ausubel et al., Current Protocols in Molecular Biology, Wiley Interscience, N.Y. (1988-1995), and are well known in the art.

A nucleic acid molecule, such as DNA, is said to be "capable of expressing" a polypeptide if it contains nucleotide sequences which contain transcriptional and translational regulatory information and such sequences are "operably linked" to nucleotide sequences which encode the polypeptide. An operable linkage is a linkage in which the regulatory DNA sequences and the DNA sequence sought to be expressed are connected in such a way as to permit gene expression as a PPCA, pPPCA or fragment thereof, in recoverable amounts. The precise nature of the regulatory regions needed for gene expression can vary from organism to organism, as is well known in the analogous art. See, e.g., Sambrook, infra, and Ausubel, infra.

The invention accordingly encompasses the expression of a PPCA or a pPPCA, in either prokaryotic or eukaryotic cells, although eukaryotic expression is preferred. Preferred hosts are bacterial or eukaryotic hosts including bacteria, yeast, insects, fungi, bird and mammalian cells either in vivo, or in situ, or host cells of mammalian, insect, bird or yeast origin. It is preferred that the mammalian cell or tissue is of human, primate, hamster, rabbit, rodent, cow, pig, sheep, horse, goat, dog or cat origin, but any other mammalian cell can be used.

Eukaryotic hosts can include yeast, insects, fungi, and mammalian cells either in vivo, or in tissue culture. Preferred eukaryotic hosts can also include, but are not limited to, insect cells and mammalian cells either in vivo, or in tissue culture. Preferred mammalian cells include Xenopus oocytes, HeLa cells, cells of fibroblast origin such as VERO or CHO-1, or cells of lymphoid origin and their derivatives.

Mammalian cells provide post-translational modifications to protein molecules, including correct folding or glycosylation at correct sites. Mammalian cells which can be useful as hosts include cells of fibroblast origin, such as, but not limited to, NIH 3T3, VERO or CHO, or cells of lymphoid origin, such as, but not limited to, the hybridoma SP2/0-Ag14 or the murine myeloma P3-X63Ag8, hamster cell lines (e.g., CHO-K1 and progenitors, e.g., CHO-DUXB11) and their derivatives. One preferred type of mammalian cells are cells which are intended to replace the function of the genetically deficient cells in vivo. Neuronally derived cells are preferred for gene therapy of disorders of the nervous system.

For a mammalian cell host, many possible vector systems are available for the expression of at least one PPCA or pPPCA. A wide variety of transcriptional and translational regulatory sequences can be employed, depending upon the nature of the host. The transcriptional and translational regulatory signals can be derived from viral sources, such as, but not limited to, adenovirus, bovine papilloma virus, Simian virus, or the like, where the regulatory signals are associated with a particular gene which has a high level of expression. Alternatively, promoters from mammalian expression products, such as, but not limited to, actin, collagen, myosin, protein production

When live insects are to be used, silk moth caterpillars and baculoviral vectors are presently preferred hosts for large scale PPCA or pPPCA production according to the invention. Production of PPCA or pPPCA in insects can be achieved, for example, by infecting the insect host with a baculovirus engineered to express transmembrane polypeptide by methods known to those skilled in the related arts. See Ausubel, infra, §§ 16.8-16.11.

In a preferred embodiment, the introduced nucleotide sequence will be incorporated into a plasmid or viral vector capable of autonomous replication in the recipient host. Any of a wide variety of vectors can be employed for this purpose. See, e.g., Ausubel et al., infra, §§ 1.5, 1.10, 7.1, 7.3, 8.1, 9.6, 9.7, 13.4, 16.2, 16.6, and 16.8-16.11. Factors of importance in selecting a particular plasmid or viral vector include the ease with which recipient cells that contain the vector can be recognized and selected from those recipient cells which do not contain the vector, the number of copies of the vector which are desired in a particular host, and whether it is desirable to be able to "shuttle" the vector between host cells of different species.

Different host cells have characteristic and specific mechanisms for the translational and post-translational processing and modification (e.g., glycosylation, cleavage) of proteins. Appropriate cell lines or host systems can be chosen to ensure the desired modification and processing of the foreign protein expressed. For example, expression in a bacterial system can be used to produce an unglycosylated core protein product. Expression in yeast will produce a glycosylated product. Expression in mammalian cells can be used to ensure "native" glycosylation of the heterologous PPCA or pPPCA. Furthermore, different vector/host expression systems can effect processing reactions such as proteolytic cleavages to different extents.

As discussed above, expression of PPCA or pPPCA in eukaryotic hosts requires the use of eukaryotic regulatory regions. Such regions will, in general, include a promoter region sufficient to direct the initiation of RNA synthesis. See, e.g., Ausubel, infra; Sambrook, infra. Once the vector or nucleic acid molecule containing the construct(s) has been prepared for expression, the DNA construct(s) can be introduced into an appropriate host cell by any of a variety of suitable means, i.e., transformation, transfection, conjugation, protoplast fusion, electroporation, particle gun technology, calcium phosphate-precipitation, direct microinjection, and the like. After the introduction of the vector, recipient cells are grown in a selective medium, which selects for the growth of vector-containing cells. Expression of the cloned gene molecule(s) results in the production of a PPCA or pPPCA. This can take place in the transformed cells as such, or following the induction of these cells to differentiate (for example, by administration of bromodeoxyuracil to neuroblastoma cells or the like).

A PPCA or pPPCA, or fragments thereof, of this invention can be obtained by expression from recombinant DNA according to known methods. Alternatively, a PPCA or pPPCA can be purified from biological material. A PPCA or a pPPCA can be purified from different mammalian tissues (e.g., human placenta, rat liver, mouse liver, pig kidney, bovine testes, bovine liver, and the like) of various genus and species.

The PPCA or pPPCA can be isolated and purified in accordance with conventional method steps, such as extraction, precipitation, chromatography, affinity chromatography, electrophoresis, or the like. For example, cells expressing at least one PPCA or pPPCA in suitable levels can be collected by centrifugation, or lysed with suitable buffers, and the protein isolated by column chromatography, for example, on DEAE-cellulose, phosphocellulose, polyribocytidylic acid-agarose, hydroxyapatite, or by electrophoresis or immunoprecipitation. Alternatively, a pPPCA or PPCA can be isolated by the use of antibodies, such as, but not limited to, a PPCA- or pPPCA-specific antibody. Such antibodies can be obtained by known method steps (see, e.g., Harlow and Lane, ANTIBODIES: A LABORATORY MANUAL, Cold Spring Harbor Laboratory (1988); Colligan et al., eds., Current Protocols in Immunology, Greene Publishing Assoc. and Wiley Interscience, N.Y. (1992, 1993), the contents of which references are entirely incorporated herein by reference).

A PPCA or a pPPCA can be purified from different mammalian tissues (e.g., human placenta, rat liver, mouse liver, pig kidney, bovine testes, bovine liver, and the like) of various genus and species, using known techniques such as gel filtration, phase separation and affinity chromatography, e.g., using polyclonal or monoclonal antibodies specific for a PPCA or pPPCA, according to known methods. See, e.g., Oxender et al., Protein Engineering, Liss, New York (1986).

Overview of PPCA or pPPCA Purification and Crystallization Methods

In general, a PPCA or pPPCA is isolated in soluble form in sufficient purity and concentration (e.g., a monomer or dimer) for crystallization. The PPCA or pPPCA is then isolated and assayed for biological activity (e.g., cathepsin A) and for lack of aggregation (which interferes with crystallization). The purified PPCA or pPPCA preferably runs as a single band for each monomer under reducing or nonreducing polyacrylamide gel electrophoresis (PAGE) (nonreducing is used to evaluate the presence of cysteine bridges).

The purified PPCA or pPPCA is preferably crystallized under varying conditions of at least one of the following: pH, buffer type, buffer concentration, salt type, polymer type, polymer concentration, other precipitating ligands and concentration of purified PPCA or pPPCA. See, e.g., known methods (Blundell et al., Protein Crystallography, Academic Press, London (1976); Oxender, infra; McPherson, The Preparation and Analysis of Protein Crystals, Wiley Interscience, N.Y. (1982)) or methods provided in a commercial kit, such as CRYSTAL SCREEN (Hampton Research, Riverside, CA). The crystallized PPCA protein can optionally be tested for at least one PPCA activity, and differently sized and shaped crystals are further tested for suitability for x-ray diffraction. Generally, larger crystals provide better crystallographic data than smaller crystals, and thicker crystals provide better crystallographic data than thinner crystals. See, e.g., Blundell, infra; Oxender, infra; McPherson, infra; Wyckoff et al., Diffraction Methods for Biological Macromolecules, vols. 114-115, Methods in Enzymology, Academic Press, Orlando, FL (1985).

Protein Crystallization Methods

The hanging drop method is preferably used to crystallize the purified protein. See, e.g., Blundell, infra; Oxender, infra; McPherson, infra; Wyckoff, infra; Taylor et al., J. Mol. Biol. 226:1287-1290 (1992); Takimoto et al. (1992), infra; CRYSTAL SCREEN, Hampton Research.

A mixture of the purified protein and precipitant can include the following:

• pH (e.g., 7-9),

• buffer type (e.g., tromethamine (TRIZMA), sodium azide (NaN₃), phosphate, sodium or cacodylate acetates, imidazole, Tris HCl, sodium HEPES),

• buffer concentration (e.g., 1-100 mM),

• salt type (e.g., sodium azide, calcium chloride, sodium citrate, magnesium chloride, ammonium acetate, ammonium sulfate, potassium phosphate, magnesium acetate, zinc acetate, calcium acetate),

• polymer type and concentration (e.g., polyethylene glycol (PEG) 1-50%, type 400-10,000),

• other additives (salts: potassium sodium tartrate, ammonium sulfate, sodium acetate, lithium sulfate, sodium formate, sodium citrate, magnesium formate, sodium phosphate, potassium phosphate; organics: 2-propanol; non-volatile: 2-methyl-2,4-pentanediol; β-octyl glucoside), and

• concentration of purified PPCA or pPPCA (e.g., 1.0-100 mg/ml). See, e.g., CRYSTAL SCREEN, Hampton Research.

A non-limiting example of such crystallization conditions is the following:

• purified PPCA or pPPCA protein (e.g., 5 mg/ml),

• two solutions in serial mixtures:

(1) 40-80 mM TRIZMA, 0.05-2.0 mM NaN₃,

(2) 2-30% polyethylene glycol (PEG) 8000 buffered with 40-80 mM TRIZMA and 0.05-2.0 mM NaN₃,

• 0.05-0.5% β-octyl glucoside,

• at an overall pH of about 8.0-8.3.

The above mixtures are used and screened by varying at least one of pH, buffer type, buffer concentration, precipitating salt type or additive or their concentrations, PEG type, PEG concentration, and protein concentration. Crystals ranging in size from 0.1-0.9 mm are formed in 1-14 days. These crystals diffract x-rays to at least 10 Å resolution, such as 0.15-10.0 Å, or any range or value therein, such as 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4 or 3.5 Å, with 3.5 Å or higher being preferred for the highest resolution. In addition to diffraction patterns having this highest resolution, lower resolution, such as 25-3.5 Å, can also be used. See, e.g., Blundell, infra; Oxender, infra; McPherson, infra; Wyckoff, infra.

Protein Crystals

Crystals appear after 1-14 days and continue to grow on subsequent days. Some of the crystals can be optionally removed, washed, and assayed for biological activity (e.g., PPCA), which activity is preferred for use in further characterizations. Other washed crystals are preferably run on a gel and stained, and those that migrate in the same position as the purified PPCA or pPPCA are preferably used. From two to one hundred crystals are observed in one drop, and crystal forms can occur, such as, but not limited to, orthorhombic, bipyramidal, rhomboid, and cubic. Initial x-ray analyses indicate that such crystals diffract at moderately high to high resolution. When fewer crystals are produced in a drop, they can be of much larger size, e.g., 0.4-0.9 mm. See, e.g., Blundell, infra; Oxender, infra; McPherson, infra; Wyckoff, infra.

PPCA and pPPCA X-ray Crystallography Methods

The crystals so produced for a PPCA or pPPCA are x-ray analyzed using a suitable x-ray source. Diffraction patterns are obtained. Crystals are preferably stable for at least 10 hrs in the x-ray beam. Frozen crystals (e.g., -220 to -50°C) are optionally used for longer x-ray exposures (e.g., 5-72 hrs), the crystals being relatively more stable to the x-rays in the frozen state. To collect the maximum number of useful reflections, multiple frames are optionally collected as the crystal is rotated in the x-ray beam, e.g., for 5-72 hrs. Larger crystals (>0.2 mm) are preferred, to increase the resolution of the x-ray diffraction patterns obtained. Crystals are preferably analyzed using a synchrotron high energy x-ray source. Using frozen crystals, x-ray diffraction data is collected on crystals that diffract to at least a relatively high resolution of 10-1.5 Å, with lower resolutions also useful, such as 25-10 Å, sufficient to solve the three-dimensional structure of a PPCA or pPPCA in considerable detail, as presented herein.

Passing an x-ray beam through a crystal produces a diffraction pattern as a result of the x-rays interacting with and being scattered by the contents of the crystal. The diffraction pattern can be visualized using, e.g., an image plate or film, resulting in an image with spots corresponding to the diffracted x-rays. The positions of the spots in the diffraction pattern are used to determine parameters intrinsic to the crystal (such as unit cell parameters) and to gain information on the packing of the molecules in the crystal. The intensity of the spots contains the Fourier transformation of the molecules in the crystal, i.e., information on each atom in the crystal and hence of the crystallized molecule.

After data collection of diffraction patterns, the data is processed. This includes measuring the spots on each diffraction pattern in terms of position and intensity. This information is processed (i.e., mathematical operations are performed on the data, such as scaling, merging and converting the data from intensity of diffracted beams to amplitudes) to yield a set of data which is in a form that can be used for the further structure determination of the molecule crystallized. The amplitudes of the diffracted x-rays are then combined with calculated phases to produce an electron density map of the contents of the crystal. In this electron density map, the structure of the molecules (as present in the crystal) is built. The phases can be determined with various known techniques, one being molecular replacement.

For the molecular replacement technique, one takes a known three-dimensional structure thought to share structural homology with the structure to be determined and, after calculations, generates a first set of initial phases. These phases are then combined with the diffraction information of the molecule whose structure is to be solved. The result is an electron density map of the molecules in the crystal from which the diffraction patterns originate.
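How amplitudes and phases combine into an electron density map can be illustrated with a one-dimensional toy Fourier synthesis. The sketch below assumes a unit cell of length 1 containing two point atoms with hypothetical scattering weights. The phases are computed from the same toy atoms, standing in for the phases that a search model would supply in molecular replacement. All names and values are illustrative assumptions.

```python
import cmath
import math


def structure_factors(atoms, h_max):
    """F(h) = sum_j f_j * exp(2*pi*i*h*x_j) for h = 0..h_max.

    atoms: list of (fractional_x, weight) pairs in a 1-D unit cell.
    """
    return {h: sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)
            for h in range(h_max + 1)}


def density(amplitudes, phases, x):
    """rho(x) = |F_0| + 2 * sum_{h>0} |F_h| * cos(2*pi*h*x - phi_h)."""
    rho = amplitudes[0]
    for h, amp in amplitudes.items():
        if h > 0:
            rho += 2.0 * amp * math.cos(2.0 * math.pi * h * x - phases[h])
    return rho


def synthesize_map(atoms, h_max=10, n_grid=100):
    """Return the density sampled on n_grid points across the unit cell."""
    f = structure_factors(atoms, h_max)
    amplitudes = {h: abs(v) for h, v in f.items()}
    phases = {h: cmath.phase(v) for h, v in f.items()}
    return [density(amplitudes, phases, i / n_grid) for i in range(n_grid)]


if __name__ == "__main__":
    grid = synthesize_map([(0.2, 6.0), (0.7, 8.0)])
    print("highest density at x =", grid.index(max(grid)) / 100)
```

With correct phases the density peaks fall at the atom positions. With poor phases they do not, which is why phase quality limits how interpretable the map is.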

The phases can be further optimized using a technique called density modification, which allows electron density maps of better quality to be produced, facilitating interpretation and model building therein. The atomic model is then refined by allowing the atoms in the model to move in order to match the diffraction data as well as possible while continuing to satisfy stereochemical constraints (sensible bond lengths, bond angles and the like). See, e.g., Blundell, infra; Oxender, infra; McPherson, infra; Wyckoff, infra.

Computer Related Embodiments

An amino acid sequence of a PPCA or pPPCA and/or atomic coordinate/x-ray diffraction data, useful for computer structure determination of a PPCA, pPPCA or a portion thereof, can be "provided" in a variety of media to facilitate use thereof. As used herein, "provided" refers to a manufacture which contains a PPCA or pPPCA amino acid sequence and/or atomic coordinate/x-ray diffraction data of the present invention, e.g., the amino acid sequence provided in Figures 13-15, a representative fragment thereof, or an amino acid sequence having at least 80-100% overall identity to a 5-542 amino acid fragment of an amino acid sequence of Figures 13-15. Such a manufacture provides the amino acid sequence and/or atomic coordinate/x-ray diffraction data in a form which allows a skilled artisan to analyze and determine the three-dimensional structure of a PPCA, a pPPCA or a subdomain thereof.

In one application of this embodiment, PPCA, pPPCA, or at least one subdomain thereof, amino acid sequence and/or atomic coordinate/x-ray diffraction data of the present invention is recorded on computer readable media. As used herein, "computer readable media" refers to any medium which can be read and accessed directly by a computer. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and magnetic tape; optical storage media such as optical discs or CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media. A skilled artisan can readily appreciate how any of the presently known computer readable media can be used to create a manufacture comprising computer readable medium having recorded thereon an amino acid sequence and/or atomic coordinate/x-ray diffraction data of the present invention. As used herein, "recorded" refers to a process for storing information on computer readable medium. A skilled artisan can readily adopt any of the presently known methods for recording information on computer readable medium to generate manufactures comprising an amino acid sequence and/or atomic coordinate/x-ray diffraction data information of the present invention.

A variety of data storage structures are available to a skilled artisan for creating a computer readable medium having recorded thereon an amino acid sequence and/or atomic coordinate/x-ray diffraction data of the present invention. The choice of the data storage structure will generally be based on the means chosen to access the stored information. In addition, a variety of data processor programs and formats can be used to store the sequence and x-ray data information of the present invention on computer readable medium. The sequence information can be represented in a word processing text file, formatted in commercially-available software such as WordPerfect and MICROSOFT Word, or represented in the form of an ASCII file, or stored in a database application, such as DB2, Sybase, Oracle, or the like. A skilled artisan can readily adapt any number of data processor structuring formats (e.g., text file or database) in order to obtain computer readable medium having recorded thereon the information of the present invention.
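As one illustration of an ASCII representation of atomic coordinate data, the sketch below writes and reads a single coordinate record using a PDB-style fixed-column layout. The choice of that layout is an assumption made for the example, since the text above does not prescribe a particular format. The function names and the coordinate values are hypothetical.

```python
def format_atom(serial, name, res_name, chain, res_seq, x, y, z):
    """Format one atom as a fixed-column text record (PDB-style ATOM line).

    Columns: 1-6 record type, 7-11 serial, 13-16 atom name, 18-20 residue
    name, 22 chain, 23-26 residue number, 31-54 x, y, z (8.3 format each).
    """
    return ("ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f"
            % (serial, name, res_name, chain, res_seq, x, y, z))


def parse_atom(line):
    """Parse a record written by format_atom back into a dictionary."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


if __name__ == "__main__":
    # Hypothetical coordinates for illustration only.
    record = format_atom(1, "CA", "SER", "A", 150, 12.345, -6.789, 0.5)
    print(record)
    print(parse_atom(record))
```

A fixed-column text record of this kind can be stored on any of the media listed above and read back without format-specific software.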

By providing computer readable media having stored therein a PPCA or pPPCA sequence and/or atomic coordinates based on x-ray diffraction data, a skilled artisan can routinely access the sequence and atomic coordinate or x-ray diffraction data to model a PPCA, pPPCA, a subdomain thereof, or a ligand thereof. Computer algorithms are publicly and commercially available which allow a skilled artisan to access this data provided on a computer readable medium and analyze it for structure determination and/or RDD. See, e.g., Biotechnology Software Directory, Mary Ann Liebert Publ., New York (1995).

The present invention further provides systems, particularly computer-based systems, which contain the sequence and/or diffraction data described herein. Such systems are designed to do structure determination and RDD for a PPCA, pPPCA or at least one subdomain thereof. Non-limiting examples are microcomputer workstations available from Silicon Graphics Incorporated and Sun Microsystems running Unix-based, Windows NT or IBM OS/2 operating systems.

As used herein, "a computer-based system" refers to the hardware means, software means, and data storage means used to analyze the sequence and/or atomic coordinate/x-ray diffraction data of the present invention. The minimum hardware means of the computer-based systems of the present invention comprises a central processing unit (CPU), input means, output means, and data storage means. A skilled artisan can readily appreciate which of the currently available computer-based systems are suitable for use in the present invention. A monitor is optionally provided to visualize structure data. As stated above, the computer-based systems of the present invention comprise a data storage means having stored therein a PPCA, pPPCA or fragment sequence and/or atomic coordinate/x-ray diffraction data of the present invention and the necessary hardware means and software means for supporting and implementing an analysis means.

As used herein, "data storage means" refers to memory which can store sequence or atomic coordinate/x-ray diffraction data of the present invention, or a memory access means which can access manufactures having recorded thereon the sequence or x-ray data of the present invention.

As used herein, "search means" or "analysis means" refers to one or more programs which are implemented on the computer-based system to compare a target sequence or target structural motif with the sequence or x-ray data stored within the data storage means. Search means are used to identify fragments or regions of a PPCA or pPPCA which match a particular target sequence or target motif. A variety of known algorithms are disclosed publicly, and a variety of commercially available software for conducting search means is available and can be used in the computer-based systems of the present invention. A skilled artisan can readily recognize that any one of the available algorithms or implementing software packages for conducting computer analyses can be adapted for use in the present computer-based systems.
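A very simple search means of the kind described above can be sketched as an ungapped sliding-window comparison that reports where a target sequence matches a stored sequence at or above a chosen percent identity. This is an illustrative assumption rather than any particular published algorithm. The function name and the threshold are hypothetical, and the stored sequence used in the usage lines is the 15-residue peptide described in Example I of this document, written in one-letter code.

```python
def find_matches(stored, target, min_identity=80.0):
    """Slide target along stored (no gaps) and report matching windows.

    Returns a list of (start_index, percent_identity) for every window of
    len(target) residues whose identity to target is >= min_identity.
    """
    n = len(target)
    if n == 0 or n > len(stored):
        return []
    hits = []
    for start in range(len(stored) - n + 1):
        window = stored[start:start + n]
        identical = sum(1 for a, b in zip(window, target) if a == b)
        identity = 100.0 * identical / n
        if identity >= min_identity:
            hits.append((start, identity))
    return hits


if __name__ == "__main__":
    stored = "CMWHQALLRSEDKAR"  # peptide from Example I, one-letter code
    print(find_matches(stored, "LLRSE"))
    print(find_matches(stored, "LLKSE"))
```

Practical search programs add gapped alignment and substitution scoring. The sketch only shows the compare-and-report structure of a search means.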

As used herein, "a target structural motif," or "target motif," refers to any rationally selected sequence or combination of sequences in which the sequence(s) are chosen based on a three-dimensional configuration or electron density map which is formed upon the folding of the target motif. There are a variety of target motifs known in the art. Protein target motifs include, but are not limited to, enzymic active sites, structural subdomains, epitopes, functional domains and signal sequences. A variety of structural formats for the input and output means can be used to input and output the information in the computer-based systems of the present invention. A variety of comparing means can be used to compare a target sequence or target motif with the data storage means to identify structural motifs or interpret electron density maps derived in part from the atomic coordinate/x-ray diffraction data. A skilled artisan can readily recognize that any one of the publicly available computer modeling programs can be used as the search means for the computer-based systems of the present invention.

One application of this embodiment is provided in Figure 22. Figure 22 provides a block diagram of a computer system 102 that can be used to implement the present invention. The computer system 102 includes a processor 106 connected to a bus 104. Also connected to the bus 104 are a main memory 108 (preferably implemented as random access memory, RAM), a variety of secondary storage memory 110, such as a hard drive 112 and a removable medium storage device 114, and a monitor 120. The removable medium storage device 114 may represent, for example, a floppy disk drive, a CD-ROM drive, a magnetic tape drive, etc. A removable storage medium 116 (such as a floppy disk, a compact disk, a magnetic tape, etc.) containing control logic and/or data recorded therein may be inserted into the removable medium storage device 114. The computer system 102 includes appropriate software for reading the control logic and/or the data from the removable storage medium 116 once it is inserted in the removable medium storage device 114.

Amino acid, encoding nucleotide or other sequence and/or atomic coordinate/x-ray diffraction data of the present invention may be stored in a well known manner in the main memory 108, any of the secondary storage devices 110, and/or a removable storage medium 116. Software for accessing and processing the amino acid sequence and/or atomic coordinate/x-ray diffraction data (such as search tools, comparing tools, etc.) resides in main memory 108 during execution. The monitor 120 is optionally used to visualize the structure data.

Structure Determination

One or more computational steps, computer programs and/or computer algorithms are used to build a molecular 3-D model of a PPCA or pPPCA, using amino acid sequence data from Figures 13-15 (or variants thereof) and/or atomic coordinate/x-ray diffraction data, as presented herein.

In x-ray crystallography, x-ray diffraction data and phases are combined to produce electron density maps in which the three-dimensional structure of a PPCA or pPPCA is then built or modeled. This structure can then be used for RDD of modulators of at least one PPCA- or pPPCA-related activity that is relevant to at least one PPCA- or pPPCA-related pathology.

Density Modification and Map Interpretation

Electron density maps can be calculated using such programs as those from the CCP4 computing package (SERC (UK) Collaborative Computing Project 4, Daresbury Laboratory, UK, 1979). Cycles of two-fold averaging can further be used, such as with the program RAVE (Kleywegt & Jones, in Bailey et al., eds., First Map to Final Model, SERC Daresbury Laboratory, UK, pp. 59-66 (1994)), together with gradual model expansion. For map visualization and model building, a program such as "O" (Jones (1991), infra) can be used.

Refinement and Model Validation

Rigid body and positional refinement can be carried out using a program such as X-PLOR (Brünger (1992), infra), e.g., with the stereochemical parameters of Engh and Huber (Acta Cryst. A47:392-400 (1991)). If the model at this stage in the averaged maps still misses residues (e.g., at least 5-10 per subunit), then some or all of the missing residues can be incorporated in the model during additional cycles of positional refinement and model building. The refinement procedure can start using data from lower resolution (e.g., 25-10 Å to 10-3.0 Å) and then be gradually extended to include data from 12-6 Å to 3.0-1.5 Å. B-values (also termed temperature factors) for individual atoms can be refined once data of 2.8 Å or higher (e.g., up to 1.5 Å) has been added. Subsequently, waters can be gradually added. A program such as ARP (Lamzin and Wilson, Acta Cryst. D49:129-147 (1993)) can be used to add crystallographic waters and as a tool to check for bad areas in the model. Programs such as PROCHECK (Laskowski et al., J. Appl. Cryst. 25:283-291 (1993)), WHATIF (Vriend, J. Mol. Graph. 8:52-56 (1990)) and PROFILE 3D (Luthy et al., Nature 356:83-85 (1992)), as well as the geometrical analysis generated by X-PLOR, can be used to check the structure for errors. A program such as DSSP can be used to assign the secondary structure elements (Kabsch and Sander (1983), infra).

The structure of a PPCA or pPPCA can thus be solved with the molecular replacement procedure, such as by using X-PLOR (Brünger (1992), infra). A partial search model for the monomer can be constructed using a related protein, such as the wheat serine carboxypeptidase structure (Liao et al. (1992), infra). The rotation and translation functions can be solved to yield orientations and positions for the subunits in the crystallographic asymmetric unit. This allows phases to be determined that, when combined with information from the x-ray diffraction patterns, allow electron density maps of a PPCA or pPPCA to be calculated. The atomic model is then built using these electron density maps.
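The agreement between a refined model and the diffraction data is conventionally summarized by a crystallographic R-factor, R = Σ| |F_obs| − k|F_calc| | / Σ|F_obs|. The sketch below is a minimal illustration, assuming a single overall scale factor k = Σ|F_obs| / Σ|F_calc| and hypothetical amplitude lists. Refinement programs use more elaborate scaling, resolution-dependent terms and cross-validation that are omitted here.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R-factor with one overall scale factor.

    f_obs, f_calc: equal-length sequences of observed and calculated
    structure-factor amplitudes for the same reflections.
    """
    if len(f_obs) != len(f_calc) or len(f_obs) == 0:
        raise ValueError("amplitude lists must be non-empty and equal in length")
    sum_obs = sum(f_obs)
    sum_calc = sum(f_calc)
    if sum_obs <= 0 or sum_calc <= 0:
        raise ValueError("amplitude sums must be positive")
    scale = sum_obs / sum_calc  # simple overall scale, an assumption of this sketch
    residual = sum(abs(fo - scale * fc) for fo, fc in zip(f_obs, f_calc))
    return residual / sum_obs


if __name__ == "__main__":
    # Hypothetical amplitudes for three reflections.
    print(r_factor([10.0, 20.0, 30.0], [12.0, 18.0, 30.0]))
```

A lower value indicates closer agreement between the amplitudes calculated from the model and those measured from the crystal.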

Cyclical two-fold density averaging can also be done to improve the electron density maps using a suitable program (e.g., RAVE), and model expansion can also be used to add missing residues for each monomer, resulting in a model with 95-99.9% of the total number of residues. The model can be refined in a program such as X-PLOR (Brünger (1992), supra) to a suitable crystallographic R-factor. The model data is then saved on computer readable media for use in further analysis, such as rational drug design.

Rational Design of Drugs that Interact with the PPCA or pPPCA

The determination of the three-dimensional structure of a PPCA or pPPCA, as described herein, provides a basis for the design of new and specific ligands for the diagnosis and/or treatment of at least one PPCA- or pPPCA-related pathology. Several approaches can be taken for the use of the crystal structure of a PPCA or pPPCA in the rational design of ligands of this protein. A computer-assisted, manual examination of the active site structure is optionally done. Software such as GRID (Goodford, J. Med. Chem. 28:849-857 (1985)), a program that determines probable interaction sites between probes with various functional group characteristics and the enzyme surface, is used to analyze the active site to determine structures of inhibiting compounds. The program calculations, with suitable inhibiting groups on molecules (e.g., protonated primary amines) as the probe, are used to identify potential hotspots around accessible positions at suitable energy contour levels. Suitable ligands, as inhibiting or stimulating modulating compounds or compositions, are then tested for modulating activities of at least one PPCA or pPPCA.

A diagnostic or therapeutic PPCA or pPPCA modulating ligand of the present invention can be, but is not limited to, at least one selected from a nucleic acid, a compound, a protein, an element, a lipid, an antibody, a saccharide, an isotope, a carbohydrate, an imaging agent, a lipoprotein, a glycoprotein, an enzyme, a detectable probe, an antibody or fragment thereof, or any combination thereof, which can be detectably labeled as for labeling antibodies. Such labels include, but are not limited to, enzymatic labels, radioisotope or radioactive compounds or elements, fluorescent compounds or metals, chemiluminescent compounds and bioluminescent compounds. Alternatively, any other known diagnostic or therapeutic agent can be used in a method of the invention.

After preliminary experiments are done to determine the K_m of the substrate with each enzyme activity of a PPCA or pPPCA, the time-dependent nature of modulation and the ligand K_i values are determined (e.g., by the method of Henderson (Biochem. J. 127:321-333 (1972))). For example, the ligand (or blank where appropriate) and enzyme are pre-incubated in buffer. Reactions are initiated by the addition of substrate. Aliquots are removed over a suitable time course and each is quenched by addition to a suitable quenching solution (e.g., sodium hydroxide in aqueous ethanol). The concentration of product is determined, e.g., fluorometrically, using a spectrometer. Plots of fluorescence against time can be close to linear over the assay period, and are used to obtain values for the initial velocity in the presence (V_i) or absence (V_0) of ligand. Error is present in both axes in a Henderson plot, making it inappropriate for standard regression analysis (Leatherbarrow, Trends Biochem. Sci. 15:455-458 (1990)). Therefore, K_i values are obtained from the data by fitting to a modified version of the Henderson equation for competitive inhibition:

$$Q r^{2} + (E_t - Q - I_t)\,r - E_t = 0$$

where (using the notation of Henderson (Biochem. J. 127:321-333 (1972)))

$$r = \frac{V_0}{V_i}$$

This equation is solved for the positive root with the constraint that

$$Q = K_i\left(\frac{A_t + K_a}{K_a}\right)$$

using PROC NLIN from SAS (SAS Institute Inc., Cary, North Carolina, USA), which performs nonlinear regression using least-squares techniques. The iterative method used is optionally the multivariate secant method, similar to the Gauss-Newton method except that the derivatives in the Taylor series are estimated from the history of iterations rather than supplied analytically. A suitable convergence criterion is optionally used, e.g., where there is a change in loss function of less than 10⁻⁸.

Once modulating ligands are found and isolated or synthesized, crystallographic studies of the compounds complexed to a PPCA or pPPCA can be performed. As a non-limiting example, PPCA or pPPCA crystals are soaked for 2 days in 0.01-100 mM ligand, and x-ray diffraction data are collected on an area detector and/or an image plate detector (e.g., a Mar image plate detector) using a rotating anode x-ray source. Data are collected to as high a resolution as possible, e.g., an inner limit of diffraction of 1.5-3.5 Å. An atomic model of the inhibitor is built into the difference Fourier map (F_bound form - F_native). The model can be refined to adjust the atomic positions to improve the fit with the electron density maps, while maintaining correct stereochemical constraints. The model will preferably have low r.m.s. deviations from the ideal bond lengths and bond angles, as well as a low R-factor (preferably less than about 25-35%, such as less than about 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, or 25%).

Direct measurements of enzyme inhibition provide further confirmation that the modeled ligands are modulators of at least one biological activity of a PPCA or a pPPCA. As a non-limiting example, a modification (Chong et al., Biochim. Biophys. Acta 1077:65-71 (1991)) of the fluorometric assay of Potier et al. (Analyt. Biochem. 94:287-296 (1979)) is optionally used to measure neuraminidase inhibition or stimulation, optionally including determination of inhibition constants (K_i). Other suitable PPCA activity assays include, e.g., cathepsin A activity (Galjart et al., J. Biol. Chem. 266:14754-14762 (1991)), endothelin I deamidase activity (Jackman et al., J. Biol. Chem. 267:2872-2875 (1992)), and tachykinin deamidase activity (Jackman et al., J. Biol. Chem. 265:11265-11272 (1990)).
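The K_i fitting step described earlier in this section, in which the modified Henderson equation is solved for its positive root and K_i is adjusted by least squares, can be sketched as follows. This is not the SAS multivariate secant procedure named in the text: for a self-contained example, the sketch substitutes a one-parameter golden-section search over log10(K_i). The function names, the search bracket and all concentrations in the usage lines are hypothetical assumptions.

```python
import math


def positive_root(q, e_t, i_t):
    """Positive root r of Q*r^2 + (E_t - Q - I_t)*r - E_t = 0 (Q, E_t > 0)."""
    b = e_t - q - i_t
    disc = math.sqrt(b * b + 4.0 * q * e_t)
    if b > 0:
        # Algebraically identical form that avoids subtractive cancellation.
        return 2.0 * e_t / (b + disc)
    return (-b + disc) / (2.0 * q)


def fit_ki(data, e_t, a_t, k_a, lo=1e-12, hi=1e-3, iterations=100):
    """Least-squares estimate of K_i from (I_t, r_observed) pairs.

    r_observed is V_0 / V_i. The constraint Q = K_i * (A_t + K_a) / K_a is
    applied inside the loss function. The minimum is located by a
    golden-section search on log10(K_i) between lo and hi (an assumption of
    this sketch, standing in for the secant method named in the text).
    """
    def loss(log_ki):
        q = (10.0 ** log_ki) * (a_t + k_a) / k_a
        return sum((r_obs - positive_root(q, e_t, i_t)) ** 2
                   for i_t, r_obs in data)

    a, b = math.log10(lo), math.log10(hi)
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iterations):
        c = b - g * (b - a)
        d = a + g * (b - a)
        if loss(c) < loss(d):
            b = d
        else:
            a = c
    return 10.0 ** ((a + b) / 2.0)


if __name__ == "__main__":
    # Hypothetical concentrations in mol/L, for illustration only.
    e_t, a_t, k_a, true_ki = 1e-9, 1e-4, 1e-4, 5e-9
    q_true = true_ki * (a_t + k_a) / k_a
    data = [(i_t, positive_root(q_true, e_t, i_t))
            for i_t in (0.0, 5e-9, 1e-8, 2e-8, 5e-8)]
    print(fit_ki(data, e_t, a_t, k_a))
```

With noise-free synthetic data the search recovers the K_i used to generate it. With real data, the loss at the minimum and its curvature give a rough indication of how well K_i is determined.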

Ligands of a PPCA or pPPCA, based on the crystal structure of this enzyme, are thus also provided by the present invention. A PPCA or pPPCA ligand is any molecule, compound or composition that is capable of associating with a PPCA or pPPCA and optionally modulating at least one function or structural feature of a PPCA or pPPCA. Preferably, a PPCA or pPPCA ligand modulates at least one biological activity of a PPCA or pPPCA. Demonstration of clinically useful levels, e.g., in vivo activity, is also important. In evaluating PPCA or pPPCA inhibitors for biological activity in animal models (e.g., rat, mouse, rabbit), various oral and parenteral routes of administration are evaluated. Using this approach, it is expected that modulation of a PPCA or pPPCA occurs in suitable animal models, using the ligands discovered by structure determination and x-ray crystallography.

Evaluation of Therapeutic Potentials of Compositions via a PPCA Animal Model

The present invention also provides methods for identifying diagnostic or therapeutic ligands of PPCA or pPPCA via computer RDD, to treat a PPCA-related pathology. Generally, a method for determining the therapeutic or diagnostic use of a PPCA or pPPCA modulating ligand, to treat a PPCA-related pathology, comprises the steps of administering a known dose of at least one ligand-containing composition to an animal model having a phenotype corresponding to a PPCA-related pathology, monitoring the appropriate biological or biochemical parameters, and comparing the results for treated animals to those for untreated animals. Results indicating the onset or presence of a PPCA-related pathology are generally referred to herein as "symptoms" of the disease. See, e.g., U.S. Appl. No. 08/397,693, filed March 2, 1995, which is entirely incorporated herein by reference.

Appropriate biological and biochemical parameters that reflect the onset and progression of a PPCA-related pathology include, but are not limited to: (1) gross biological parameters, e.g., physical appearance (i.e., flattening of the face, rough haircoat and/or subcutaneous swelling in affected animals) or growth (reduced weight gain); (2) gross behavioral parameters, e.g., lack of coordination; (3) biochemical assays, e.g., assays of cathepsin A, N-acetyl-α-neuraminidase or β-galactosidase activities in primary cultures of skin fibroblasts or tissue homogenates; (4) histopathological studies (visceromegaly, i.e., enlarged liver and spleen, accumulation of secondary vacuoles in kidney tissues, etc.).

A first method of evaluating the therapeutic potential of a composition using the transgenic non-human animals of the invention comprises the steps of:

(1) Administering a known dose of the composition to a first non-human animal having a phenotype corresponding to a human PPCA-related pathology,

(2) Detecting the time of onset of symptoms in the first non-human animal, and

(3) Comparing the time of onset of symptoms in the first non-human animal to the time of onset of symptoms in a second non-human animal having a phenotype corresponding to a human PPCA-related pathology, which has not been exposed to the composition, wherein a statistically significant delay in the time of onset of symptoms in the first non-human animal relative to the time of onset of the symptoms in the second non-human animal indicates the potential of the composition for treating a PPCA-related pathology.

A second method of evaluating the therapeutic potential of a composition using the non-human animals of the invention comprises the steps of

(1) Administering a known dose of the composition to a first non-human animal having a phenotype corresponding to a human PPCA-related pathology at an initial time, t₀,

(2) Determining the extent of symptoms in the first non-human animal at a later time, t₁, and

(3) Comparing, at t₁, the extent of symptoms in the first non-human animal to the extent of symptoms in a second non-human animal having a phenotype corresponding to a human PPCA-related pathology, which has not been exposed to the composition at t₀, wherein a statistically significant decrease in the extent of symptoms at t₁ in the first non-human animal relative to the extent of the symptoms at t₁ in the second non-human animal indicates the potential of the composition for treating a PPCA-related pathology.

In the above methods, the composition being tested may comprise a chemical compound administered by circulatory injection or oral ingestion. The composition being evaluated may alternatively comprise a polypeptide administered by circulatory injection of an isolated or recombinant bacterium or virus that is live or attenuated, wherein the polypeptide is present on the surface of the bacterium or virus prior to injection, or a polypeptide administered by circulatory injection of an isolated or recombinant bacterium or virus capable of reproduction within a non-human animal, wherein the polypeptide is produced within the non-human animal by genetic expression of a DNA sequence encoding the polypeptide. Alternatively, the composition being evaluated may comprise one or more nucleic acids, including a gene from the human genome or a processed RNA transcript thereof. Similarly, the composition being evaluated may comprise cells removed from a mammal and genetically engineered to overexpress a lysosomal protein or some other therapeutic polypeptide.

Once the PPCA modulating ligand has been shown to be effective in an animal model, it can then be tested in human clinical trials, according to known method steps. In the above methods, delivery of the composition being tested to non-human animals is achieved via means appropriate for the composition being tested, e.g., by diet; by intermittent or continuous intravenous injection of one or more of the compositions or of a liposome (Rahman and Schein, in Liposomes as Drug Carriers, Gregoriadis, ed., John Wiley, New York (1988), pages 381-400; Gabizon, A., in Drug Carrier Systems, Vol. 9, Roerdink et al., eds., John Wiley, New York (1989), pages 185-212) or microparticle (Tice et al., U.S. Patent 4,542,025 (Sep. 17, 1985)) formulation comprising one or more of the compositions; via subdermal implantation of drug-polymer conjugates (Duncan, R., Anti-Cancer Drugs 3:175-210 (1992)); via microparticle bombardment (Sanford et al., U.S. Patent 4,945,050 (Jul. 31, 1990)); via infusion pumps (Blackshear and Rohde, in Drug Carrier Systems, Vol. 9, Roerdink et al., eds., John Wiley, New York (1989), pages 293-310); or by other appropriate means known in the art (see, generally, Remington's Pharmaceutical Sciences, 18th Ed., Gennaro, ed., Mack Publishing Co., Easton, PA (1990)).

Pharmaceutical/Diagnostic Administration

Using compounds or compositions comprising at least one PPCA or pPPCA modulating ligand, the present invention further provides a method for modulating the activity of a PPCA or pPPCA protein in a cell. In general, ligands (antagonists or agonists) which have been identified to inhibit or enhance the activity of at least one PPCA or pPPCA can be formulated so that the ligand can be contacted with a cell expressing at least one PPCA or pPPCA protein in vivo. The contacting of such a cell with such a ligand results in the in vivo modulation of at least one biological activity of a PPCA or pPPCA.

At least one PPCA or pPPCA modulating compound or composition of the invention can be administered by any means that achieve the intended purpose, using a suitable pharmaceutical composition or formulation. For example, administration can be by various parenteral routes such as subcutaneous, intravenous, intradermal, intramuscular, intraperitoneal, intranasal, intracranial, transdermal, or buccal routes. Alternatively, or concurrently, administration can be by the oral route. Parenteral administration can be by bolus injection or by gradual perfusion over time.

A typical regimen for treatment or prophylaxis comprises administration of an effective amount over a period of one or several days, up to and including between one week and about six months. It is understood that the dosage of a diagnostic/pharmaceutical compound or composition of the invention administered in vivo or in vitro will be dependent upon the age, sex, health, and weight of the recipient, kind of concurrent treatment, if any, frequency of treatment, and the nature of the diagnostic/pharmaceutical effect desired. The ranges of effective doses provided herein are not intended to be limiting and represent preferred dose ranges. However, the most preferred dosage will be tailored to the individual subject, as is understood and determinable by one skilled in the relevant arts. See, e.g., Berkow et al., eds., The Merck Manual, 16th edition, Merck and Co., Rahway, N.J., 1992; Goodman et al., eds., Goodman and Gilman's The Pharmacological Basis of Therapeutics, 8th edition, Pergamon Press, Inc., Elmsford, N.Y. (1990); Avery's Drug Treatment: Principles and Practice of Clinical Pharmacology and Therapeutics, 3rd edition, ADIS Press, LTD., Williams and Wilkins, Baltimore, MD (1987); Ebadi, Pharmacology, Little, Brown and Co., Boston (1985); Osol et al., eds., Remington's Pharmaceutical Sciences, 18th edition, Mack Publishing Co., Easton, PA (1990); Katzung, Basic and Clinical Pharmacology, Appleton and Lange, Norwalk, CT (1992), which references are entirely incorporated herein by reference.

The total dose required for each treatment can be administered by multiple doses or in a single dose. The diagnostic/pharmaceutical compound or composition can be administered alone or in conjunction with other diagnostics and/or pharmaceuticals directed to the pathology, or directed to other symptoms of the pathology. Effective amounts of a diagnostic/pharmaceutical compound or composition of the invention are from about 0.1 μg to about 100 mg/kg body weight, administered at intervals of 4-72 hours, for a period of 2 hours to 1 year, and/or any range or value therein.

The recipients of administration of compounds and/or compositions of the invention can be any mammals. Among mammals, the preferred recipients are mammals of the Orders Primata (including humans, apes and monkeys), Arteriodactyla (including horses, goats, cows, sheep, pigs), Rodenta (including mice, rats, rabbits, and hamsters), and Carnivora (including cats and dogs). The most preferred recipients are humans.

Having now generally described the invention, the same will be more readily understood through reference to the following example which is provided by way of illustration, and is not intended to be limiting of the present invention

Example I: Preparation, Purification and Crystallization of PPCA or pPPCA from Human Cells

The present invention provides, in one aspect, the determination of the three-dimensional structure of the human protective protein/cathepsin A (PPCA) in the precursor form (pPPCA) by a combination of molecular replacement and twofold density averaging. The structure presented here is the first of an enzyme associated with a human PPCA-related pathology and the third human lysosomal enzyme structure determined. The structure gives us insight into the zymogen activation mechanism of pPPCA as well as the expected 3-D structure of PPCA and its specific and new enzymatic activities.

PPCA and pPPCA Expression and Purification

Plasmid Constructs. AcMNPV transfer-plasmids pJR2 and pBC3 (Figure 1) were derivatives of plasmid pAc373 carrying the entire polyhedrin gene (Smith et al., 1985). In pJR2 a polylinker with a number of multiple cloning sites (MCS) was inserted directly 3' of the polyhedrin promoter and substituted a 33-nucleotide deletion of the polyhedrin gene, starting with the ATG. pBC3 had the polylinker situated in a similar position as pJR2, but instead of the 33-nt deletion this plasmid featured an ATG codon mutated to ACG. Full-length human PPCA cDNA, PPCA54 (Galjart et al., 1988), and the two deletion cDNA mutants, 32(Δ20) and 20(Δ32) (Galjart et al., 1991), were subcloned either in pJR2 or pBC3 as EcoRI fragments, using standard procedures (Sambrook et al., 1989) (Figure 1). The 20(Δ32) deletion mutant was tagged with the human PPCA signal sequence, as reported earlier (Galjart et al., 1991). All cDNA fragments were engineered to have short 3' and 5' untranslated regions (< 10 bp).

Transfection and Selection of Recombinant Baculovirus. Spodoptera frugiperda insect cells (IPLB-SF21) were cultured in monolayers at 27°C in TNM-FH medium (Hink, 1970), supplemented with 10% FBS and antibiotics (complete medium). Wild-type (wt) AcMNPV virus strain E2 (Smith and Summers, 1978) and recombinant baculoviruses were propagated on confluent monolayers of Sf21 cells. Recombinant constructs AcPPCA54, AcPPCA32 and AcPPCA20 were generated by cotransfecting Sf21 cells with 1 μg wt-AcMNPV DNA and 10 μg plasmid DNA, using the calcium phosphate method, modified for insect cells (Graham et al., 1973; Carstens et al., 1980; Summers et al., 1987). Polyhedrin-negative recombinant baculoviruses were then selected and purified by sequential plaque assays, and verified by dot blot and Southern blot analysis (Summers et al., 1987). Large quantities of inoculum were produced by infection of insect cells at 25-50% confluency with recombinant virus at a multiplicity of infection (MOI) of < 1 pfu/cell. After 3 to 6 days at 27°C, when all cells appeared infected, the medium was harvested and centrifuged for 5 min at 1000 rpm to remove detached cells. The titre of the inoculum was determined by plaque assay analysis.
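The relation between MOI, cell number and inoculum titre used above is simple arithmetic: the volume of inoculum needed equals MOI × number of cells ÷ titre. A minimal sketch follows. The function name is hypothetical, and the cell count and titre in the usage lines are illustrative values, not figures from this example.

```python
def inoculum_volume_ml(moi, n_cells, titre_pfu_per_ml):
    """Volume of virus inoculum (mL) needed to infect n_cells at a given MOI.

    moi: plaque-forming units per cell.
    titre_pfu_per_ml: titre of the inoculum as determined by plaque assay.
    """
    if titre_pfu_per_ml <= 0:
        raise ValueError("titre must be positive")
    if moi < 0 or n_cells < 0:
        raise ValueError("moi and n_cells must be non-negative")
    return moi * n_cells / titre_pfu_per_ml


if __name__ == "__main__":
    # Hypothetical values: 2e7 cells, titre 1e8 pfu/mL, MOI of 5 pfu/cell.
    print(inoculum_volume_ml(5, 2e7, 1e8), "mL")
```

The same relation can be rearranged to give the MOI actually achieved when a fixed volume of inoculum is added to a flask.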

Protein purification and western blotting. Sf21 cells were cultured in either 175 cm² or 500 cm² flasks (triple flask, Nunc) to near confluency, and infected with recombinant baculoviruses at an MOI of 5-10 pfu/cell. After 1.5 h incubation at 27°C, the inoculum was replaced with complete medium for an additional 8 to 10 h. Cell monolayers were then rinsed with PBS and cultured for a further 38 h in unsupplemented Grace's medium. After infection the medium was collected and centrifuged for 5 min at 1500 g, and then for 1 h at 100,000 g (Beckman SW-28 rotor) to remove virus particles.

After centrifugation the supernatant was concentrated 20-fold in an Amicon stirred cell. Glycoproteins were purified ~60% using a concanavalin A-SEPHAROSE affinity chromatography column, as described earlier (Verheijen et al., 1982). Total protein concentration was measured using the method of Smith et al. (1985). Aliquots of the purified preparation were resolved on 12.5% SDS-polyacrylamide gels under reducing and non-reducing conditions. Gels were stained with either Coomassie brilliant blue or silver (Sambrook et al., 1989). For western blotting, proteins were transferred from gels to IMMOBILON PVDF membranes (Millipore Corp.), using a semidry blotter (The W.E.P. company).

Development and Use of pPPCA antibodies. A 15 amino acid peptide (NH₂-Cys-Met-Trp-His-Gln-Ala-Leu-Leu-Arg-Ser-Glu-Asp-Lys-Ala-Arg-COOH) (Figure 5), based on the C-terminal sequence of the 34-kDa PPCA subunit (amino acids 285-298, Galjart et al., 1988), was synthesized on a peptide synthesizer (Applied Biosystems), and covalently linked to the carrier protein Keyhole Limpet Hemocyanin, using the IMJECT ACTIVATED IMMUNOGEN CONJUGATION KIT (Pierce). Polyclonal antibodies against the conjugated product were raised in rabbits by multiple subdermal injections of the protein (40-125 μg) mixed with incomplete Freund's adjuvant (Pierce). Rabbits were bled 34 days after the first injection. The antibodies, designated anti-pep, were tested on immunoblots and by immunoprecipitation of baculovirus-produced PPCA.

Blots were incubated for at least 12 h in blocking buffer (0.01 M tris-buffered saline pH 8.0 (TBS), 0.05% Tween 20 and 1% (w/v) BSA), and subsequently probed for 2 h with polyclonal PPCA antibodies, anti-54, diluted 1:200 in fresh blocking buffer. They were then washed for 1 h in TBS, 0.05% Tween 20, and incubated for 2 h with alkaline phosphatase-conjugated anti-rabbit IgG (Sigma, 1:1000 in blocking buffer). Proteins were visualized using alkaline phosphatase substrate (Sigma; 4-aminodiphenylamine diazonium sulfate, naphthol AS-MX phosphate).

Crystallization of PPCA. Fractions containing the precursor form of the protein, as assayed on an SDS-PAGE gel, were pooled. Subsequently the protein was concentrated to 5 mg/ml and the buffer exchanged to 50 mM NaAc pH 5.2 or 50 mM MES pH 6.5 using a CENTRICON-10. Crystals were grown using the hanging drop vapor diffusion technique. Crystals suitable for data collection were grown using a reservoir solution containing 2-10% PEG 8000, pH 8.0-8.3, 50 mM TRIZMA, 1 mM NaN₃, 0.25% β-octyl glucoside at 4-12°C. Mixing non-equal volumes of protein solution (in the range 5-10 μl) and reservoir solution (in the range 2-6 μl) enhanced the occurrence of single large crystals per drop under these crystallization conditions. The concentration of the protein solution before mixing was 5 mg/ml. Crystal growth was enhanced by macrocrystallization techniques (anything that promotes growth of big crystals) and in some cases by micro- and macroseeding techniques.

Example 2: Structure Determination of a pPPCA Crystallized from Human Cells

Data Collection, Data Processing and Reduction

To allow for data collection at cryotemperatures, the crystals were cryoprotected by adding glycerol in 5%-10% steps to a solution of about 12% PEG 8000, 50 mM TRIZMA, pH 8.0, 1 mM NaN₃, 0.25% β-octyl glucoside, which served as an artificial mother liquor. The crystals were incubated for half an hour at 40°C after each addition of glycerol. The final mother liquor contained 30% glycerol. Increasing the glycerol concentration gradually was needed to keep the crystals from cracking.

Diffraction data were collected at the Stanford Synchrotron Radiation Laboratories (SSRL) to 2.0 Å at -178°C on a MAR imaging plate at a wavelength of 1.08 Å on beam-line 7-1. The diffraction coordinate data (corresponding to the atomic coordinates of monomer 1; the other monomer coordinates are provided by matrix conversion of these coordinates, as presented herein) were processed and reduced using MOSFLM version 5.2 from the CCP4 program package (SERC (UK) Collaborative Computing Project 4, Daresbury Laboratory UK, 1979). The program REFIX (Kabsch (1993), infra) was used for auto-indexing. Using the CCP4 program suite (SERC (UK) Collaborative Computing Project 4, Daresbury Laboratory UK, 1979), the intensities were scaled (ROTAVATA), merged (AGROVATA), then converted to amplitudes and truncated with the program TRUNCATE. Statistics of the data collected are given in Table 1. The V_m (Matthews, B.W., J. Mol. Biol. 33:491-497 (1968)) is 3.2 Å³/Da for 2 monomers in the asymmetric unit, corresponding to a solvent content of 62%.
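The relation between the Matthews coefficient and the solvent content quoted above can be checked with a short calculation. The sketch below assumes the commonly used approximation from Matthews (1968), solvent fraction = 1 - 1.23/V_m, which presumes a typical protein partial specific volume; the function name is illustrative only.

```python
def solvent_fraction(v_m, protein_factor=1.23):
    """Approximate solvent fraction of a crystal from its Matthews coefficient.

    v_m            -- Matthews coefficient in cubic angstroms per dalton
    protein_factor -- 1.23 A^3/Da, the usual constant for an average protein
                      (an assumption; it depends on the partial specific volume)
    """
    if v_m <= 0:
        raise ValueError("V_m must be positive")
    return 1.0 - protein_factor / v_m


if __name__ == "__main__":
    # V_m = 3.2 A^3/Da, as reported for two monomers per asymmetric unit
    print(f"solvent content: {100 * solvent_fraction(3.2):.0f}%")
```

With V_m = 3.2 Å³/Da this gives approximately 62%, in agreement with the value stated in the text.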

Molecular Replacement

Search Model: The best molecular replacement results were obtained using a multi-Ala core as a search probe. The 'multi-Ala core' search model was constructed from the atomic coordinates of the CPW monomer (Liao et al., 1992), based on the sequence alignment as presented in Figure 15. Regions expected to deviate in structure between PPCA and CPW were deleted from the model (i.e., those with low sequence identity or located in loops). The 125 residues identical in PPCA and CPW were left in the model, and 112 residues were truncated to alanine. The remaining 94 residues, though differing between CPW and PPCA, were considered sufficiently similar in size, and the CPW residue was left as such in the model. The resulting 'multi-Ala core' monomer consisted of 331 residues, constituting a large portion of the core domain and little atomic information for the 'cap' domain (see Figure I). The model contained 30% of the expected protein scattering mass, given the fact that there are two monomers in the asymmetric unit. The sequence identity between this search model and the true PPCA structure was 37.7%.

Rotation Function, PC Refinement and Translation Function: Native data from 8 to 4 Å were used in the molecular replacement calculations. The rotational searches utilized a real space Patterson search method, as implemented in X-PLOR (Steigemann, 1974; Huber, 1985; Brunger, 1992a), with a Patterson vector cutoff of 21 Å. The self-rotation function failed to reveal any non-crystallographic two-fold symmetry relating two monomers in the asymmetric unit. In addition, the native self-Pattersons did not reveal the presence of a non-crystallographic two-fold axis parallel to a crystallographic axis. These results indicated that the two monomers in the asymmetric unit might not form a dimer together. The cross-rotation function was carried out to find the orientation of the two monomers in the asymmetric unit as follows. Patterson vector sets were calculated for the search model and the native data, and the 8000 strongest Patterson vectors were used in the rotation function. The rotational space, restricted to the asymmetric unit of the rotation function according to Rao et al., 1980, was sampled by rotating the Patterson vectors from the search model around Eulerian angles Θ1, Θ2 and Θ3, while sampling Θ2 in angular grid intervals of 2.5°. The 5000 highest rotation function grid points resulting from the product function of the two Patterson vector sets were selected.
The grid points (differing less than 8° around any given axis) were then clustered. The result was a list of 169 possible solutions for the rotation function, each corresponding to a set of three angles describing an orientation. The two top solutions were 3.9 and 3.8 sigma above the mean. PC-refinement (Brunger, 1990) was carried out to optimize each of the 169 possible solutions using the complete search model as a single rigid body. This yielded two orientations with a PC-index of 0.043 and 0.051, respectively. The orientations of these solutions were (Θ1 = 261.4, Θ2 = 36.22, Θ3 = 147.28) and (Θ1 = 18.52, Θ2 = 47.40, Θ3 = 23.22), respectively. In contrast, the rest of the possible solutions yielded an average PC-index of 0.022.

Individual translation function calculations were performed on a 1 Å grid. A translational solution was found for each orientation at positions (x=33.30, y=51.97, z=12.79) and (x=25.23, y=28.58, z=22.02), with respect to the crystallographic center, at 7.7 and 8.8σ, respectively, above the mean. The R-factor for the individual solutions was 55.6% and 54.8% in the resolution range 8.0 to 4.0 Å, with a correlation coefficient (CC) of 0.095 and 0.114. A combined translation function was calculated to place each solution relative to the same crystallographic origin, resulting in an R-factor of 52.8% for data between 8.0 and 4.0 Å, bringing the R-factor down to 51.3% and increasing the CC to 0.22. The molecular packing was assessed on a graphics workstation, which revealed no clashes between the placed search probes. However, a very large amount of empty space was present. The packing showed that the asymmetric unit contained two half dimers, each forming a dimer with another monomer in a neighboring unit cell. The two cores in the asymmetric unit were related by κ=73° around an axis tilted 15.5° off the crystallographic a axis, lying in the a,c plane.

Iterative Model Building and Two-fold Averaging

Initial Electron Density Map: A 2m|F_obs| - D|F_calc| SigmaA weighted map (Read, 1986) was calculated using |F_calc|'s and phases from the molecular replacement solution. The map was contoured at 1σ and showed good density for most of the core. Density emerged for many side chains where the input model residue had been an Ala, indicating that the molecular replacement solution was correct.

First Model Built: The two rotated and translated search probes formed the starting point for model building of the PPCA precursor. The non-crystallographic symmetry (NCS) matrix was determined between the two cores using the "Lsq_explicit" option in the computer program O (Jones et al., 1991). Subsequently a 'best monomer' was built by superimposing the electron densities from each monomer core, and adjusting the model accordingly. Residues were only incorporated in the model where the electron density was visible for the complete side chain. Residues from the search model for which no density was visible were removed. An alanine was built in the model at places where electron density for a side chain was partial. In this manner 294 residues, i.e. 65% of the Cα atoms, were built in the 'best monomer' core. The second monomer was generated from the 'best monomer' model using the NCS operator relating the two monomers in the asymmetric unit. At this point the data set was partitioned into a working set and a test set consisting of 5% of the reflections between 8 and 2.2 Å to monitor the R_free (Brunger et al., 1992b). The working data set was used for rigid body and positional refinement. For averaging and map calculations the unpartitioned data set was used. Twenty-five cycles of refinement using the two 'best monomer cores' positioned in the asymmetric unit as rigid bodies and data from 8.0 to 3.0 Å resulted in an R-factor of 53.5% for this resolution range. The atomic coordinates of this partial model were used to calculate a new 2m|F_obs| - D|F_calc| SigmaA weighted map, which we called the 'best monomer map'.

Averaging: Search for Missing Density: The phasing power from the rigid body refined 'best monomer cores', consisting of 294 residues per core, was insufficient to bring back interpretable electron density for the missing part of the model, 158 residues per monomer. To overcome this a 'bootstrapping' procedure was applied, entailing density averaging using RAVE (Kleywegt & Jones, 1994a) and model expansion. The 'best monomer map' and the rigid body refined 'best monomer cores' served as the starting point for this procedure. Six bootstrapping cycles were carried out, called bmc1 through bmc6, allowing the model to be extended in stepwise increments. Figure 16 shows a scheme of the steps incorporated in one bootstrapping cycle. After a cycle in which the model had undergone major expansion, a new molecular mask was calculated with MAMA (Kleywegt & Jones, 1994b) for use in the subsequent bootstrapping cycle. No phase recombination was applied between bootstrapping cycles. At the end of each cycle the inverted phases α_inv and inverted amplitudes F_inv were discarded. The NCS operator was re-optimized after cycle bmc3. The resolution range of the data included in the bootstrapping cycle started with 15-3.0 Å for bmc1 and was gradually extended to 15-2.7 Å in bmc6. The bootstrapping procedure is summarized in Table 2. To optimize the bootstrapping procedure, consideration was given to the molecular mask used in the averaging, the model building strategy and the refinement procedure.

Molecular masks: Four different masks were constructed in total. The atomic radius of all atoms was set to 4 Å to calculate each mask. The masks were then manually modified using mask editing options in O (Jones et al., 1991). Mask 1 was constructed around the 'best monomer core'. Subsequently it was greatly enlarged by multiple blocks of 10-15 Å³ in the regions where the model was incomplete (Figure 17). This was crucial to prevent the density in the insertion areas from being flattened during the averaging step. Approximately one half of the dimer interface was estimated to be formed by regions from the missing cap domain. Major expansions of the mask in this area were made to accommodate this. This resulted in a serious overlap problem when the mask was duplicated to cover a complete dimer. The mask was reduced where overlap occurred with the "overlap_trim" option of MAMA. After several bootstrapping cycles, newly incorporated polypeptide fragments were carefully assigned to one of the two monomers forming the dimer, and the mask at the dimer interface areas was manually adjusted accordingly. Essentially, the masks were kept far too large in regions where the model was missing in order to avoid erroneous flattening of electron density. In contrast, the masks were tightened around the areas of the molecule where the model was complete.
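The idea of a molecular mask calculated with a fixed atomic radius can be illustrated with a toy example. The following is a minimal sketch only, not the algorithm used by MAMA or O: it marks every grid point lying within a given radius (4 Å by default, as in the text) of any atom, and reports the points shared by two such masks. All function names and coordinates are hypothetical.

```python
def build_mask(atoms, grid_points, radius=4.0):
    """Return the set of grid points within `radius` angstroms of any atom.

    atoms, grid_points -- iterables of (x, y, z) tuples in angstroms
    """
    r2 = radius * radius
    mask = set()
    for p in grid_points:
        for a in atoms:
            d2 = sum((pi - ai) ** 2 for pi, ai in zip(p, a))
            if d2 <= r2:
                mask.add(p)
                break  # one atom close enough is sufficient
    return mask


def overlap(mask_a, mask_b):
    """Grid points claimed by both masks (candidates for trimming)."""
    return mask_a & mask_b


if __name__ == "__main__":
    # A 1 A grid along x and two hypothetical single-atom "monomers"
    grid = [(float(x), 0.0, 0.0) for x in range(-5, 12)]
    mask1 = build_mask([(0.0, 0.0, 0.0)], grid)
    mask2 = build_mask([(6.0, 0.0, 0.0)], grid)
    print(sorted(overlap(mask1, mask2)))
```

In this toy case the two masks share the grid points between x = 2 and x = 4 Å, which is the kind of overlap that has to be resolved when a generously enlarged monomer mask is duplicated to cover a dimer.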

Model Building: A conservative model building strategy was adopted. Initially only side chains were mutated in the core region to fit the PPCA amino acid sequence and, where the density was clear, poly-alanine fragments were built in the insertion areas (loops and the cap domain). Newly included atoms were given a B-factor of 20 Å². Only once models bmc5 and bmc6 were obtained was the electron density of sufficient quality to allow side chains to be incorporated confidently in the cap domain (residues 190-303). At this stage the Cα trace was virtually complete for the whole dimer and the sequence could be fit unambiguously.

Refinement: Positional refinement was postponed until after 3 cycles of bootstrapping, resulting in a final model containing 91% of the Cα atoms. Forty steps of positional refinement were then carried out to improve the geometry of the model. Subsequently only one of the refined monomers was taken and the other generated using NCS operators. The rationale for delaying the positional refinement is addressed in the discussion.

Completing the model: deviations from two-fold symmetry. It was possible to add 148 residues and 185 side chains per monomer after a total of 6 bootstrapping cycles. At this stage, each subunit contained 442 residues and 413 side chains, i.e. 98% of the Cα and 91% of the side chain atoms. The gradual model expansion as a function of the bootstrapping cycle is shown in Figure 18.
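The residue counts quoted for the model expansion are mutually consistent, as the following worked arithmetic shows. The sketch assumes the 452-residue precursor length given in the background section as the denominator for the Cα percentages; the variable names are illustrative.

```python
PRECURSOR_LENGTH = 452   # residues in pPPCA, as stated in the background
CORE_RESIDUES = 294      # residues in the initial 'best monomer' core
ADDED_RESIDUES = 148     # residues added over six bootstrapping cycles


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to an integer."""
    return round(100.0 * part / whole)


built = CORE_RESIDUES + ADDED_RESIDUES             # residues per subunit
missing_per_monomer = PRECURSOR_LENGTH - built     # still unbuilt per monomer

if __name__ == "__main__":
    print("initial core:", percent(CORE_RESIDUES, PRECURSOR_LENGTH), "% of C-alpha atoms")
    print("after bootstrapping:", built, "residues,",
          percent(built, PRECURSOR_LENGTH), "% of C-alpha atoms")
    print("missing per asymmetric unit:", 2 * missing_per_monomer)
```

Under this assumption the 294-residue core corresponds to 65% and the 442-residue model to 98% of the Cα atoms, leaving ten residues per monomer unbuilt.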

Twenty residues were still missing in the asymmetric unit at this stage. These were localized to two stretches per monomer (260-262 and 287-292). With most of the scattering mass incorporated, the monomers from model bmc6 were refined individually with X-PLOR (Brunger, 1992a) in an attempt to retrieve electron density for the still missing residues. After 40 steps of positional refinement using data from 8.0 to 2.6 Å, the R-factor dropped significantly from 40.2% to 33.2%. The model was further positionally refined using a full weight W_A on the crystallographic term. The data included in the refinement were gradually extended to 2.2 Å. At 2.4 Å resolution individual B-factors were refined and the distribution checked as a function of atom location (i.e., low B-factors in the core and high B-factors on the surface). Repeated cycles of refinement allowed 18 missing residues to be added. Essentially, almost the complete cap domain was retrieved using the bootstrapping procedure, as shown in Figure 19. It became apparent from the refined maps that the two stretches of missing amino acids adopted a very different conformation in the two monomers (with as much as an average r.m.s.d. of 7.9 Å for the Cα's of residues 287-292). For this reason electron density for these regions had not been retrieved in the two-fold averaging process. The stepwise improvement of the electron density maps along with averaging, model expansion and refinement is shown in Figure 6.

The program ARP was used to check our model, in particular the region at the dimer interface (Lamzin & Wilson, 1993). Prior to the final round of positional refinement, an |F_obs|/σ cutoff was applied to reject 10% of the weakest data, as well as an anisotropic scale factor to offset the decreased resolution along the crystallographic a axis. The final model is of good geometry with a final R-factor of 21.3% (R_free of 26.8%) for data between 8.0 and 2.2 Å (see Table 3). A Ramachandran plot is given in Figure 21. The r.m.s. coordinate error is 0.282 Å as calculated by SigmaA (Read, 1986). The average phase difference between the initial molecular replacement model and the currently refined model is calculated to be 71° for data between 10 and 2.2 Å.
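The R-factor and R_free values reported above rest on two simple operations: setting aside a small test set of reflections and comparing observed with calculated amplitudes. The sketch below uses the conventional definition R = Σ||F_obs| - |F_calc|| / Σ|F_obs| and assumes the calculated amplitudes are already on the same scale as the observed ones. It is an illustration with made-up numbers, not the X-PLOR implementation, and the function names are hypothetical.

```python
import random


def r_factor(f_obs, f_calc):
    """Conventional crystallographic R-factor for pre-scaled amplitudes."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("amplitude lists must be non-empty and equal in length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def split_reflections(n_reflections, test_fraction=0.05, seed=0):
    """Randomly assign reflection indices to a working set and a test set."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_test = round(n_reflections * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


if __name__ == "__main__":
    work, test = split_reflections(1000)
    print(len(work), "working reflections,", len(test), "test reflections")
    # Hypothetical amplitudes, for illustration only
    print("R =", r_factor([10.0, 20.0], [9.0, 22.0]))
```

The R-factor is evaluated on the working set used in refinement, while R_free is the same quantity evaluated on the test set that refinement never sees.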

The structure determination of PPCA is special in that two-fold averaging could be applied to refine very poor molecular replacement phases, enabling us to retrieve electron density for 148 residues and 185 side chains per monomer. In total 314 complete residues were added per asymmetric unit, equivalent to about 35 kDa of protein. In retrospect we feel that a number of factors contributed to a successful structure determination.

Crystal Packing. Each monomer in the crystal is interacting with four non-crystallographically related monomers. By far the most extensive contact is with a non-crystallographically related monomer generating the physiological dimer. Three additional contacts are extensive crystal contacts ranging from 200 to 800 Å², averaged per monomer. The largest non-dimer crystal contact involves the precursor loops from two crystallographically independent monomers (regions 265-267 and 281-295 from monomer 1 with residues 281-293 from monomer 2) making intimate contact with each other. Summed together these loops create an intermolecular buried surface of 1680 Å². We believe that this stabilizes an otherwise very flexible area, possibly explaining the good diffraction qualities of the P2₁2₁2 crystals.

It is also in this crystal contact that we find deviating spatial conformation and secondary structure between the two monomers, as mentioned before. The electron density in this region is of very good quality, with average temperature factors of 16.6 Å² for main chain and 18.3 Å² for side chains.

pPPCA and the Hydrolase Family. The fold of pPPCA belongs to the large hydrolase fold family containing enzymes such as the serine carboxypeptidases, dehalogenase, various lipases and acetylcholine esterase (Ollis et al. (1992), infra), which have a variety of catalytic functions. Though the central core is the same (a central β-sheet flanked by α-helices on both sides), the proteins in this family all seem to have different 'cap' domains, with respect to both fold and size (Figure 7A-F). pPPCA has one of the largest cap domains, comprising 121 residues forming the three-helical bundle of the helical subdomain and a three-stranded β-sheet of the maturation subdomain.

Major Differences and Comparison With the Serine Carboxypeptidases. The overall fold of the pPPCA monomer is similar to that of the wheat and yeast serine carboxypeptidases (Endrizzi et al. (1994), infra; Ollis et al. (1992), infra). The complete core domains of pPPCA and CPW superimpose with an r.m.s. deviation of 1.7 Å for 302 Cα atoms and 38% sequence identity. Deleting major deviating loops from the core domain allows pPPCA to superimpose with an r.m.s. deviation of 1.2 Å onto CPW and CPY (293 equivalent Cα's with 40% sequence identity for CPW/pPPCA and 271 equivalent Cα's for CPY/pPPCA with 42.2% identity).
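The r.m.s. deviations quoted for the superimposed core domains are root-mean-square distances between equivalent Cα atoms. A minimal sketch of that final step is given below; it assumes the two coordinate sets have already been superimposed and paired residue by residue (the superposition itself, for example by a least-squares fit, is not shown). The coordinates in the example are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Two hypothetical pairs of equivalent C-alpha positions, in angstroms
    model_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
    model_b = [(0.0, 0.0, 2.0), (3.8, 0.0, 0.0)]
    print(f"r.m.s.d. = {rmsd(model_a, model_b):.3f} A")
```

Because large deviations are squared, a few strongly deviating loops dominate the value, which is why removing them lowers the reported r.m.s. deviation from 1.7 to 1.2 Å.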

The cap domain in pPPCA differs significantly from the CPW and CPY counterparts. The pPPCA structure reveals a large maturation subdomain not present in the structures of CPW and CPY, for which the structures of the enzymatically active forms are known. All three enzymes contain a three-helical bundle in the cap domain. The sequence identity between the three proteins in this region is very low (ca. 12%). In contrast, PPCA shows a much greater deviation. Hα1 superimposes reasonably well with the CPW counterpart, maintaining the same general orientation with respect to the core domain (requiring a rotation of only 7.4°). But helices Hα2 and Hα3 have undergone major rotations with respect to Hα1 and the core domains, by κ = 28.5° and κ = 93.4°, respectively (Figure 8A).

Due to the integral role of the cap domain in forming the dimer interface, the dimers of PPCA and CPW were compared. In the pPPCA and CPW dimers the monomers are oriented differently with respect to each other. Superposition of the core domain of one monomer from each dimer shows that the second pair of monomers (forming the respective dimers) differ by a remarkable 15° in orientation (Figure 8B). Thus, it appears that the extensive differences in the cap domains lead to a different arrangement of the subunits in the dimers of PPCA and CPW.
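Differences in orientation such as the 15° between the dimers, or the κ rotations quoted for the helices, are single rotation angles that can be read from a 3×3 rotation matrix through its trace: cos κ = (trace - 1)/2. The sketch below builds a rotation matrix from an axis and an angle with Rodrigues' formula and then recovers the angle. It is illustrative only; the axis used in the example is the one described earlier for the two cores in the asymmetric unit, expressed in an assumed orthogonal frame aligned with the cell axes.

```python
import math


def rotation_matrix(axis, kappa_deg):
    """Rotation matrix for a rotation of kappa degrees about `axis` (Rodrigues)."""
    norm = math.sqrt(sum(c * c for c in axis))
    ux, uy, uz = (c / norm for c in axis)
    k = math.radians(kappa_deg)
    c, s, t = math.cos(k), math.sin(k), 1.0 - math.cos(k)
    return [
        [c + ux * ux * t, ux * uy * t - uz * s, ux * uz * t + uy * s],
        [uy * ux * t + uz * s, c + uy * uy * t, uy * uz * t - ux * s],
        [uz * ux * t - uy * s, uz * uy * t + ux * s, c + uz * uz * t],
    ]


def rotation_angle(matrix):
    """Rotation angle kappa (degrees, 0-180) from the trace of a rotation matrix."""
    trace = matrix[0][0] + matrix[1][1] + matrix[2][2]
    cos_kappa = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # guard against rounding
    return math.degrees(math.acos(cos_kappa))


if __name__ == "__main__":
    # Axis tilted 15.5 degrees off the a axis within the a,c plane
    tilt = math.radians(15.5)
    axis = (math.cos(tilt), 0.0, math.sin(tilt))
    print(f"kappa = {rotation_angle(rotation_matrix(axis, 73.0)):.1f} degrees")
```

The angle obtained this way is independent of the direction of the rotation axis, which makes it a convenient single number for comparing relative orientations.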

Catalytic Triad and Enzymatic Mechanism. Our structure shows that the precursor PPCA has all the elements proposed for the enzymatic machinery of the serine carboxypeptidase family (Liao et al. (1992), infra; Endrizzi et al. (1994), infra), and is the third structure elucidated belonging to this family of enzymes, after CPW and CPY. The catalytic triad in the active site of pPPCA is formed by residues Ser 150, His 429 and Asp 372. The Oγ of Ser 150 forms a good hydrogen bond with the Nε2 of His 429, with an N to O distance of 2.8 Å. The Nδ1 of His 429 is 2.7 Å removed from the Oδ2 and 3.3 Å from the Oδ1 of Asp 372. Further, two backbone amides appear to orient the carboxylate group of Asp 372. The N of Ala 374 is at a distance of 3.0 Å from the Oδ1 of Asp 372 and the N of Cys 375 is at a distance of 2.9 Å from the Oδ2 of Asp 372.

The oxyanion hole proposed to stabilize the negatively charged tetrahedral intermediate in serine carboxypeptidases is formed by the backbone amides of Gly 57 and Tyr 151 in PPCA. The 32 atoms of the catalytic triad residues plus the oxyanion hole amides from PPCA, CPY and CPW superimpose with an r.m.s. deviation of 0.4 Å, indicating the very high degree of structural similarity of the active site in the PPCA precursor with those in the fully active enzymes CPY and CPW (see Table 4). The carboxylate of Asp 372 and the imidazole of His 429 in PPCA are non-planar, making an angle of approximately 60° between the imidazole and the carboxylate. A similar non-planarity has been observed in CPW and CPY, in contrast to the planar orientation found in subtilisin- and trypsin-type serine proteases (McPhalen et al., Biochemistry 27:6582-6598 (1988)). In pPPCA, a pair of glutamic acid residues (Glu 69 and Glu 149) is positioned near the catalytic triad, with their carboxylate groups interacting with each other. The carboxylate groups are located at approximately 8 Å from the Oγ of Ser 150, and lie at the bottom of the active site. An asparagine (Asn 55) is oriented such that it forms a hydrogen bond to each of the two carboxylate groups of the glutamic acid pair, at an Nδ2 (Asn) to Oε1/Oε2 (Glu) distance of 3.0 and 3.6 Å, respectively. In addition the two carboxylates interact with each other via hydrogen bonds. This configuration of two glutamic acid residues and an asparagine is conserved between pPPCA, CPW and CPY (see Table 4), and has been implicated in regulating the low pH optimum for the carboxypeptidase activity found in the serine carboxypeptidases (Liao et al. (1992), infra).
Biochemical data have suggested that a functional group with an apparent pKa value of 5.5 functions to bind the C-terminal carboxylate group of peptide substrates and is responsible for the observed pH optimum of 5.5 (reviewed in Breddam et al. (1986), infra; Rawlings & Barrett (1994), infra). Together with their structural data, Liao and colleagues (Liao et al. (1992), infra) have suggested that at pH 5.5 or below, one or both glutamates must be uncharged, while at a pH higher than 5.5 one or both of the carboxylates, which are oriented opposite to each other, may become deprotonated, resulting in unfavorable electrostatic interactions. This would disturb the hydrogen bonding pattern or result in structural perturbations, causing the observed increase in K_m for peptide substrates at high pH. In pPPCA the orientation of this pair of glutamic acids, as well as that of the asparagine, is essentially identical in structure to the equivalent residues in CPW and CPY (see Table 4), even though the structure has been determined at pH 8. The CPW and CPY structures have been determined at pH 5.7 and at pH 6.5-7.0. Thus, our structure appears to rule out large pH-induced conformational changes of these three residues, at least up to a pH value 2.5 units above that optimal for carboxypeptidase activity. However, the high degree of conservation of these residues does indicate some role in a characteristic shared by all three enzymes. From our comparison it is clear that the enzymatic machinery in the PPCA precursor form is in a conformation virtually identical to that found in the fully active CPW and CPY enzymes. On this basis, the conformation of the enzymatic machinery found in pPPCA is expected to faithfully represent the conformation that will be found in the active PPCA.

Active Site, Substrate Specificity. PPCA has a substrate preference for hydrophobic residues in the P1 and/or P1' binding pockets (Jackman et al., Hypertension 21:925-928 (1993)).
In CPW the P1' pocket was identified to consist of two tyrosine residues (Tyr 60 and Tyr 239) which form a long channel, capped by two acidic residues (Glu 272 and Glu 398) at the end (Liao et al. (1992), infra). This explains the highest preference of this enzyme for Arg and Lys as the leaving group (Breddam et al., Carlsberg Res. Commun. 52:297-311 (1987)). In CPY a similarly shaped pocket is formed by the residues Thr 60, Tyr 256, Leu 272 and Met 398 (Endrizzi et al. (1994), infra). In PPCA the analogous residues are Tyr 247 and Asp 64, forming the sides of the pocket, with Met 430 and Thr 304 at the far end. This is reasonably consistent with an overall preference of PPCA for a hydrophobic leaving group.
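The pH argument made above for the glutamate pair can be made quantitative with the Henderson-Hasselbalch relation, under the simplifying assumption of a single, independently titrating group with the apparent pKa of 5.5 cited in the text. The sketch below is an idealization; coupled residues in a protein interior need not titrate this simply, and the function name is illustrative.

```python
def protonated_fraction(ph, pka):
    """Fraction of an acid group that is protonated at a given pH.

    Henderson-Hasselbalch for a single, independently titrating site:
    fraction = 1 / (1 + 10 ** (pH - pKa))
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    PKA = 5.5  # apparent pKa cited in the text
    for ph in (5.5, 5.7, 6.5, 7.0, 8.0):
        print(f"pH {ph:.1f}: {100 * protonated_fraction(ph, PKA):5.1f}% protonated")
```

On this simple model the group is half protonated at pH 5.5 and well under 1% protonated at pH 8, the pH of the pPPCA crystals, which is why the unchanged geometry of the Glu-Glu-Asn cluster at pH 8 is informative.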

Inactivation Mechanism of the Precursor Form. During the maturation step of the PPCA precursor form, at most residues 285-298, forming the 'excision' peptide, are removed by an as yet unidentified protease(s). In vitro, the maturation event can be mimicked by digestion with trypsin, probably utilizing positions Arg 284, as well as Arg 292 and/or Arg 298. The residues forming the 'excision' peptide adopt distinctly different conformations in the two crystallographically distinct monomers forming the PPCA dimer in our crystal structure. Yet in both monomers this polypeptide region extends out from the protein surface and is virtually completely solvent and protease accessible (Figure 9). Arg 284 and Arg 292 are particularly well exposed. The main chain atoms of Arg 298 are less accessible, being sandwiched between the strand Mβ2 and a loop N-terminal to helix Cα6, while a salt bridge with Glu 264 renders the side chain atoms of Arg 298 partially solvent inaccessible.

The active site cleft is blocked by numerous residues from the maturation subdomain in the precursor form of PPCA. The catalytic triad is rendered solvent inaccessible by residues Asn 275, Ile 276 and Phe 277. These residues are part of the polypeptide Asp 272-Phe 277, which we call the 'blocking' peptide. This peptide is held down predominantly by hydrophobic contacts of Leu 273, Ile 276 and Phe 277 to the core domain residues Gly 57, Cys 60, Leu 180, Leu 190, Val 191, Leu 232, Val 235, Ile 246, Leu 280, Leu 282, Met 299 and Ala 373 (Figure 10). In addition, residue Asn 275 of the blocking peptide appears to fill what might be part of the P1 binding pocket in the mature form. Further inspection of the blocking peptide suggests that Gly 274, with Ramachandran angles φ = 66° and ψ = 28°, might play a central role in the strand blocking the active site. A glycine at this position appears critical to allow the polypeptide chain to adopt a conformation with its main chain at a safe distance from the catalytic triad. This might aid in allowing the blocking peptide to assume a conformation resistant to autocatalysis. The P1' binding pocket is filled by Pro 301, interacting with Thr 304, Tyr 247, Cys 60 and Cys 334. Thus substrate binding is not possible in the precursor form due to the inaccessibility of the substrate binding pockets. We conclude that the inactivation mechanism of PPCA is based on blocking of the active site, and not upon changes in the position of functional groups involved in catalysis or transition state stabilization. The P1, P2 and P1' binding pockets are all rendered solvent inaccessible. The function of the blocking peptide seems to be to render the catalytic triad, as well as the region around the P1 and P2 binding pockets, solvent inaccessible. The blocking peptide, however, does not assume a conformation that a peptide substrate would adopt.
It is carefully positioned in a manner which is different from that of a productive substrate, thereby avoiding cleavage by the nearby catalytic residues, which are correctly poised for catalysis. A crucial observation is that the excision peptide itself does not bind in the active site cleft. Hence, removal of the excision peptide alone is not sufficient to allow solvent or substrate access to the active site.
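The Ramachandran angles quoted for Gly 274 are backbone dihedral angles: φ is defined by the atoms C(i-1), N(i), Cα(i), C(i) and ψ by N(i), Cα(i), C(i), N(i+1). A minimal sketch of how a dihedral angle is computed from four atomic positions follows. The coordinates in the example are hypothetical test points, not atoms from the pPPCA model, and the sign convention used is assumed to be the usual IUPAC one.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (-180 to 180) defined by four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Four hypothetical points forming a right-angle twist about the z axis
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying such a function to the backbone atoms of consecutive residues yields the φ and ψ pairs plotted in a Ramachandran diagram; a positive φ, as reported for Gly 274, lies in a region that is readily accessible mainly to glycine.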

Proposed Maturation Event and Extent of Conformational Rearrangement. The active site of the precursor of PPCA appears to be fully blocked by 49 residues of the maturation subdomain, as shown in Figure 11. Based on the precursor structure and the comparison with CPW and CPY, it is proposed that a region comprising approximately residues 254-284 rearranges to free the P1 and P2 binding sites, while residues 299-302 rearrange to free the P1' binding pocket. The linker connecting these two segments of polypeptide chain is the 14 amino acid excision peptide Met 285-Arg 298. The extent of the residues rearranging is likely to be limited by a disulfide bridge between Cys 253 and Cys 303, which is conserved in the serine carboxypeptidase family. This critical disulfide serves to keep the secondary structure elements together at the far end of the P1' pocket.

An interesting pair of salt bridges is observed between Arg 262, Asp 300, Glu 264 and Arg 298, four residues located on strands Mβ1 and Mβ3 of the mixed β-sheet found in the maturation subdomain. This cluster of residues is strategically positioned at the base of the excision peptide, close to the core domain and 'shielding' the mixed β-sheet via side chain interactions (see Figure 11). These residues are strictly conserved among the human, mouse and chicken PPCAs (Galjart et al. (1991), infra). This charge cluster may be affected by a shift from neutral to acidic pH. Arrival in the endosome/lysosome is expected to result in protonation of either the Asp or the Glu residue or both, resulting in unfavorable electrostatic interactions and destabilization of this charge cluster. This in turn is expected to promote partial unfolding of the maturation subdomain, allowing easier access to additional potential cleavage sites, and stimulating removal of the 'blocking' peptide which fills the active site in the precursor.

A similar double salt bridge has been observed in the aspartic proteinase zymogen pepsinogen between the proenzyme segment (Arg 8P) and the enzyme (Arg 308, Glu 13, Asp 304).

The maturation mechanism for pPPCA appears to be novel among proteases for which the three-dimensional structure of the zymogen is known. The catalytic triad in the precursor form is in a catalytically competent conformation. Enzymatic activity is prevented by a 'blocking' peptide. The blocking peptide is, however, different from the excision peptide and is not excised from the mature enzyme. This leads to a distinct difference from the other known maturation mechanisms in that, after removal of the excision peptide, up to 35 residues filling the active site cleft in the PPCA precursor must rearrange to render the catalytic triad solvent accessible (see Figure 12), but are not cleaved off. Removal of the excision peptide, and possibly a shift to lower pH in the endosome/lysosome, appears to be a trigger for this event. The mechanism does not appear to be autocatalytic, as uptake experiments with cultured galactosialidosis fibroblasts have shown that a mutant PPCA with the catalytic Ser 150 mutated to Ala is properly targeted and processed. It retains its protective function and, except for the loss of catalytic activity, is biochemically indistinguishable from the wild type enzyme (Galjart et al. (1991), infra). Surprisingly, the maturation mechanisms of the serine carboxypeptidases PPCA, CPW and CPY may all differ from each other as well. This is clearest for CPY, in which a 91 residue polypeptide is cleaved off N-terminally to convert the zymogen to an active enzyme (Winther and Sorensen, Proc. Natl. Acad. Sci. USA 55:9330-9334 (1991)), as opposed to the excision of a peptide from within the zymogen, generating a two chain active form, as is the case for PPCA and CPW.

Looking at the hydrolase fold family, the catalytic triad is housed in the core domain and the various cap domains attenuate the biological function by influencing entirely different properties such as: (i) enzyme kinetics, exemplified by the interfacial activation of lipases (Smith et al., Curr. Opinion in Structural Biology 2:490-496 (1992)); (ii) substrate channeling, as is proposed for acetylcholine esterase (Sussman et al. (1991), infra); (iii) substrate recognition, proposed for dehalogenase by Franken et al. (1991), infra, and for CPY and CPW by Endrizzi et al. (1994), infra; and (iv) enzyme inactivation in the case of PPCA.

Biological Implications. Deficiency of the protective protein/cathepsin A (PPCA) in humans results in the lysosomal storage disease galactosialidosis. PPCA is thought to form a multi-enzyme complex with β-galactosidase and neuraminidase in the lysosomes, protecting the latter glycosidases in their harsh acidic and protease-rich environment. PPCA has a 30% sequence identity to the wheat serine carboxypeptidase (CPW) and yeast serine carboxypeptidase (CPY). It has been shown that PPCA in the precursor form is inactive, but upon maturation, entailing excision of a 2 kDa peptide, carboxypeptidase activity is released.

The precursor structure reveals an inactivation mechanism that has not been seen before in any of the other known zymogen structures of proteases (available for the serine-, metallo- and aspartic protease classes). The catalytic triad seems to have an arrangement poised for catalysis. However, the triad is rendered solvent and substrate inaccessible by a strand from the maturation subdomain binding in the active site cleft. Surprisingly, this strand, called the 'blocking' peptide, does not overlap with the 2 kDa 'excision' peptide. Hence, after removal of the excision peptide, up to 35 additional residues must rearrange in order to unblock the active site cleft. A strategically positioned pair of salt bridges, comprising Arg 262, Arg 298, Glu 264, and Asp 300 at the base of the excision peptide, is expected to optionally become destabilized at low pH, unraveling this region of the structure, allowing easier access to cleavage sites and/or promoting the rearrangement event. A number of research groups are currently involved in designing enzyme and gene therapy procedures for several lysosomal storage diseases. Insight into the three-dimensional structure, protein functioning and stability of PPCA, the first enzyme of known structure associated with a lysosomal storage disease and the third human lysosomal structure to be determined, may prove useful in future designs of an adequate therapy procedure for galactosialidosis. Information from the three-dimensional structure of PPCA might also aid in designing an engineered form of PPCA with increased stability and a longer half-life.

Table 1: X-ray Data Collection Statistics

Table 2: Course of Model Building

(statistics using data between 8.0 and 3.0 Å)

Model                        nr. of Cα's   nr. of side chains   mask volume (10⁴ Å³)   Rfactor   Rfree   CC      CC_free
mol. repl. (mr)
  rigid body ref. (rmr)      331           125                  -                      54.2      55.3    0.243   0.244
  calculate NCS matrix                                                                 52.6      52.9    0.287   0.318
best monomer (bm)
  rigid body ref.            294           228                  -                      55.9      57.4    0.228   0.216
  update NCS matrix                                                                    53.5      55.0    0.320   0.328
bmc1 (mask 1)                373           258                  10.8                   49.9      51.3    0.403   0.424
bmc2 (mask 1)                405           277                  10.8                   48.6      48.4    0.443   0.478
bmc3 (mask 2)
  rigid body ref.            411           307                  9.99                   47.1      48.6    0.471   0.491
  positional ref. (pbmc3)                                                              46.9      48.4    0.476   0.492
  update NCS matrix                                                                    39.4      44.7    0.622   0.562
bmc4 (mask 1)                412           327                  10.8                   41.7      43.1    0.584   0.585
bmc5 (mask 3)                435           387                  8.88                   39.8      40.6    0.621   0.623
bmc6 (mask 4)                442           413                  9.11                   38.4      40.2    0.647   0.637

Summary of the bootstrapping procedure. The resulting models have been listed chronologically starting with the molecular replacement solution, i.e. mr (molecular replacement), bm (best monomer core), and the bootstrapping cycles bmc1 through bmc6. The following statistics are given for the various models: the number of Cα atoms built per monomer; the number of correct side chains incorporated per monomer; and the volume of the molecular mask used during the averaging, if applicable. The quality of each model is assessed using the Rfactor, Rfree, CC and CC_free calculated by X-PLOR for data between 8.0 and 3.0 Å. After positional refinement of model bmc3, both monomers were made equivalent by taking one monomer and generating the non-crystallographically related one.
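The quality indicators in Table 2 can be illustrated with a short sketch. The code below uses the conventional definition of the crystallographic R-factor, R = Σ| |Fo| - k|Fc| | / Σ|Fo|, and a standard linear correlation coefficient between observed and calculated amplitudes; the "free" variants are the same statistics evaluated on reflections set aside from refinement. This is a minimal sketch, not the X-PLOR implementation: the single overall scale factor k, the every-third-reflection test set, the function names and the amplitude values are all assumptions made for illustration, and the toy numbers are not data from this work.

```python
import math


def r_factor(f_obs, f_calc):
    """Conventional R-factor with a single overall scale factor k (assumed)."""
    k = sum(f_obs) / sum(f_calc)
    return sum(abs(fo - k * fc) for fo, fc in zip(f_obs, f_calc)) / sum(f_obs)


def correlation(f_obs, f_calc):
    """Linear (Pearson) correlation between observed and calculated amplitudes."""
    n = len(f_obs)
    mean_o = sum(f_obs) / n
    mean_c = sum(f_calc) / n
    cov = sum((fo - mean_o) * (fc - mean_c) for fo, fc in zip(f_obs, f_calc))
    var_o = sum((fo - mean_o) ** 2 for fo in f_obs)
    var_c = sum((fc - mean_c) ** 2 for fc in f_calc)
    return cov / math.sqrt(var_o * var_c)


def split_work_free(f_obs, f_calc, every=3):
    """Put every `every`-th reflection in the free (test) set, the rest in the working set."""
    work, free = [], []
    for i, pair in enumerate(zip(f_obs, f_calc)):
        (free if i % every == 0 else work).append(pair)
    return work, free


# Toy amplitudes, invented for this sketch only.
F_OBS = [120.0, 85.0, 60.0, 44.0, 30.0, 95.0, 72.0, 51.0, 38.0, 22.0]
F_CALC = [110.0, 90.0, 55.0, 50.0, 26.0, 99.0, 65.0, 58.0, 35.0, 25.0]

work, free = split_work_free(F_OBS, F_CALC)
for label, subset in (("working set", work), ("free set", free)):
    fo, fc = zip(*subset)
    print(f"{label}: R = {100 * r_factor(fo, fc):.1f}%  CC = {correlation(fo, fc):.3f}")
```

A falling R-factor together with a rising correlation coefficient, on both the working and the free reflections, is the trend followed through the bootstrapping cycles listed in Table 2.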

Table 3: Current Status of the Model

Statistics for the data used in refinement:

resolution (Å)    Rfactor (%)    completeness (%)
8.0 - 4.3         22.4           85.7
4.3 - 3.5         19.0           89.1
3.5 - 3.0         20.6           89.1
3.0 - 2.8         21.3           87.9
2.8 - 2.6         22.3           86.1
2.6 - 2.4         22.2           84.0
2.4 - 2.3         22.7           81.3
2.3 - 2.2         24.0           78.3
8.0 - 2.2         21.3

Model:

molecules in the asymmetric unit: 2
residues (out of 904 possible): 902
sugars: 6
waters: 296
r.m.s.d. bond length (Å): 0.012
r.m.s.d. bond angles (°): 1.72
average B-values for main chain atoms (Å²): 16.6
average B-values for side chain atoms (Å²): 18.3

Table 4

Superposition of the proposed catalytic machinery of the serine carboxypeptidases with known

Ausubel et al., eds., Current Protocols in Molecular Biology, Greene Publishing Assoc. and Wiley Interscience, N.Y. (1987, 1992, 1993, 1994)

Aymard-Henry et al., Bulletin of the World Health Organization 48:199-202 (1973)

Bailey et al., Improving Protein Phases, Proceedings of the CCP4 Study Weekend (1988)

Baldwin et al., Proc. Natl. Acad. Sci. USA 90:6796-6800 (1993)

Bousse et al., Virology 204:506-514 (1994)

Breddam et al., Carlsberg Res. Commun. 57:83-128 (1986)

Breddam et al., Carlsberg Res. Commun. 52:297-311 (1987)

Brünger et al., J. Mol. Biol. 203:803-816 (1987)

Brünger, A.T., Acta Cryst. A46:46-57 (1990)

Brünger, X-PLOR: Version 3.1, "A system for X-ray crystallography and NMR," Yale University Press, New Haven, CT (1992)

Chong et al., Biochim. Biophys. Acta 7077:65-71 (1991)

Chong et al., Eur. J. Biochem. 207:335-343 (1992)

Colligan et al., eds., Current Protocols in Immunology, Greene Publishing Assoc. and Wiley Interscience, N.Y. (1992, 1993)

Cowtan and Main, Acta Crystallogr. D49:148-157 (1993)

Creighton, Catalysis in Proteins Structures and Molecular Principles, W.H. Freeman and Company (1984), pp. 439-443

Crennell et al., Structure 2:535-544 (Jun. 1994)

d'Azzo et al., Proc. Natl. Acad. Sci. U.S.A. 79:4535-4539 (1982)

d'Azzo et al., "Galactosialidosis," in The Metabolic and Molecular Bases of Inherited Disease, Scriver et al., eds., McGraw Hill Inc., New York (1994), pp. 2785-2832

Endrizzi et al., Biochemistry 33:11106-11120 (1994)

Engh and Huber, Acta Cryst. A47:392-400 (1991)

Franken, S.M., et al., EMBO J. 70:1297-1302 (1991)

Fujinaga and Read, J. Appl. Cryst. 20:517-512 (1987)

Galjart et al., J. Biol. Chem. 266:14754-14762 (1991)

Goodford, J. Med. Chem. 25:849-857 (1985)

Guasch, A., et al., J. Mol. Biol. 224:141-157 (1992)

Hanna et al., J. Immunol. 153:4663-4672 (1994)

Harlow and Lane, Antibodies: a Laboratory Manual, Cold Spring Harbor Laboratory (1988)

Henderson, Biochem. J. 727:321-333 (1972)

Hoogeveen et al., J. Biol. Chem. 255:12143-12146 (1983)

Hubbes et al., J. Biochem. 285:827-831 (1992)

Itoh et al., J. Biol. Chem. 270:515-518 (1995)

Jackman et al., J. Biol. Chem. 265:11265-11272 (1990)

Jackman et al., J. Biol. Chem. 267:2872-2875 (1992)

Jackman et al., Hypertension 27:925-928 (1993)

James and Sielecki, Nature 319:33-38 (1986)

Jones et al., Acta Crystallogr. A47:110-119 (1991)

Jones et al., Acta Crystallogr. A47- 53-770 (1991)

Kabsch and Sander, Biopolymers 22:2577-2637 (1983)

Kabsch, J. Appl. Crystallogr. 26:795-800 (1993)

Kase et al., Biochem. Biophys. Res. Commun. 172:1175-1179 (1990)

Kaufman et al., eds., Handbook of Molecular and Cellular Methods in Biology and Medicine, CRC Press, Inc., Boca Raton (1995)

Kleywegt & Jones, in Bailey et al., eds., First Map to Final Model, SERC Daresbury Laboratory, UK, pp. 59-66 (1994)

Kohler and Milstein, Nature 256:495-497 (1975)

Kraulis, J. Appl. Cryst. 24:946-950 (1991)

Lackowski et al., J. Appl. Cryst. 26:283-291 (1993)

Lamzin and Wilson, Acta Cryst. D49:129-147 (1993)

Laver, Virology 56:78-87 (1978)

Leatherbarrow, Trends Biochem. Sci. 75:455-458 (1990)

Liao et al., Biochemistry 37:9796-9812 (1992)

Lüthy et al., Nature 356:83-85 (1992)

Matthews, B.W., J. Mol. Biol. 33:491-497 (1968)

McPhalen and James, Biochemistry 27:6582-6598 (1988)

Metcalf & Fusek, EMBO J. 72:1293-1302 (1993)

Molecular Replacement, "Proceedings of the CCP4 Study Weekend," Machin (1985)

Molecular Replacement, "Proceedings of the CCP4 Study Weekend," Dodson et al. (1992)

Murti et al., Proc. Natl. Acad. Sci. USA 90:1523-1525 (1993)

Musil et al., EMBO J. 70:2321-2330 (1991)

Nicholls, A., et al., Proteins 77:281-296 (1991)

Noble et al., FEBS Lett. 337:123-128 (1993)

Oho, B-H, Acta Cryst. D51:140-144 (1995)

Okamura-Oho, Y., et al., Biochim. Biophys. Acta 7225:244-254 (1994)

Ollis et al., Protein Eng. 5:197-211 (1992)

Potier et al., Analyt. Biochem. 94:2 7-296 (1979)

Potier et al., J. Biochem. 267:197-202 (1990)

Pshezhetsky et al., Biochem. Biophys. Acta 1122:154-160 (1992)

Pshezhetsky et al., Biochemistry 34:2431-2440 (1995)

Rao et al., Acta Cryst. A36:878-884 (1980)

Rawlings & Barrett, Methods in Enzymology 244:19-61 (1994)

Read, R.J., Acta Crystallogr. A42:140-149 (1986)

Rossmann, "Improving Protein Phases," Proceedings of the CCP4 Study Weekend (Feb. 5-6, 1988)

Rubanyi and Polokoff, Pharmacol. Rev. 46:325-415 (1994)

Rudenko et al., Structure 3:1249-1259 (1995)

Sambrook et al., Molecular Cloning: A Laboratory Manual, Second edition, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1989)

Sawyer et al., eds., Proceedings of CCP4 Study Weekend, pp. 56-62, SERC Daresbury Lab., UK (1993)

Scheibe et al., Biomed. Biochim. Acta 49:547-556 (1990)

Scriver, C.R., et al., eds., The Metabolic and Molecular Bases of Inherited Disease, Vol 11, 7th Ed., McGraw Hill Inc., pp. 2825-2837 (1994)

Smith et al., Curr. Opinion in Structural Biology 2:490-496 (1992)

Steigeman, W., PhD Thesis, Technical University, Munich, Germany

Sussman et al., Science 253:872-879 (1991)

Takimoto et al., J. Virol. 66:7597-7600 (1992)

Taylor et al., J. Mol. Biol. 226:1287-1290 (1992)

Tete-Favier et al., Acta Cryst. D49:246-256 (1993)

Tettamanti et al., eds., Sialidases and Sialidosis. Perspectives in Inherited Metabolic Diseases, Vol. 4, Edi. Ermes, Milano, pp. 261-279 and 379-395 (1981)

Thompson et al., J. Virol. 62:4653-4660 (1988)

Tronrud et al., Acta Crystallogr. A43:489-501 (1987)

Varghese et al., Proteins 74:327-332 (1992)

Verheijen et al., Biochem. Biophys. Res. Commun. 705:868-875 (1982)

Vriend, J. Mol. Graph. 5:52-56 (1990)

van Diggelen et al., J. Biochem. 200:143-151 (1981)

van Diggelen et al., Biochem. Biophys. Acta 703:69-76 (1982)

van Diggelen et al., Lancet 2:804 (1987)

Wahl et al., J. Nucl. Med. 24:316-325 (1983)

Wenger et al., Biochem. Biophys. Res. Commun. 52:589-595 (1978)

Winther and Sorensen, Proc. Natl. Acad. Sci. USA 55:9330-9334 (1991)

Wolf et al., eds., Isomorphous Replacement and Anomalous Scattering: Proceedings of CCP4 Study Weekend, pp. 80-86, SERC Daresbury Lab., UK (1991)

Yamamoto & Nishimura, J. Biochem. 79:435-442 (1987)

Yamamoto et al., J. Biochem. 92:13-21 (1982)

Zhou et al., EMBO J. 70:404-4048 (1991)

Claims

What Is Claimed Is:

1. A method for crystallizing a human protective protein/cathepsin A (PPCA) or precursor human protective protein/cathepsin A (pPPCA), comprising (a) providing a purified PPCA or pPPCA; (b) crystallizing the purified PPCA or pPPCA using a hanging drop or diffusion method, to provide crystallized PPCA or pPPCA having biological activity, wherein the crystallized PPCA or pPPCA is resolvable using x-ray crystallography to obtain x-ray diffraction patterns suitable for three-dimensional structure determination of the PPCA or pPPCA.

2. A method according to claim 1, wherein said PPCA or pPPCA has at least one biological activity selected from the group consisting of enzyme protecting activity, enzyme modulating activity and peptide hydrolyzing activity.

3. A method according to claim 1, wherein said crystallization step is done under conditions of purified PPCA or pPPCA; 2-30% PEG400-10,000; precipitating salt; buffers, and pH 7-9.

4. A method according to claim 3, wherein the crystallization conditions are PPCA or pPPCA; 5-14% PEG8000, 40-80 mM tromethamine, 0.05-2.0 mM NaN₃ and pH 8.0-8.3.

5. A crystallized PPCA or pPPCA, or at least one subdomain thereof, provided by a method according to claim 1.

6. A method for providing an atomic model of a PPCA or pPPCA, comprising

(a) providing a computer readable medium having stored thereon atomic coordinate/x-ray diffraction data of said PPCA or pPPCA in crystalline form, said data sufficient to model the three-dimensional structure of said PPCA, said pPPCA, or at least one subdomain thereof;

(b) analyzing, on a computer using at least one subroutine executed in said computer, the atomic coordinate/x-ray diffraction data from (a) to provide data output defining an atomic model of said PPCA or said pPPCA, said analyzing utilizing at least one computing algorithm selected from the group consisting of data processing and reduction, auto-indexing, intensity scaling, intensity merging, amplitude conversion, truncation, molecular replacement, molecular alignment, molecular refinement, electron density map calculation, electron density modification, electron map visualization, model building, rigid body refinement, positional refinement; and

(c) obtaining atomic model output data defining the three-dimensional structure of said PPCA, pPPCA or at least one subdomain thereof.

7. A method according to claim 6, wherein said computer readable medium further has stored thereon data corresponding to a nucleic acid sequence or an amino acid sequence data comprising at least one structural domain or a functional domain of a PPCA or pPPCA corresponding to a portion of the amino acid sequences of Figures 13 or 14, and wherein said analyzing step further comprises analyzing said sequence data.

8. A computer readable medium having stored thereon atomic model data of said PPCA or pPPCA as the model output data produced by a method according to claim 6.

9. A computer-based system for providing atomic model data of the three dimensional structure of a PPCA or a pPPCA, comprising the following elements;

(a) a computer readable medium having stored thereon atomic coordinate/x-ray diffraction data of said PPCA or pPPCA or at least one subdomain thereof;

(b) at least one computing subroutine, that when executed in a computer, causes the computer to analyze the atomic coordinate/x-ray diffraction data from (a) to provide data output defining an atomic model of said PPCA or pPPCA, said analyzing utilizing at least one computing subroutine selected from the group consisting of data processing and reduction, auto-indexing, intensity scaling, intensity merging, amplitude conversion, truncation, molecular replacement, molecular alignment, molecular refinement, electron density map calculation, electron density modification, electron map visualization, model building, rigid body refinement, positional refinement; and

(c) retrieval means for obtaining atomic model output data defining the three-dimensional structure of said PPCA, pPPCA or at least one subdomain thereof.

10. A computer-based system according to claim 9, wherein said computer readable medium further has stored thereon data corresponding to a nucleic acid sequence or an amino acid sequence data comprising at least one structural domain or a functional domain of a PPCA or pPPCA corresponding to a portion of the amino acid sequences of Figures 13 or 14, and wherein said at least one subroutine further includes analyzing said sequence data.

11. A computer readable medium, having stored thereon atomic model data of a PPCA, pPPCA, or at least one subdomain thereof, produced by a computer system according to claim 9.

12. A method for providing a computer atomic model of a ligand of a PPCA or pPPCA, comprising

(a) providing a computer readable medium according to claim 11, having stored thereon atomic model data of a PPCA, a pPPCA or at least one subdomain thereof;

(b) providing a computer readable medium having stored thereon atomic model data sufficient to generate atomic models of potential ligands of PPCA or pPPCA;

(c) analyzing on a computer, using at least one subroutine executed in said computer, the atomic model data from (a) and the ligand data from (b), to determine binding sites of PPCA or pPPCA and to provide data output defining an atomic model of a ligand of said PPCA, pPPCA, or at least one subdomain thereof, said analyzing utilizing computing subroutines selected from the group consisting of data processing and reduction, auto-indexing, intensity scaling, intensity merging, amplitude conversion, truncation, molecular replacement, molecular alignment, molecular refinement, electron density map calculation, electron density modification, electron map visualization, model building, rigid body refinement, positional refinement; and

(d) obtaining atomic model output data defining the three-dimensional structure of a ligand of said PPCA, pPPCA or at least one subdomain thereof.

13. A computer readable medium having stored thereon the model output data produced by a method according to claim 12.

14. An isolated PPCA or pPPCA ligand, corresponding to the physical molecule of the atomic model of the ligand model produced by a method according to claim 12.

15. A computer-based system for providing an atomic model of a ligand of a PPCA or pPPCA, comprising the following elements;

(a) a computer readable medium having stored thereon atomic model data of a PPCA or pPPCA;

(b) a computer readable medium having stored thereon atomic model data sufficient to generate atomic models of potential ligands of PPCA or pPPCA;

(c) at least one computing subroutine for analyzing on a computer the atomic model data of PPCA or pPPCA from (a) and the ligand data from (b), to determine binding sites of PPCA or pPPCA and to provide data output defining atomic models of potential ligands of PPCA or pPPCA, said analyzing utilizing at least one computing subroutine selected from the group consisting of data processing and reduction, auto-indexing, intensity scaling, intensity merging, amplitude conversion, truncation, molecular replacement, molecular alignment, molecular refinement, electron density map calculation, electron density modification, electron map visualization, model building, rigid body refinement, positional refinement; and

(d) retrieval means for obtaining atomic model output data defining the atomic models of potential ligands of PPCA or pPPCA.

16. A computer readable medium, comprising atomic model output data of a potential ligand of PPCA or pPPCA, said data produced by a method according to claim 15.

17. An isolated PPCA or pPPCA ligand, corresponding to the physical molecule of the atomic model of a ligand produced by a computer system according to claim 15.

18. A crystallized pPPCA, having the atomic coordinates presented in Figure 23.1-23.41.