
CN102411572B - Efficient sharing method for biomolecular data - Google Patents


Info

Publication number
CN102411572B
Authority
CN
China
Prior art keywords
data
biomolecular
field
file
data structure
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201010288419.3A
Other languages
Chinese (zh)
Other versions
CN102411572A (en)
Inventor
陈平
宋立宇
鲁方
孔令印
王敏
王翊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiaxing Anyu Biotechnology Co ltd
Original Assignee
CHONGQING NUOJING BIOLOGICAL INFORMATION TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CHONGQING NUOJING BIOLOGICAL INFORMATION TECHNOLOGY CO LTD
Priority to CN201010288419.3A
Publication of CN102411572A
Application granted
Publication of CN102411572B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an efficient sharing method for biomolecular data, which comprises the following steps of: selecting meaningful fields in the data structure of each kind of biomolecular data file; combining the fields until all of the meaningful fields in the data structure of the biomolecular data file are selected; carrying out permutation and combination on the fields according to the field expression information and forming a new field set; using the fields in the set for generating a new data structure of the biomolecular data file; and using the new biomolecular data file with the new data structure for carrying the data of the read biomolecular data file. When the method is adopted for analyzing the data, the data structure adopted by the existing biomolecular data file does not need to be considered, and researchers with different backgrounds and different skills can favorably obtain the self required information from the existing biomolecular data file, so the data processing speed and the data sharing efficiency of the biomolecular data can be further improved.

Description

Efficient sharing method for biomolecular data
Technical field
The present invention relates to an efficient method for sharing data.
Background art
Biodiversity is a basic attribute of living things and can be studied at different levels, such as molecules, cells and organisms. With the development of sequencing and computer technology, massive amounts of biomolecular data have been produced. Because of the complexity of the biomolecular field, research in it is characterized by long cycles, complex results, high difficulty and large data volumes. At present, thousands of researchers around the world study biomolecules. To preserve the resulting data, and to let researchers everywhere build on each other's results and continue long-term research, the result data must be organized, that is, functionally described, reordered, trimmed and repaired, and stored. Because researchers use widely varying computer systems, differ in their familiarity with different software, and use software whose data structures for biomolecular data also vary, large amounts of biomolecular data of different types and formats are stored in different computer systems, which makes sharing biomolecular data very troublesome.
For example, scientist 1, in study A, obtains by some means a research result for chromosome 1 of a certain species: growth hormone gene data for chromosome 1. After the result is described according to the GenBank data description format, or data structure, recommended by NCBI (the U.S. National Center for Biotechnology Information), it is submitted as a file to the NCBI database for publication.
Scientist 2, in study B, obtains by some means research results for chromosomes 2 and 3 of the same species: growth hormone gene data for chromosomes 2 and 3. After the results are described according to the EMBL data description format recommended by EMBL (the European biological data center), they are submitted as a file to the EMBL database for publication.
Scientist 3, in study C, wishes to use the results of scientist 1 and scientist 2 for follow-up research. However, those results have been submitted to two databases in different computer systems with different data structures. Because the data description formats (that is, the data structures) of these databases differ, scientist 3 must spend a great deal of time parsing the data files that scientist 1 and scientist 2 stored in the NCBI and EMBL databases before the files from the different databases can be used directly for follow-up analysis and research. Even if scientist 3 writes a separate parsing program for each data structure so that the data in the files is parsed automatically, the parsing programs are independent of one another, so it is hard to improve parsing speed as a whole, implementation is troublesome, and the efficiency of data sharing is still not improved. This makes scientist 3's research difficult.
As can be seen, the existing data structures for organizing biological data are complex and varied. To use data organized with these data structures, one must understand each specific data structure. Having to understand a large number of complex data structures before the data can be parsed and used greatly limits the processing speed of biomolecular data and reduces data sharing efficiency.
Summary of the invention
The problem to be solved by the present invention is to provide an efficient sharing method that allows biomolecular data to be shared efficiently.
An embodiment of the efficient sharing method for biomolecular data provided by the invention comprises:
selecting the meaningful fields in the data structure of each kind of biomolecular data file;
obtaining the meaningful fields in the data structure of the first kind of biomolecular data file to form a field group and, for the meaningful fields obtained from the data structure of the second and each later kind of biomolecular data file, adding to the field group those fields that differ from the fields already in it, until the meaningful fields in the data structure of every kind of biomolecular data file have been selected;
arranging and combining the fields according to the logic of the information they express, to form a new field set;
generating a new data structure for biomolecular data files from the fields in the set; and
using a new biomolecular data file having the new data structure to carry the data of the biomolecular data files that are read.
With the method provided by the embodiment of the invention, it does not matter how different or how numerous the data structures of existing biomolecular data files are. Because the field set that forms the new data structure is based on the data structures of the existing biomolecular data files, the contents of different biomolecular data files can be unified automatically in the new file. Because the data structure of the new file is defined in advance, a data processing program based on the new data structure can be prepared in advance to parse every biomolecular data file, which speeds up parsing, and the data structures used by the existing files no longer need to be considered when parsing. By unifying biomolecular data files that come from different sources and use different data structures, disordered biological data is converted into a common data structure that is easy to operate on. This helps biomolecular researchers with different backgrounds and skill levels obtain the information they need from existing biomolecular data files, and thereby further improves the processing speed and sharing efficiency of biomolecular data.
Brief description of the drawings
The invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flow chart of the steps of the first embodiment of the method of the invention;
Fig. 2 is a flow chart of the steps that implement step 5 of the first embodiment of the method of the invention.
Detailed description of the embodiments
In the computer field, a data structure (also called a data model or data layout) is usually built according to the characteristics of the data to be organized. After the data is organized with that structure, it is stored as a file. When sharing biomolecular data organized by different data structures, the large number and complexity of the data structures used by the files carrying the data must be eliminated. To that end, the present embodiment takes the data structures of the various known biomolecular data files as its basis and, according to the objectively existing characteristics of biomolecular data, builds a new data structure that is suited to sharing biomolecular data, that is, a new computer model for organizing biomolecular data. Biomolecular data from various sources can then be organized and stored as biomolecular data files for researchers to share.
As the flow of the first embodiment in Fig. 1 shows, this embodiment comprises five main steps.
First, in step 1, the meaningful fields in the data structure of each kind of biomolecular data file are selected. The data structures of the biological data files have been studied in advance, so the meaningful fields can be determined from the standpoint of biomolecular science. Because research content, depth and biodiversity differ, the complexity and composition of the data structures of different biomolecular data files also differ considerably. For example, the number of fields and the meaning of the data each field expresses are not the same. Meaningful fields are fields with information value, selected on the basis of what biomolecular research has in common, so that the following steps can organize biomolecular data that supports continued research and is worth sharing. Fields without value include, for example, fields that record an individual's research process, which vary from person to person and mean nothing to follow-up research, and the serial-number fields assigned when data is stored in a file. Note that the present embodiment applies both to data stored in a two-dimensional table structure and to text structures, that is, .TXT files. In a .TXT file, a line whose characters do not include a keyword identifying biomolecular data is a meaningless line, that is, a meaningless field. The keyword is the field name.
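The keyword rule for .TXT files can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the keyword set, the function name `meaningful_lines` and the `SERIAL` line (a made-up example of a line without a recognized keyword) are all hypothetical.

```python
# Hypothetical keyword set; a real system would derive it from the
# data structures that were studied in advance.
KNOWN_KEYWORDS = {"LOCUS", "DEFINITION", "ACCESSION", "VERSION", "KEYWORDS"}


def meaningful_lines(text):
    """Keep only the lines whose first token is a known keyword (field name)."""
    kept = []
    for line in text.splitlines():
        tokens = line.split(None, 1)
        if tokens and tokens[0] in KNOWN_KEYWORDS:
            kept.append(line)
    return kept


sample = (
    "LOCUS LISOD 756 bp DNA linear BCT 30-JUN-1993\n"
    "SERIAL 0001\n"  # made-up line with no recognized keyword
    "ACCESSION X64011 S78972\n"
)
print(meaningful_lines(sample))
```

Lines are matched on their first whitespace-separated token only, which mirrors the statement that the keyword is the field name.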
In step 2, the meaningful fields determined in step 1 are organized: they are combined into a field group in the order in which they were selected, until the meaningful fields in the data structure of every kind of biomolecular data file have been selected.
In step 3, the fields in the field group are arranged and combined according to the logic of the information they express, forming a new field set. When building the new data structure, the present embodiment rearranges and combines the fields in the new field set according to the characteristics of biomolecular data. The basis for this arrangement and combination is the logical association that objectively exists in the biomolecular data itself: if the biomolecular data expressed by two fields is related, the fields are considered logically related. These logical associations give the arranged fields the following ordering property: a field placed earlier is the basis for the fields placed after it. This allows limited data to express more information, so that less data provides more information and less storage space is used. An example is given later. Combining means merging fields that are expressed differently but whose content is essentially the same. The merge is carried out according to the meaning of the data and can be implemented in several ways, for example:
First, obtain the meaningful fields in the data structure of the first kind of biomolecular data file to form a field group. Then, for the meaningful fields obtained from the data structure of the second and each later kind of biomolecular data file, add to the field group those fields that differ from the fields already in it.
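The supplementing procedure just described amounts to an order-preserving union of field lists. The following Python sketch illustrates it. The function name `build_field_group` and the two field lists are assumptions for illustration, and the sketch presumes that field names have already been normalized so that equal names mean equal content.

```python
def build_field_group(structures):
    """Merge the meaningful fields of several data structures into one group.

    The first structure seeds the group; each later structure only
    contributes fields that are not already present.
    """
    field_group = []
    for fields in structures:
        for name in fields:
            if name not in field_group:
                field_group.append(name)
    return field_group


# Hypothetical, already-normalized field names for two source formats.
genbank_fields = ["molecule title", "sequence length", "sequence type", "data update time"]
embl_fields = ["molecule title", "version number", "sequence type", "sequence length"]

print(build_field_group([genbank_fields, embl_fields]))
```

A list is used rather than a set so that the order of selection is kept, since the later arrangement step depends on field order.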
With this scheme, in which less data expresses more information, the new data structure covers the data structures of the existing biomolecular data files, so the data structure of the shared file need not be considered when its data is shared. The embodiment therefore achieves efficient sharing of biomolecular data and solves the problem that researchers cannot conveniently operate on and analyze biomolecular data files that are stored in different computer systems and use different data structures. The present embodiment is particularly suited to organizing biological data files with large data volumes. If data placed earlier in a file can only be explained by data placed later, a great deal of time is spent adjusting or looking up the required data, and data redundancy is otherwise needed to reduce that time. The present embodiment avoids this problem. The main significance of this step is that it uses the logic of the data itself to organize data more efficiently, to express more information with less data, and to share data more efficiently.
Then, in step 4, the new field set is used to generate the new data structure for biomolecular data files, that is, the structure of a file that can be used to share the data in existing biomolecular data files.
Finally, in step 5, a new biomolecular data file having the new data structure is used to carry the data of the biomolecular data file that is read. In the present embodiment, step 5 consists of the following sub-steps. First, the biomolecular data file to be shared is read into computer memory. Next, it is determined whether the data structure used by the file can be correctly recognized. If it cannot, a cannot-recognize message is fed back and the operation ends. Otherwise, using the field correspondence between the data structure of the new biomolecular data file and the data structure of the file read into memory, the data of each field in the file is filled into the data space of the corresponding field of the new biomolecular data file. Whether the data structure can be recognized can be determined from the file extension or from a special marker in the file content, which is not described further here.
The embodiment of Fig. 1 is illustrated below using the sharing of .TXT biomolecular data files as an example. The content of a .TXT file generally has the following form:
First line of text data;
Second line of text data;
Third line of text data;
Nth line of text data.
The first line of text data is the header line of the biomolecular data file and describes the structure and meaning of the second through Nth lines. The second through Nth lines describe the specific biomolecular information in the file.
For example, a biomolecular file that uses the GenBank data structure is saved on the computer as a .TXT file. Its first line of text data is:
LOCUS LISOD 756 bp DNA linear BCT 30-JUN-1993;
The data from the second line onward is:
DEFINITION Listeria ivanovii sod gene for superoxide dismutase;
KEYWORDS sod gene; superoxide dismutase;
ACCESSION X64011 S78972;
The meaning of the first line of text data is:
Note: bp, also written BP, stands for base pair, the unit of length used to describe DNA sequences. 756 bp means 756 base pairs.
Suppose that a biomolecular file using the EMBL data structure has the same data from the second line onward, and that its first line of text data is:
ID X64011; SV 1; linear; genomic DNA; STD; PRO; 756 BP.
The meaning of this first line of text data is:
Suppose the new data structure for biomolecular data files is generated on the basis of the two kinds of biomolecular data files above. Following steps 1 to 4 of the embodiment of Fig. 1, the following new data structure is obtained:
<molecule title>, <sequence length>, <source database name>, <database access>, <data update time>, <version number>, <sequence type>, <molecular classification>, <molecule description>, <key word>, <is ring molecule>
As can be seen, the new data structure has these characteristics: its field set is the union of the field sets of the known data structures, the field names have been standardized and unified according to their biomolecular meaning, and the fields have been reordered and combined. For example, the molecular conformation field and the is-ring-molecule field have been merged into one.
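One way to represent this new data structure in code is an ordered list of field names plus a record type whose unfilled fields default to NULL. The sketch below is illustrative only: the identifiers `UNIFIED_FIELDS`, `NULL`, `empty_record` and `is_ring_molecule` are assumptions, and the field names are normalized English spellings of the eleven fields listed in the description.

```python
# Normalized spellings of the eleven fields of the new data structure.
UNIFIED_FIELDS = [
    "molecule title",
    "sequence length",
    "source database name",
    "database access",
    "data update time",
    "version number",
    "sequence type",
    "molecular classification",
    "molecule description",
    "key word",
    "is ring molecule",
]

NULL = "NULL"  # placeholder for a field the source file does not provide


def empty_record():
    """Return a record with every unified field set to NULL."""
    return {name: NULL for name in UNIFIED_FIELDS}


def is_ring_molecule(conformation):
    """Merge the molecular-conformation field into a boolean:
    'circular' gives True, 'linear' gives False."""
    return conformation.strip().lower() == "circular"


record = empty_record()
record["is ring molecule"] = is_ring_molecule("linear")
print(record["version number"], record["is ring molecule"])
```

Starting every record from `empty_record()` means that a field missing from one source format is reported as NULL rather than being absent.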
When, in step 5, a new biomolecular data file having the new data structure is used to carry the data of the GenBank biomolecular file that was read, part of the content of the new file is:
<molecule title> LISOD;
<sequence length> 756;
<source database name> BCT;
<database access> X64011 S78972; (the "ACCESSION" field)
<data update time> 30-JUN-1993;
<version number> NULL; (NULL indicates an empty field)
<sequence type> DNA;
<molecular classification> NULL;
<molecule description> Listeria ivanovii sod gene for superoxide dismutase; (the "DEFINITION" field)
<key word> sod gene; superoxide dismutase; (the "KEYWORDS" field)
<is ring molecule> false; (false means the molecular conformation is linear; if it is circular, the value is true)
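As an illustration of how the first-line values above could be extracted, the following Python sketch parses a GenBank LOCUS line with a regular expression. It is a sketch under assumptions, not the method's actual parser: the names `LOCUS_RE` and `parse_locus_line` are hypothetical, the pattern covers only the layout shown in this example, and it tolerates the spacing variants "756bp" and "756 bp".

```python
import re

# Pattern for the LOCUS layout used in the example above (an assumption;
# real GenBank records can carry more variants than this covers).
LOCUS_RE = re.compile(
    r"^LOCUS\s+(?P<name>\S+)\s+(?P<length>\d+)\s*bp\s+(?P<type>\S+)\s+"
    r"(?P<topology>linear|circular)\s+(?P<division>[A-Z]{3})\s*"
    r"(?P<date>\d{2}-[A-Z]{3}-\d{4})",
    re.IGNORECASE,
)


def parse_locus_line(line):
    """Map the items of a LOCUS line onto fields of the new data structure."""
    match = LOCUS_RE.match(line.strip())
    if match is None:
        raise ValueError("not a recognizable LOCUS line")
    return {
        "molecule title": match["name"],
        "sequence length": int(match["length"]),
        "source database name": match["division"],
        "data update time": match["date"],
        "sequence type": match["type"],
        "is ring molecule": match["topology"].lower() == "circular",
    }


print(parse_locus_line("LOCUS LISOD 756 bp DNA linear BCT 30-JUN-1993;"))
```

Fields that the LOCUS line does not carry, such as the version number, are simply not set here and would remain NULL in the unified record.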
Suppose the EMBL biomolecular file is also saved as a .TXT file, where ID is the identifier of the file. Its first line of text data is:
ID X64011; SV 1; linear; genomic DNA; STD; PRO; 756 BP;
The data from the second line onward is:
XX;
AC X64011; S78972;
XX;
SV X64011.1;
XX;
DE Listeria ivanovii sod gene for superoxide dismutase;
XX;
KW sod gene; superoxide dismutase;
Note that in this file "DE" corresponds to "DEFINITION", "SV" to the version number, "AC" to "ACCESSION", and "KW" to "KEYWORDS".
When, in step 5, a new biomolecular data file having the new data structure is used to carry the data of the EMBL biomolecular file that was read, part of the content of the new file is:
<molecule title> X64011;
<sequence length> 756;
<source database name> STD; (indicates that the data comes from the EMBL database)
<database access> X64011; S78972;
<data update time> NULL;
<version number> X64011.1;
<sequence type> genomic DNA;
<molecular classification> PRO;
<molecule description> Listeria ivanovii sod gene for superoxide dismutase;
<key word> sod gene; superoxide dismutase;
<is ring molecule> false;
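The first-line values in this listing can be obtained by splitting the EMBL ID line on semicolons. The Python sketch below shows one possible way. The function name `parse_embl_id_line` is hypothetical, and the assumption that the ID line always has exactly seven semicolon-separated items is taken from this example only.

```python
import re


def parse_embl_id_line(line):
    """Map the items of an EMBL ID line onto fields of the new data structure."""
    body = line.strip()
    if not body.startswith("ID"):
        raise ValueError("not an ID line")
    # Drop the "ID" tag and any trailing ';' or '.', then split the items.
    parts = [p.strip() for p in body[2:].strip().rstrip(";.").split(";")]
    if len(parts) != 7:
        raise ValueError("unexpected number of ID-line items")
    accession, _seq_version, topology, mol_type, data_class, division, length = parts
    digits = re.match(r"\d+", length)
    if digits is None:
        raise ValueError("no sequence length found")
    return {
        "molecule title": accession,
        "sequence length": int(digits.group()),
        "source database name": data_class,
        "sequence type": mol_type,
        "molecular classification": division,
        "is ring molecule": topology.lower() == "circular",
    }


print(parse_embl_id_line("ID X64011; SV 1; linear; genomic DNA; STD; PRO; 756 BP."))
```

The version number in the listing comes from the separate SV line, not from the ID line, so this function does not set it.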
As shown above, the embodiment of the invention builds a new data model structure based on the characteristics of biomolecular data, and this structure covers the data structures of the biomolecular data files that researchers already use. Information that is biologically the same in essence but appears in the different biomolecular data files used by different researchers can therefore be handled through the field mapping between a file that uses a traditional data structure and a file that uses the new data structure: data from two files of different origin and different data structure is filled into the data space of the corresponding field in the file that uses the new data structure. For example, the 756bp in the file that uses the GenBank data structure and the 756BP in the file that uses the EMBL data structure are both filled into the data space of the <sequence length> field of the file based on the new data structure. The data "30-JUN-1993", which appears only in the GenBank file, is filled directly into the data space of the <data update time> field. Because the EMBL file has no data with this meaning, the data space of the <data update time> field for it is filled with "NULL" (empty). This data filling requires no manual intervention by the user of the biological data file. The method provided by the embodiment thus makes it convenient for researchers to share biomolecular data and operate on it uniformly, and makes efficient analysis and research on the data possible as a next step.
In the embodiment of Fig. 1, step 5 is implemented as shown in Fig. 2.
Step 21: read the biomolecular data file into computer memory.
Step 22: determine whether the data structure used by the biomolecular data file can be correctly recognized.
Suppose the file read in is a GenBank biomolecular data file (.TXT file) whose partial data is:
LOCUS LISOD 756 bp DNA linear BCT 30-JUN-1993;
DEFINITION Listeria ivanovii sod gene for superoxide dismutase;
ACCESSION X64011 S78972;
VERSION X64011.1 GI:44010;
KEYWORDS sod gene; superoxide dismutase;
The first line is the description line for the file content, and the data from the second line onward describes the specific biomolecular information. Checking whether the first line of the file contains the "LOCUS" marker confirms whether the file is in GenBank format, that is, whether it uses the GenBank data structure.
If the file read in is an EMBL biomolecular data file (.TXT file), its partial data is:
ID X64011; SV 1; linear; genomic DNA; STD; PRO; 756 BP;
XX;
AC X64011; S78972;
XX;
SV X64011.1;
XX;
DE Listeria ivanovii sod gene for superoxide dismutase;
XX;
KW sod gene; superoxide dismutase;
The first line is the description line for the file content, and the data from the second line onward describes the specific biomolecular information. XX is a meaningless blank line. Checking whether the first line of the file contains the "ID" marker, followed later by the "SV" marker, confirms whether the file is in EMBL format, that is, whether it uses the EMBL data structure. "SV" is the unique identifying string of an EMBL file.
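The two recognition rules of step 22 (a "LOCUS" marker on the first line for GenBank, and an "ID" first line followed by an "SV" line for EMBL) can be sketched as a small Python function. The name `detect_format` and the use of `None` to signal an unrecognized file are assumptions for illustration.

```python
def detect_format(text):
    """Return 'GenBank', 'EMBL', or None when the structure is not recognized."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return None
    first_tag = lines[0].split()[0]
    if first_tag == "LOCUS":
        return "GenBank"
    # EMBL: "ID" on the first line and an "SV" line somewhere after it.
    if first_tag == "ID" and any(ln.split()[0] == "SV" for ln in lines[1:]):
        return "EMBL"
    return None


genbank_text = "LOCUS LISOD 756 bp DNA linear BCT 30-JUN-1993\nACCESSION X64011 S78972\n"
embl_text = "ID X64011; SV 1; linear; genomic DNA; STD; PRO; 756 BP.\nXX\nSV X64011.1\n"

print(detect_format(genbank_text), detect_format(embl_text), detect_format("hello"))
```

A `None` result corresponds to the cannot-recognize feedback of step 23.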
Step 23: if the current file cannot be recognized, send a cannot-recognize message, notify the computer system to add the relevant information for the new kind of biomolecular data file, and then end the operation. That is, the present embodiment uses a pattern database to store information about biomolecular data files. When a biomolecular data file with an unknown data structure is encountered, a message is fed back that it cannot be recognized automatically. Once the data structure used by the file has been identified, information about the file is added to the pattern database, for example the file name, the data structure name, and the fields the data structure uses. If the current file is correctly recognized, proceed to step 24.
Step 24: using the field correspondence between the data structure of the new biomolecular data file and the data structure of the other biomolecular data file read into computer memory, fill the data of each field of the file read in into the data space of the corresponding field of the new biomolecular data file. This completes the mapping of data from a biomolecular data file that uses a traditional data structure into a biomolecular data file that uses the new data structure.
The specific field mapping operation is explained below using a file with the GenBank data structure as an example. Suppose the partial data of the file is:
LOCUS LISOD 756 bp DNA linear BCT 30-JUN-1993;
DEFINITION Listeria ivanovii sod gene for superoxide dismutase;
ACCESSION X64011 S78972;
VERSION X64011.1 GI:44010;
KEYWORDS sod gene; superoxide dismutase;
The field correspondence, data mapping relation and mapping operation between the fields of a file with the GenBank data structure and a biomolecular data file with the new data structure provided by the embodiment are as follows:
<molecule title> = LOCUS. Explanation: LOCUS is the file identifier for the GenBank data structure in the NCBI database and is also the keyword that precedes the name, so it corresponds to <molecule title> in the data model. In the first line of the file, what follows LOCUS is the name of the current biomolecule. When the LOCUS marker is encountered, the content after it is parsed as the content of the <molecule title> field of the new data structure, and "LISOD" is filled into the data space of the <molecule title> field of the new biomolecular data file.
<molecule description> = DEFINITION. Explanation: when the keyword DEFINITION is encountered, the content after it is parsed as the content of the <molecule description> field of the new data structure, and "Listeria ivanovii sod gene for superoxide dismutase" is filled into the data space of the <molecule description> field of the new biomolecular data file.
<version number> = VERSION. Explanation: when the keyword VERSION is encountered, the content after it is parsed as the content of the <version number> field of the new data structure, and "X64011.1 GI:44010" is filled into the data space of the <version number> field of the new biomolecular data file.
<database access> = ACCESSION. Explanation: when the keyword ACCESSION is encountered, the content after it is parsed as the content of the <database access> field of the new data structure, and "X64011 S78972" is filled into the data space of the <database access> field of the new biomolecular data file.
<key word> = KEYWORDS. Explanation: when the keyword KEYWORDS is encountered, the content after it is parsed as the content of the <key word> field of the new data structure, and "sod gene; superoxide dismutase" is filled into the data space of the <key word> field of the new biomolecular data file.
The data mapping relations and mapping operations for the remaining fields follow by analogy.
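The keyword-to-field mapping of step 24 can be sketched in Python as a lookup table applied line by line. This is a minimal sketch, assuming one field per line and whitespace-separated keywords. The names `GENBANK_TO_UNIFIED` and `map_genbank_lines` are hypothetical, and the LOCUS line is left out because it carries several fields at once.

```python
# Hypothetical mapping table from GenBank keywords to unified field names.
GENBANK_TO_UNIFIED = {
    "DEFINITION": "molecule description",
    "ACCESSION": "database access",
    "VERSION": "version number",
    "KEYWORDS": "key word",
}


def map_genbank_lines(text):
    """Fill unified fields from the keyword lines of a GenBank-style file."""
    record = {}
    for line in text.splitlines():
        tokens = line.split(None, 1)
        if len(tokens) == 2 and tokens[0] in GENBANK_TO_UNIFIED:
            # Strip the trailing ';' or ':' that terminates each sample line.
            value = tokens[1].strip().rstrip(";:").strip()
            record[GENBANK_TO_UNIFIED[tokens[0]]] = value
    return record


sample = (
    "DEFINITION Listeria ivanovii sod gene for superoxide dismutase;\n"
    "ACCESSION X64011 S78972;\n"
    "VERSION X64011.1 GI:44010;\n"
    "KEYWORDS sod gene; superoxide dismutase;\n"
)
print(map_genbank_lines(sample))
```

Supporting another source format would then mean supplying a different mapping table rather than a different parser.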
To implement the embodiment of the invention, the biomolecular data file read into computer memory is traversed line by line from beginning to end, regular expressions are used to identify the data of each field of the file's data structure, and the data is mapped from the file read in to the new biomolecular data file according to the field correspondence.
Regular expression recognition is a general character-matching technique in computing. For example, to match "ACCESSION", the regular expression is {ACCESSION}. This regular expression identifies the ACCESSION string and its position, from which the position of the field's actual data can be found.
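A minimal illustration of this kind of matching with Python's standard `re` module is given below. The function name `find_field` is hypothetical, and the pattern (keyword at the start of a line, followed by whitespace and the field data) is an assumption about the file layout rather than the expression used in the embodiment.

```python
import re


def find_field(keyword, text):
    """Locate a keyword at the start of a line.

    Returns (position of the keyword, field data after it), or None
    when the keyword does not occur.
    """
    pattern = re.compile(r"^" + re.escape(keyword) + r"\s+(.*)$", re.MULTILINE)
    match = pattern.search(text)
    if match is None:
        return None
    return match.start(), match.group(1).strip()


text = (
    "LOCUS LISOD 756 bp DNA linear BCT 30-JUN-1993\n"
    "ACCESSION X64011 S78972\n"
)
print(find_field("ACCESSION", text))
```

`re.escape` keeps the keyword literal, so a keyword containing characters that are special in regular expressions would still be matched as plain text.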
For files with other data structures, for example files with the EMBL data structure, the data mapping operation is the same as the process described above for files with the GenBank data structure and is not repeated here.
According to the embodiment of Fig. 1, to access the molecular length field of the two files above, it is only necessary to read the value of the <sequence length> field of the biomolecular data files that use the new data structure. In this way, all biomolecular data that has been mapped into biomolecular data files with the new data structure can be read through the corresponding field, which makes unified operation convenient.
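The benefit for the reader of the data can be shown with a short Python sketch: once both source files have been mapped, the same field name retrieves the value from either record. The two dictionaries below restate the example values from the description, and the helper name `field_values` is an assumption.

```python
# Records as they would look after mapping the GenBank and EMBL examples.
genbank_record = {
    "molecule title": "LISOD",
    "sequence length": "756",
    "data update time": "30-JUN-1993",
    "version number": "NULL",
}
embl_record = {
    "molecule title": "X64011",
    "sequence length": "756",
    "data update time": "NULL",
    "version number": "X64011.1",
}


def field_values(records, field):
    """Read one unified field from every record, whatever its source format."""
    return [record.get(field, "NULL") for record in records]


print(field_values([genbank_record, embl_record], "sequence length"))
print(field_values([genbank_record, embl_record], "data update time"))
```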
The most notable characteristic of the embodiment of Fig. 1 is that each time the method is carried out, the new data structure for biomolecular data files must be generated before a new biomolecular data file having that structure can carry the data of the file that is read. To simplify execution of the embodiment of Fig. 1 and improve its efficiency, a model database is set up in advance to store the new data structure generated the first time. The new data structure then need not be generated every time, so steps 1 to 4 need not be performed every time, which improves the execution efficiency of the present embodiment. When a new kind of biomolecular data file that can be correctly recognized is encountered, the meaningful fields in its data structure are added to the new data structure. In addition, the model database can also store the data structures used by the biomolecular data files that can be correctly recognized, so that the data in the model database can be used to determine whether the data structure of a given biomolecular data file can be recognized. For each newly added kind of recognizable biomolecular data file, the data structure it uses is likewise added to the model database, increasing the number of kinds of biomolecular data file that can be recognized with the model database.
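The role of the model database can be sketched with a small in-memory class in Python. This is only an illustration under assumptions: the class name `ModelDatabase`, its methods, the field lists and the name "OtherFormat" are hypothetical, and a real system would presumably use persistent storage.

```python
class ModelDatabase:
    """Hypothetical in-memory stand-in for the model database."""

    def __init__(self):
        self.unified_fields = []    # the stored new data structure
        self.known_structures = {}  # structure name -> its meaningful fields

    def add_structure(self, name, fields):
        """Store a recognizable structure and supplement the new data
        structure with any fields it does not yet contain."""
        self.known_structures[name] = list(fields)
        for field in fields:
            if field not in self.unified_fields:
                self.unified_fields.append(field)

    def is_known(self, name):
        """Tell whether files with this data structure can be recognized."""
        return name in self.known_structures


db = ModelDatabase()
db.add_structure("GenBank", ["molecule title", "sequence length", "data update time"])
db.add_structure("EMBL", ["molecule title", "sequence length", "version number"])
print(db.unified_fields)
print(db.is_known("GenBank"), db.is_known("OtherFormat"))
```

Because the unified field list is kept between runs, adding a new structure only appends its unseen fields instead of repeating steps 1 to 4 from scratch.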
The embodiments of the efficient sharing method for biomolecular data provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. Meanwhile, a person of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementation and the scope of application. In summary, the content of this description should not be construed as limiting the invention.

Claims (6)

1. A method for sharing biomolecular data, characterized in that:
Significant fields are selected in the data structure of each biomolecular data file;
The significant fields in the data structure of the first biomolecular data file are obtained to form a field group; for the significant fields obtained from the data structure of the second and each subsequent biomolecular data file, the fields that differ from those already in the field group are supplemented into the field group, until the significant fields in the data structure of every biomolecular data file have been selected;
The fields are combined and arranged according to the logic of the information they express, forming a new field set;
The new data structure of the biomolecular data file is generated from the fields in the set;
A biomolecular data file is read into computer memory;
Whether the data structure adopted by this biomolecular data file can be correctly recognized is judged; if not, information indicating that it cannot be recognized is fed back and the operation ends; otherwise,
Using the field correspondence between the data structure of the new biomolecular data file and the data structure of the biomolecular data file read into computer memory, the data of each field of the data structure in that file is filled into the data storage space corresponding to the respective field of the new biomolecular data file's data structure.
2. the method for claim 1, is characterized in that: described biomolecular data file is the file of text structure file or database structure.
3. method as claimed in claim 2, it is characterized in that: if described biomolecule file is text structure file, line by line scan and be read into the biomolecular data file in calculator memory, adopt the data of regular expression identification this document data structural field.
4. the method for claim 1, is characterized in that: according to field described in the logic association permutation and combination between field, the field that order is arranged is formerly the basis in rear field.
5. the method for claim 1, it is characterized in that: the data structure storage that the biomolecular data file that can be correctly validated adopts is in model database, and can data structure that use the data judgement biomolecular data file in described model database to adopt be correctly validated.
6. method as claimed in claim 5, it is characterized in that: storage new data structure, if the data structure of the employing of a biomolecular data file that can be correctly validated is stored in model database, in the data structure with this biomolecular data file, significant field is supplemented new data structure.
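The field-group construction recited in claim 1 can be illustrated with a short sketch. It is not part of the claims and rests on assumptions: the function name `build_field_group` and the example field names are hypothetical, and fields with the same meaning are assumed to have already been given a common name, since the text does not state how two fields are judged identical.

```python
def build_field_group(structures):
    """Merge the significant fields of each structure, keeping first-seen order."""
    field_group = []
    for fields in structures:
        for field in fields:
            # Only fields not already in the group are supplemented into it.
            if field not in field_group:
                field_group.append(field)
    return field_group

# Hypothetical significant fields, already expressed under common names.
first = ["name", "sequence length", "molecule type"]
second = ["name", "sequence length", "keywords"]
third = ["name", "organism"]

field_group = build_field_group([first, second, third])
```

The first structure seeds the group, and each later structure contributes only the fields the group does not yet contain. Arranging the resulting fields by their logical relationship is a separate step that this sketch does not attempt.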
CN201010288419.3A 2010-09-21 2010-09-21 Efficient sharing method for biomolecular data Active CN102411572B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201010288419.3A CN102411572B (en) 2010-09-21 2010-09-21 Efficient sharing method for biomolecular data

Publications (2)

Publication Number Publication Date
CN102411572A CN102411572A (en) 2012-04-11
CN102411572B true CN102411572B (en) 2014-11-05

Family

ID=45913649

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010288419.3A Active CN102411572B (en) 2010-09-21 2010-09-21 Efficient sharing method for biomolecular data

Country Status (1)

Country Link
CN (1) CN102411572B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106547879A (en) * 2016-10-26 2017-03-29 安徽扬远信息科技有限公司 It is a kind of to be based on PDM and ERP system integrated approach
CN110825944B (en) * 2019-10-29 2023-06-16 深圳前海环融联易信息科技服务有限公司 Webpage form data acquisition method and device, computer equipment and storage medium
CN120636557A (en) * 2025-08-11 2025-09-12 上海泰楚生物技术有限公司 Macromolecule analysis data sharing management method and system thereof

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1037045A (en) * 1988-04-08 1989-11-08 国际商业机器公司 The system and method that the data item (objelt) of relational database is effectively analyzed
CN1497475A (en) * 2002-09-25 2004-05-19 �Ҵ���˾ System and method for displaying and selecting hierarchical data buse segment and field
CN1701343A (en) * 2002-09-20 2005-11-23 德克萨斯大学董事会 Computer program product, system and method for information discovery and association analysis
CN1781094A (en) * 2003-03-10 2006-05-31 尤尼西斯公司 System and method for storing and accessing data in an interlocking tree data warehouse
CN101216824A (en) * 2007-01-05 2008-07-09 冯卫国 Method for publishing tree -type structure database as distributed XML database

Also Published As

Publication number Publication date
CN102411572A (en) 2012-04-11

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
DD01 Delivery of document by public notice

Addressee: Chongqing Nuojing Biological Information Technology Co.,Ltd.

Document name: Notification to Pay the Fees

DD01 Delivery of document by public notice

Addressee: Chongqing Nuojing Biological Information Technology Co.,Ltd.

Document name: Notification of Termination of Patent Right

DD01 Delivery of document by public notice

Addressee: Chongqing Nuojing Biological Information Technology Co.,Ltd.

Document name: Notification to Pay the Fees

Addressee: Chongqing Nuojing Biological Information Technology Co.,Ltd.

Document name: Notification of Decision on Request for Restoration of Right

DD01 Delivery of document by public notice
DD01 Delivery of document by public notice
DD01 Delivery of document by public notice

Addressee: Chongqing Nuojing Biological Information Technology Co.,Ltd.

Document name: Notification to Pay the Fees

DD01 Delivery of document by public notice
DD01 Delivery of document by public notice

Addressee: Chongqing Nuojing Biological Information Technology Co.,Ltd.

Document name: Notification of Termination of Patent Right

DD01 Delivery of document by public notice
DD01 Delivery of document by public notice

Addressee: Chongqing Nuojing Biological Information Technology Co.,Ltd.

Document name: Notification of Passing Examination on Formalities

TR01 Transfer of patent right

Effective date of registration: 20240131

Address after: Building 4, 10th Floor, Photovoltaic Science and Technology Innovation Park, No. 1288 Kanghe Road, Gaozhao Street, Xiuzhou District, Jiaxing City, Zhejiang Province, 314011

Patentee after: JIAXING ANYU BIOTECHNOLOGY CO.,LTD.

Country or region after: China

Address before: 401121 2 # 6 #, 3rd Floor, Zone B, Neptune Technology Building, North New Area, Chongqing

Patentee before: CHONGQING NUOJING BIOLOGICAL INFORMATION TECHNOLOGY Co.,Ltd.

Country or region before: China
