CN115983245B - Method and device for analyzing middle-length text information of building drawing - Google Patents
Method and device for analyzing long text information in a building drawing
- Publication number
- CN115983245B, CN202310266783.7A
- Authority
- CN
- China
- Prior art keywords
- information
- text
- analyzing
- building drawing
- attribute
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The invention discloses a method and a device for analyzing long text information in a building drawing. The method comprises the following steps: collecting building industry knowledge and writing it into a configuration file; acquiring a building drawing in a preset format and parsing it to obtain a plurality of text lines; obtaining title level information of the building drawing from the text lines according to a preset rule, and determining a component extraction range of the building drawing according to the title level information; matching the text lines, within the component extraction range, against the component, attribute and attribute value templates in the configuration file to obtain entities and the candidate entity categories corresponding to the entities, thereby obtaining text semantic block information; and parsing the text semantic block information to obtain a corresponding syntax tree, then parsing the syntax tree to extract the component information and attribute name information in the text semantic block information, so as to convert the long text information into a preset structure. The invention solves the problem of low efficiency in the prior art when analyzing long text information.
Description
Technical Field
The invention relates to the technical field of data processing, and in particular to a method and a device for analyzing long text information in a building drawing.
Background
With the rapid development of intelligent construction in recent years, information technology has been widely applied in many fields. At present, most existing buildings are designed on the basis of two-dimensional drawings such as CAD drawings. Structured parsing of the long text information in building drawings is an important part of building intelligence and a cornerstone of fields such as intelligent building design, intelligent drawing review and intelligent construction of BIM models.
Long text information is an important component of a building drawing. It includes design descriptions and drawing notes, serves as the outline of a construction drawing, describes what the drawings, quantity sheets and construction practices do not fully express, and explains how the issues of concern to the review authority are addressed. It is also one of the information sources for intelligently converting a building drawing into a BIM model.
For example, the long text information of a structural construction drawing may read "1. Slab thickness is 110 mm where not noted; slab top elevation is 8.950 m where not noted.", from which "concrete slab-thickness-not noted-110 mm" and "concrete slab-top elevation-not noted-8.950 m" need to be extracted. Likewise, the long text information of a water supply and drainage drawing may read "1. The indoor domestic water supply main pipe uses PSP steel-plastic composite pipe for building water supply and is connected by electromagnetic melting.", from which "water pipe-indoor-connection mode-domestic water supply system, main pipe-electromagnetic melting" and "water pipe-indoor-material-domestic water supply system, main pipe-PSP steel-plastic composite pipe" need to be extracted. Long text information is expressed in natural language. Extracting the five types of entities (components, component conditions, attributes, attribute conditions and attribute values) from it, and structuring them according to the professional knowledge of each building discipline into the fixed structure "component-component condition-attribute condition-attribute value", is the basis for further developing intelligent construction of BIM models, intelligent drawing review and intelligent design. In the prior art, however, this information is mostly extracted from the long text by manual inspection and screening, which is inefficient.
Disclosure of Invention
In view of the above, the present invention provides a method and a device for analyzing long text information in a building drawing, with the aim of solving the problem of low efficiency in the prior art when analyzing long text information.
The invention is realized in the following way:
a method for analyzing long text information in a building drawing, the method comprising:
collecting construction industry knowledge and writing the construction industry knowledge into a configuration file, wherein the configuration file stores a plurality of component information, attribute information and attribute value information;
acquiring a building drawing in a preset format, and analyzing the building drawing to obtain a plurality of text lines of the building drawing;
acquiring title level information of the building drawing according to the text lines and a preset rule, and determining a component extraction range in long text information of the building drawing according to the title level information;
matching from the component extraction ranges of the plurality of text lines according to the components, the attributes and the attribute value templates in the configuration file to obtain entities and candidate entity categories corresponding to the entities, and obtaining text semantic block information according to the matching results of the plurality of text lines;
analyzing the text semantic block information to obtain a corresponding grammar tree, and analyzing and extracting component information and attribute name information in the text semantic block information by the grammar tree to convert long text information in the building drawing into a preset structure.
Further, in the method for analyzing long text information in a building drawing, the step of obtaining title level information of the building drawing according to the plurality of text lines and a preset rule includes:
detecting whether the initial segment of each text line is a sequence number so as to distinguish body text from titles, and obtaining title levels according to the consistency of the sequence number format;
and obtaining the lower-level titles located between adjacent titles of the same level, so that the title level information of the building drawing is obtained through a recursive strategy.
Further, in the method for analyzing long text information in a building drawing, the step of determining the component extraction range in the long text information in the building drawing according to the title level information includes:
examining the short titles in the title level information in order from the highest title level to the lowest;
and determining the component restriction template that a short title matches, and storing the mapping between the target component and the text under the short title, so as to determine the component extraction range of the short title.
Further, in the method for analyzing long text information in a building drawing, the step of matching the plurality of text lines from the component extraction ranges according to the component, the attribute and the attribute value template in the configuration file to obtain the entity and the candidate entity category corresponding to the entity, and obtaining text semantic block information according to the matching results of the plurality of text lines includes:
matching from the component extraction ranges of the plurality of text lines according to the components, the attributes and the attribute value templates in the configuration file to obtain entities and candidate entity categories corresponding to the entities;
deleting the shorter of competing matching results, on the principle of preferring longer matches over shorter ones, to obtain the final entity matching results, and arranging the entity matching results in the order in which they appear in the text line to obtain the text semantic block information.
Further, in the method for analyzing long text information in a building drawing, the step of analyzing the text semantic block information to obtain a corresponding syntax tree includes:
analyzing a long text information organization mode of the building drawing, and writing a lexical grammar analysis rule template supported by a preset tool;
and automatically generating a corresponding lexical grammar analyzer by adopting the preset tool so as to convert the text semantic block information into a corresponding grammar tree.
Further, in the method for analyzing long text information in a building drawing, the step of analyzing and extracting the component information and the attribute name information in the text semantic block information by using the syntax tree to convert the long text information in the building drawing into a preset structure includes:
determining attribution of the attribute values according to the positions of templates corresponding to the attribute values in the configuration file so as to extract component information and attribute name information in the text semantic block information from the grammar tree.
Further, the method for analyzing long text information in a building drawing further includes:
and if a plurality of attributions of the attribute values are detected, determining the attributions of the attribute values according to the positions of templates corresponding to the attribute values in the configuration file and the component range defined by the title level.
Another object of the present invention is to provide a device for analyzing long text information in a building drawing, the device comprising:
the collection module is used for collecting construction industry knowledge and writing the construction industry knowledge into a configuration file, wherein the configuration file stores a plurality of component information, attribute information and attribute value information;
the acquisition module is used for acquiring a building drawing in a preset format and parsing the building drawing to obtain a plurality of text lines of the building drawing;
the determining module is used for acquiring the title level information of the building drawing according to the text lines and preset rules, and determining the component extraction range in the long text information of the building drawing according to the title level information;
the matching module is used for matching from the component extraction range of the plurality of text lines according to the components, the attributes and the attribute value templates in the configuration file to obtain entities and candidate entity categories corresponding to the entities, and obtaining text semantic block information according to the matching results of the plurality of text lines;
the analysis module is used for analyzing the text semantic block information to obtain a corresponding grammar tree, and analyzing and extracting component information and attribute name information in the text semantic block information to convert long text information in the building drawing into a preset structure.
Another object of the present invention is to provide a readable storage medium having stored thereon a computer program, characterized in that the program when executed by a processor realizes the steps of the method according to any of the above.
It is a further object of the invention to provide an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, which processor implements the steps of any of the methods described above when executing the program.
According to the invention, a configuration file is written from building industry knowledge, the long text in the building drawing is parsed into text lines, the title hierarchy is derived from the titles, and the component range and the sections governed by each subtitle are determined from that hierarchy. Once the sentence semantic block information has been generated, a syntax tree is built from it, and the syntax tree is parsed to extract the component information and attribute name information, yielding a fixed structure containing component information and attributes. This achieves automatic parsing and structural conversion of the long text information in building drawings and solves the problem of low efficiency in the prior art when analyzing long text information.
Drawings
FIG. 1 is a flow chart of a method for analyzing long text information in a building drawing according to a first embodiment of the present invention;
FIG. 2 is a block diagram of a device for analyzing long text information in a building drawing according to a second embodiment of the present invention.
The invention will be further described in the following detailed description in conjunction with the above-described figures.
Detailed Description
In order that the invention may be readily understood, a more complete description of the invention will be rendered by reference to the appended drawings. Several embodiments of the invention are presented in the figures. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
It will be understood that when an element is referred to as being "mounted" on another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present. The terms "vertical," "horizontal," "left," "right," and the like are used herein for illustrative purposes only.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed types.
How the long text information in the construction drawing is analyzed will be described in detail with reference to specific embodiments and drawings.
Example 1
Referring to FIG. 1, a method for analyzing long text information in a building drawing according to a first embodiment of the present invention is shown. The method includes steps S10 to S14.
Step S10, building industry knowledge is collected and written into a configuration file, and a plurality of component information, attribute information and attribute value information are stored in the configuration file.
Building engineering knowledge, including basic industry knowledge such as building codes and the construction standards of different building types, is collected, and a building designer writes it into a configuration file that stores a plurality of pieces of component information, attribute information and attribute value information. In a specific implementation, the configuration file stores component-attribute-attribute value data as a JSON tree whose hierarchy is: component node, attribute node, value node. All nodes share a generic structure: key is a string that stores the standard name; pattern is a list that stores other expressions of the key; value is a list of dictionaries that stores the nodes of the next level. Specifically, a single configuration file stores a plurality of components; a single component stores the attribute names it contains; and a single attribute stores the values it contains. Attribute nodes additionally carry identifiers that mark them as a component condition or an attribute condition.
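The node structure described above can be illustrated with a minimal sketch. The component, attribute and value names below, and the "condition" field used for the condition identifier, are hypothetical; the source only specifies the key, pattern and value fields.

```python
import json

# Hypothetical configuration fragment: component -> attribute -> value nodes.
# Every node has "key" (standard name), "pattern" (other expressions of the
# key) and "value" (list of next-level nodes). The "condition" field is an
# assumed name for the component-condition / attribute-condition identifier.
CONFIG_JSON = """
[
  {
    "key": "water pipe",
    "pattern": ["water pipe", "water supply pipe"],
    "value": [
      {
        "key": "material",
        "pattern": ["material", "pipe material"],
        "condition": null,
        "value": [
          {"key": "thin-walled stainless steel pipe",
           "pattern": ["thin-walled stainless steel pipe"], "value": []}
        ]
      },
      {
        "key": "position",
        "pattern": ["position"],
        "condition": "component_condition",
        "value": [
          {"key": "indoor", "pattern": ["indoor"], "value": []}
        ]
      }
    ]
  }
]
"""


def iter_patterns(nodes, path=()):
    """Yield (pattern, path_of_keys) for every node in the tree, depth first."""
    for node in nodes:
        node_path = path + (node["key"],)
        for pattern in node["pattern"]:
            yield pattern, node_path
        yield from iter_patterns(node["value"], node_path)


config = json.loads(CONFIG_JSON)
pattern_index = list(iter_patterns(config))
```

Flattening the tree into (pattern, path) pairs keeps the position of each template in the configuration file, which later steps rely on when deciding which component and attribute a matched value belongs to.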
Step S11, a building drawing in a preset format is obtained, and the building drawing is analyzed to obtain a plurality of text lines of the building drawing.
Building drawings are generally stored in DWG format, that is, as DWG files. The DWG file is parsed with the ObjectARX tool, and the text blocks are read line by line from top to bottom to obtain text lines, so that the long text information in the building drawing is broken down into a plurality of text lines.
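The top-to-bottom reading order can be sketched as follows. This is only an illustration under stated assumptions: it does not call ObjectARX, and it assumes the text blocks have already been extracted as hypothetical (x, y, text) tuples with the y axis pointing upward; the tolerance parameter y_tol is also an assumption.

```python
from typing import List, Tuple

TextBlock = Tuple[float, float, str]  # (x, y, text); y grows upward


def blocks_to_lines(blocks: List[TextBlock], y_tol: float = 1.0) -> List[str]:
    """Group text blocks into lines, reading top to bottom, left to right."""
    lines: List[List[TextBlock]] = []
    for block in sorted(blocks, key=lambda b: -b[1]):  # highest y first
        if lines and abs(lines[-1][0][1] - block[1]) <= y_tol:
            lines[-1].append(block)   # close enough in y: same text line
        else:
            lines.append([block])     # otherwise start a new text line
    return ["".join(b[2] for b in sorted(line, key=lambda b: b[0]))
            for line in lines]


example_blocks = [(0.0, 90.0, "4. Valves"), (0.0, 100.0, "3. "), (10.0, 100.4, "Pipes")]
example_lines = blocks_to_lines(example_blocks)
```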
Step S12, title level information of the building drawing is obtained according to the text lines and preset rules, and a component extraction range in long text information of the building drawing is determined according to the title level information.
According to the professional knowledge of the building industry, the range from which component information is extracted depends on the title. The title level information of the whole long text is therefore obtained first, and the component range is then determined from it. Take the following long text information as an example: "3. Pipes: domestic water supply risers and horizontal pipes use plastic-lined hot-dip galvanized steel pipes rated at 1.6 MPa, with the connection method (such as clamp connection) depending on whether the pipe diameter is below DN100 or is DN100 and above. 4. Valves: 1) domestic cold and hot water supply pipes with a diameter greater than DN50 use all-copper gate valves, and the rest use all-copper ball valves or stop valves, with a working pressure equal to that of the pipe." According to the professional knowledge of the building industry, only water pipe component information is extracted from the title "3. Pipes" and the section within its range, and only equipment component information, not water pipe information, is extracted from the title "4. Valves" and the section within its range. This avoids the redundant results that arise when one piece of information is attributed to several components at once.
Further, in some optional embodiments of the present invention, the step of obtaining title level information of the building drawing according to the plurality of text lines and a preset rule includes:
detecting whether the initial segment of each text line is a sequence number so as to distinguish body text from titles, and obtaining title levels according to the consistency of the sequence number format;
and obtaining the lower-level titles located between adjacent titles of the same level, so that the title level information of the building drawing is obtained through a recursive strategy.
Specifically, body text and titles are distinguished by whether the initial segment of a text line is a sequence number, and the title level is determined from the consistency of the sequence number format. If the first title has the format "One, …", all later titles of the same format, such as "Two, …" and "Three, …", are found and taken as first-level titles. The titles between the first and second first-level titles are then processed in the same way to obtain all second-level titles between them, and third-level, fourth-level and deeper titles are obtained recursively. Following this logic, the title level information of the whole long text is obtained through a recursive strategy.
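The recursive strategy can be sketched as below. The sequence number formats in NUMBER_FORMATS are assumptions chosen for illustration (including the Chinese "一、" style that "One, …" presumably translates); a real rule set would be tuned to the drawings at hand, for example to avoid treating a body line that starts with a number as a title.

```python
import re
from typing import List, Optional

# Assumed sequence-number formats, checked in this order.
NUMBER_FORMATS = [
    ("cjk", re.compile(r"^[一二三四五六七八九十]+、")),
    ("arabic_dot", re.compile(r"^\d+[.、]")),
    ("arabic_paren", re.compile(r"^\d+\)")),
    ("paren_arabic", re.compile(r"^\(\d+\)")),
]


def number_format(line: str) -> Optional[str]:
    """Return the name of the sequence-number format the line starts with."""
    for name, regex in NUMBER_FORMATS:
        if regex.match(line.strip()):
            return name
    return None  # no sequence number: treated as body text


def build_hierarchy(lines: List[str], level: int = 1) -> List[dict]:
    """Recursively split lines into same-format titles and their contents."""
    formats = [number_format(line) for line in lines]
    first = next((f for f in formats if f is not None), None)
    if first is None:
        return []  # only body text, no titles at this level
    # All titles sharing the format of the first title form this level.
    idx = [i for i, f in enumerate(formats) if f == first]
    nodes = []
    for start, end in zip(idx, idx[1:] + [len(lines)]):
        body = lines[start + 1:end]
        nodes.append({
            "title": lines[start],
            "level": level,
            "text": body,
            "children": build_hierarchy(body, level + 1),
        })
    return nodes


sample_lines = ["一、General", "1. Scope", "body a", "2. Basis",
                "二、Pipes", "1. Material", "1) steel", "2) copper"]
title_tree = build_hierarchy(sample_lines)
```

Each node keeps both its raw subordinate text and its child titles, so later steps can restrict extraction to the text under a given title.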
Still further, in some optional embodiments of the present invention, the step of determining the component extraction range in the long text information of the building drawing according to the title level information includes:
examining the short titles in the title level information in order from the highest title level to the lowest;
and determining the component restriction template that a short title matches, and storing the mapping between the target component and the text under the short title, so as to determine the component extraction range of the short title.
Short titles are examined in order from the highest title level of the long text to the lowest. If a short title matches the restriction template of a component, for example "4. Valves" matches the restriction template "valve|accessory|device" of the "equipment" component, the mapping between the "equipment" component and the text under the short title "4. Valves" is stored. During subsequent structured parsing, structured information for the "equipment" component is extracted only from the text under "4. Valves". If a short title matches no component restriction template, the component extraction range of the text under it is not restricted.
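A minimal sketch of this top-down check follows. The "equipment" template is taken from the text; the "water pipe" template, the length threshold for a "short" title, and the choice to let subtitles inherit the restriction of their parent title are all assumptions made for illustration.

```python
import re
from typing import Dict, List, Optional

# Restriction templates per component. "equipment" comes from the text;
# the "water pipe" template is an assumed example.
COMPONENT_TEMPLATES: Dict[str, str] = {
    "equipment": r"valve|accessory|device",
    "water pipe": r"pipe",
}
MAX_SHORT_TITLE_LEN = 20  # assumed threshold for a "short" title


def restricted_component(title: str) -> Optional[str]:
    """Return the component whose restriction template the short title matches."""
    if len(title) > MAX_SHORT_TITLE_LEN:
        return None
    for component, template in COMPONENT_TEMPLATES.items():
        if re.search(template, title, flags=re.IGNORECASE):
            return component
    return None


def assign_ranges(nodes: List[dict], inherited: Optional[str] = None,
                  out: Optional[Dict[str, Optional[str]]] = None) -> Dict[str, Optional[str]]:
    """Walk titles from the highest level down and record each title's component."""
    out = {} if out is None else out
    for node in nodes:
        # Assumption: a subtitle without its own match keeps the parent's range.
        component = restricted_component(node["title"]) or inherited
        out[node["title"]] = component
        assign_ranges(node.get("children", []), component, out)
    return out


titles = [
    {"title": "3. Pipes", "children": []},
    {"title": "4. Valves", "children": [{"title": "1) Working pressure", "children": []}]},
    {"title": "5. General notes", "children": []},
]
ranges = assign_ranges(titles)
```

A value of None means the text under that title is not restricted to any component.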
And step S13, matching from the component extraction range of the plurality of text lines according to the components, the attributes and the attribute value templates in the configuration file to obtain entities and candidate entity categories corresponding to the entities, and obtaining text semantic block information according to the matching results of the plurality of text lines.
Entities and their candidate entity categories are obtained by matching the long text against the pattern templates of all nodes (components, attributes and attribute values) in the configuration file, and shorter matching results are deleted on the principle of preferring longer matches over shorter ones, because a longer expression generally carries more complete semantics than a shorter one. The final entity matching results are then arranged in the order in which they appear in the sentence to obtain the text semantic block information. For example, the sentence "The indoor cold and hot water pipes all use thin-walled stainless steel pipes; pipes with DN < 80 use double-clamp press connection, and pipes with DN ≥ 80 use flange connection" is converted into the semantic block information "[indoor/component condition] [domestic water supply system/attribute condition] [hot water supply system/attribute condition] [hot water return system/attribute condition] [all use/irrelevant text] [thin-walled stainless steel pipe/attribute value] [DN < 80/attribute condition] [use/irrelevant text] [double-clamp press/attribute value] [connection/attribute name] [,/irrelevant text] [and/irrelevant text] [DN ≥ 80 pipe diameter/attribute condition] [use/irrelevant text] [flange/attribute value] [connection/attribute name]".
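The longest-match principle can be sketched as follows. The pattern list is hypothetical and stands in for the templates flattened from the configuration file; for simplicity each pattern carries a single category, whereas the method described here allows several candidate categories per entity.

```python
import re
from typing import List, Tuple

# Hypothetical (pattern, category) pairs standing in for the configuration file.
PATTERNS: List[Tuple[str, str]] = [
    (r"stainless steel pipe", "attribute value"),
    (r"thin-walled stainless steel pipe", "attribute value"),
    (r"indoor", "component condition"),
    (r"DN\s*<\s*\d+", "attribute condition"),
    (r"flange", "attribute value"),
    (r"connection", "attribute name"),
]

Match = Tuple[int, int, str, str]  # (start, end, text, category)


def match_entities(line: str) -> List[Match]:
    """Return non-overlapping matches, preferring longer ones, in text order."""
    candidates = [
        (m.start(), m.end(), m.group(), category)
        for pattern, category in PATTERNS
        for m in re.finditer(pattern, line)
    ]
    kept: List[Match] = []
    # Visit longer candidates first; drop any that overlaps one already kept.
    for cand in sorted(candidates, key=lambda c: (-(c[1] - c[0]), c[0])):
        if all(cand[1] <= k[0] or cand[0] >= k[1] for k in kept):
            kept.append(cand)
    return sorted(kept)  # order of appearance in the text line


def semantic_blocks(line: str) -> str:
    """Render matches and the unmatched gaps between them as semantic blocks."""
    blocks, pos = [], 0
    for start, end, text, category in match_entities(line):
        gap = line[pos:start].strip()
        if gap:
            blocks.append(f"[{gap}/irrelevant text]")
        blocks.append(f"[{text}/{category}]")
        pos = end
    tail = line[pos:].strip()
    if tail:
        blocks.append(f"[{tail}/irrelevant text]")
    return " ".join(blocks)


sentence = "indoor pipes use thin-walled stainless steel pipe, DN < 80 flange connection"
blocks = semantic_blocks(sentence)
```

Here the shorter candidate "stainless steel pipe" is discarded because it overlaps the longer "thin-walled stainless steel pipe".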
Step S14, analyzing the text semantic block information to obtain a corresponding grammar tree, and analyzing and extracting component information and attribute name information in the text semantic block information to convert long text information in the building drawing into a preset structure.
For example, the text semantic block information above is converted into the following syntax tree:
(settree (prs [indoor/component condition] [domestic water supply system/attribute condition] [hot water supply system/attribute condition] [hot water return system/attribute condition] (pr (pair [thin-walled stainless steel pipe/attribute value])) (pr [DN < 80/attribute condition] (pair [double-clamp press/attribute value] [connection/attribute name])) (pr [DN ≥ 80 pipe diameter/attribute condition] (pair [flange/attribute value] [connection/attribute name]))))
Specifically, the way long text information is organized in CAD drawings is analyzed, a lexical and syntactic rule template supported by the ANTLR4 tool is written, and ANTLR4 is used to automatically generate a lexer and parser that process the sentence semantic block information into the corresponding syntax tree.
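The shape of such a tree can be illustrated with a small data structure. This sketch builds the tree by hand and is not generated by ANTLR4; the class names Token and Node are hypothetical, and only the rule names settree, prs, pr and pair come from the example.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Token:
    """A leaf: one semantic block with its entity category."""
    text: str
    category: str

    def sexpr(self) -> str:
        return f"[{self.text}/{self.category}]"


@dataclass
class Node:
    """An inner node named after a grammar rule: settree, prs, pr or pair."""
    rule: str
    children: List[Union["Node", Token]] = field(default_factory=list)

    def sexpr(self) -> str:
        inner = " ".join(child.sexpr() for child in self.children)
        return f"({self.rule} {inner})"


tree = Node("settree", [Node("prs", [
    Token("indoor", "component condition"),
    Token("domestic water supply system", "attribute condition"),
    Token("hot water supply system", "attribute condition"),
    Token("hot water return system", "attribute condition"),
    Node("pr", [Node("pair", [
        Token("thin-walled stainless steel pipe", "attribute value")])]),
    Node("pr", [Token("DN < 80", "attribute condition"),
                Node("pair", [Token("double-clamp press", "attribute value"),
                              Token("connection", "attribute name")])]),
    Node("pr", [Token("DN >= 80 pipe diameter", "attribute condition"),
                Node("pair", [Token("flange", "attribute value"),
                              Token("connection", "attribute name")])]),
])])
tree_text = tree.sexpr()
```

Rendering the tree back to a parenthesized string makes it easy to compare against the notation used in the example.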
Further, the syntax tree is parsed. Taking the long text information above as an example, the syntax tree is parsed as follows.
According to the professional knowledge of the building industry, part of the component condition inheritance strategy is as follows:
(1) The component conditions contained in a prs node are inherited by all child nodes of that prs node;
According to the professional knowledge of the building industry, part of the attribute condition inheritance strategy is as follows:
(1) The attribute conditions contained in a prs node are inherited by all of its pr child nodes;
(2) The attribute conditions contained in the first pr node of a prs node are inherited by the remaining pr nodes;
(3) The attribute conditions contained in a pr node act on the pair nodes inside that pr node;
(4) Attribute conditions that are different values of the same attribute cannot be inherited from one another;
Here, [domestic water supply system/attribute condition], [hot water supply system/attribute condition] and [hot water return system/attribute condition] are different values of the same attribute "system type" and are split into parallel relations. Likewise, [DN ≥ 80 pipe diameter/attribute condition] and [DN < 80/attribute condition] are different values of the same attribute "pipe diameter" and are split into parallel relations. The attribute condition associations of the pair nodes are therefore obtained as follows:
(1) [indoor/component condition] [domestic water supply system/attribute condition] (pair [thin-walled stainless steel pipe/attribute value]);
[indoor/component condition] [hot water supply system/attribute condition] (pair [thin-walled stainless steel pipe/attribute value]);
[indoor/component condition] [hot water return system/attribute condition] (pair [thin-walled stainless steel pipe/attribute value]).
(2) [indoor/component condition] [domestic water supply system/attribute condition] [DN < 80/attribute condition] (pair [double-clamp press/attribute value] [connection/attribute name]);
[indoor/component condition] [hot water supply system/attribute condition] [DN < 80/attribute condition] (pair [double-clamp press/attribute value] [connection/attribute name]);
[indoor/component condition] [hot water return system/attribute condition] [DN < 80/attribute condition] (pair [double-clamp press/attribute value] [connection/attribute name]).
(3) [indoor/component condition] [domestic water supply system/attribute condition] [DN ≥ 80 pipe diameter/attribute condition] (pair [flange/attribute value] [connection/attribute name]);
[indoor/component condition] [hot water supply system/attribute condition] [DN ≥ 80 pipe diameter/attribute condition] (pair [flange/attribute value] [connection/attribute name]);
[indoor/component condition] [hot water return system/attribute condition] [DN ≥ 80 pipe diameter/attribute condition] (pair [flange/attribute value] [connection/attribute name]).
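The inheritance rules and the splitting into parallel relations can be sketched as below. This is one possible reading of the rules, not the patented implementation: the dictionary layout of the prs node is hypothetical, and each attribute condition is assumed to be tagged with the attribute it constrains (for example "system type" or "pipe diameter").

```python
from itertools import product
from typing import Dict, List, Tuple

Cond = Tuple[str, str]  # (attribute the condition constrains, condition text)


def expand(prs: Dict) -> List[Dict]:
    """Flatten a prs node into one record per pair and condition combination."""
    prs_conds: List[Cond] = prs["attribute_conditions"]
    first_conds: List[Cond] = prs["pr"][0]["attribute_conditions"]
    records = []
    for i, pr in enumerate(prs["pr"]):
        own = list(pr["attribute_conditions"])
        own_attrs = {attr for attr, _ in own}
        # Rules (2) and (4): later pr nodes inherit from the first pr node,
        # unless they already hold a different value of the same attribute.
        inherited = [c for c in first_conds if i > 0 and c[0] not in own_attrs]
        # Rule (1): conditions on the prs node apply to every pr node.
        conds = prs_conds + inherited + own
        # Conditions on the same attribute are alternatives (parallel relation).
        groups: Dict[str, List[str]] = {}
        for attr, text in conds:
            groups.setdefault(attr, []).append(text)
        for combo in product(*groups.values()):
            for pair in pr["pairs"]:  # rule (3): pr conditions act on its pairs
                records.append({
                    "component_conditions": prs["component_conditions"],
                    "attribute_conditions": list(combo),
                    "pair": pair,
                })
    return records


prs_node = {
    "component_conditions": ["indoor"],
    "attribute_conditions": [
        ("system type", "domestic water supply system"),
        ("system type", "hot water supply system"),
        ("system type", "hot water return system"),
    ],
    "pr": [
        {"attribute_conditions": [],
         "pairs": [("thin-walled stainless steel pipe", None)]},
        {"attribute_conditions": [("pipe diameter", "DN < 80")],
         "pairs": [("double-clamp press", "connection")]},
        {"attribute_conditions": [("pipe diameter", "DN >= 80")],
         "pairs": [("flange", "connection")]},
    ],
}
records = expand(prs_node)
```

For the example sentence this yields nine records: three system types for each of the three pr nodes.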
Further, parsing the syntax tree to extract the component information and the attribute name information in the text semantic block information may be implemented as follows:
determining the attribution of each attribute value according to the position, in the configuration file, of the template corresponding to that attribute value, so as to extract the component information and attribute name information in the text semantic block information from the syntax tree;
and, if an attribute value is detected to have several possible attributions, determining its attribution according to both the position of the corresponding template in the configuration file and the component range defined by the title level.
Specifically, the position of the matching template in the configuration file determines which attribute of which component the attribute value belongs to, so the component and attribute information are deduced indirectly. If several components or several attributes are possible, the text is ambiguous, and the component is further determined according to the component range defined by the title. If the title does not limit the component range, the attribution cannot be decided; all candidate results can then be output in full and left for further judgment by the intelligent BIM construction or intelligent review stage.
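The attribution rule above can be sketched as follows. This is only an illustration under stated assumptions: the `CONFIG` layout (component, then attribute name, then attribute value templates), its entries, and the function name `resolve_attribution` are hypothetical and not taken from the patent.

```python
# Hypothetical configuration: component -> attribute name -> attribute value templates.
CONFIG = {
    "water supply pipe": {
        "material": ["thin-walled stainless steel pipe"],
        "connection": ["flange"],
    },
    "drain pipe": {
        "connection": ["flange", "socket"],
    },
}


def resolve_attribution(value, title_components=None):
    """Return every (component, attribute name) whose templates contain the value.

    If several candidates exist and the title defines a component range,
    only the candidates inside that range are kept. A result with more than
    one entry means the text stays ambiguous and is passed on as-is.
    """
    candidates = [
        (component, attribute)
        for component, attributes in CONFIG.items()
        for attribute, values in attributes.items()
        if value in values
    ]
    if len(candidates) > 1 and title_components:
        scoped = [c for c in candidates if c[0] in title_components]
        if scoped:
            candidates = scoped
    return candidates
```

Calling `resolve_attribution("flange")` without a title range returns both candidates, which corresponds to the case where the result is output in full for later judgment.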
Finally, for the sentence "indoor cold and hot water pipes all use thin-walled stainless steel pipes; pipe diameters DN < 80 use double-clamp press connection, and pipe diameters DN ≥ 80 use flange connection", the fixed structure "component - component condition - attribute condition - attribute value" is generated, which completes the analysis of the long text information.
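One possible in-memory form of that fixed structure is sketched below. The class name, field names, and the component name `"pipe"` are assumptions made for illustration; only the conditions and values come from the example sentence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeRecord:
    component: str
    component_condition: str
    attribute_conditions: tuple
    attribute_name: str
    attribute_value: str


# The three systems that the example expands "cold and hot water pipes" into.
SYSTEMS = (
    "domestic water supply system",
    "hot water supply system",
    "hot water return system",
)


def flange_records(component="pipe"):
    """Build one record per system for the DN >= 80 flange rule of the example."""
    return [
        AttributeRecord(component, "indoor", (system, "DN >= 80"), "connection", "flange")
        for system in SYSTEMS
    ]
```

Each record carries the component condition and attribute conditions alongside the name/value pair, so a downstream consumer can filter on them without re-reading the text.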
In summary, in the method for analyzing long text information in a building drawing according to this embodiment of the invention, a configuration file is written from building industry knowledge and used to analyze the long text in the building drawing. The title level information is divided according to the titles, and the component range and the sections affected by each short title are determined from it. After the sentence semantic block information is generated, a syntax tree is built and parsed to extract the component information and attribute name information, yielding a structure that contains the component information and attributes. This realizes automatic analysis and structural conversion of the long text information in the building drawing and solves the prior-art problem of low efficiency when analyzing such text.
Example two
Referring to Fig. 2, a device for analyzing long text information in a building drawing according to a second embodiment of the present invention is shown. The device includes:
the collection module 100, configured to collect construction industry knowledge and write it into a configuration file, where the configuration file stores a plurality of pieces of component information, attribute information and attribute value information;
the acquisition module 200, configured to acquire a building drawing in a preset format and analyze the building drawing to obtain a plurality of text lines of the building drawing;
the determining module 300, configured to obtain title level information of the building drawing from the plurality of text lines according to a preset rule, and to determine a component extraction range in the long text information of the building drawing according to the title level information;
the matching module 400, configured to match within the component extraction ranges of the plurality of text lines according to the component, attribute and attribute value templates in the configuration file to obtain entities and the candidate entity categories corresponding to the entities, and to obtain text semantic block information according to the matching results of the plurality of text lines;
the analysis module 500, configured to analyze the text semantic block information to obtain a corresponding syntax tree, and to parse the syntax tree to extract the component information and attribute name information in the text semantic block information, so as to convert the long text information in the building drawing into a preset structure.
Further, in some optional embodiments of the present invention, the determining module includes:
the detection unit, configured to detect whether the beginning of each text line carries a sequence number, so as to distinguish body text from titles, and to obtain the title level according to the consistency of the sequence number format;
and the acquisition unit, configured to acquire the other titles lying between adjacent titles of the same level, and to obtain the title level information of the building drawing through a recursive strategy.
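The two units above can be sketched roughly as follows. The sequence-number formats in `NUMBER_FORMATS` and the function names are assumptions made for illustration; the patent text here does not list the actual formats. A line whose start matches a format is treated as a title, lines that share a format share a level, and the lines between two same-level neighbors are processed recursively.

```python
import re

# Hypothetical sequence-number formats; real drawings may use others.
NUMBER_FORMATS = [
    ("chinese", re.compile(r"^[一二三四五六七八九十]+、")),
    ("paren", re.compile(r"^[((]\d+[))]")),
    ("dot", re.compile(r"^\d+[.、]")),
]


def number_format(line):
    """Return the sequence-number format at the start of a line, or None for body text."""
    stripped = line.strip()
    for name, pattern in NUMBER_FORMATS:
        if pattern.match(stripped):
            return name
    return None


def build_hierarchy(lines):
    """Build a title tree; the first title format seen at a depth defines that level."""
    formats = [number_format(line) for line in lines]
    level = next((f for f in formats if f is not None), None)
    if level is None:
        return []  # only body text at this depth
    starts = [i for i, f in enumerate(formats) if f == level]
    nodes = []
    for k, start in enumerate(starts):
        end = starts[k + 1] if k + 1 < len(starts) else len(lines)
        inner = lines[start + 1:end]
        # Body text directly under this title ends where the first nested title begins.
        first_sub = next(
            (j for j, line in enumerate(inner) if number_format(line)), len(inner)
        )
        nodes.append({
            "title": lines[start],
            "text": inner[:first_sub],
            "children": build_hierarchy(inner),
        })
    return nodes
```

In this sketch, lines that appear before the first title at the top level are ignored; a real implementation would need to decide how to treat them.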
Further, in the device for analyzing long text information in a building drawing, the determining module further includes:
the sorting unit, configured to obtain short titles from the title level information one by one, in order from the highest title level to the lowest;
and the extraction unit, configured to determine the target component limiting template that the short title matches, and to store the mapping between the target component and the text under the short title, so as to determine the component extraction range of the short title.
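A minimal sketch of this mapping step follows, assuming a title tree of `{"title", "text", "children"}` dictionaries and a hypothetical keyword table `COMPONENT_TEMPLATES`. Letting a subtitle inherit the component of its parent title is also an assumption of this sketch, not something the text states.

```python
# Hypothetical component-limiting templates: keyword in a short title -> target component.
COMPONENT_TEMPLATES = {
    "water supply": "water supply pipe",
    "drainage": "drain pipe",
}


def match_component(title):
    """Return the target component named by a short title, or None if no template fits."""
    lowered = title.lower()
    for keyword, component in COMPONENT_TEMPLATES.items():
        if keyword in lowered:
            return component
    return None


def extraction_ranges(nodes, inherited=None, ranges=None):
    """Walk titles from the highest level down and map each component to the text under its titles."""
    ranges = {} if ranges is None else ranges
    for node in nodes:
        component = match_component(node["title"]) or inherited
        if component is not None and node["text"]:
            ranges.setdefault(component, []).extend(node["text"])
        extraction_ranges(node["children"], component, ranges)
    return ranges
```

Text under a title that matches no template and has no matching ancestor is left out of every range, which mirrors the case where the title does not limit the component range.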
Further, in the device for analyzing long text information in a building drawing, the matching module includes:
the matching unit, configured to match within the component extraction ranges of the plurality of text lines according to the component, attribute and attribute value templates in the configuration file to obtain entities and the candidate entity categories corresponding to the entities;
and the arrangement unit, configured to delete the shorter matching results according to the principle of keeping the longer match and discarding the shorter one, so as to obtain final entity matching results, and to arrange the entity matching results in the order in which they appear in the text line to obtain the text semantic block information.
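The "keep the longer match" rule can be illustrated with the sketch below. The template table passed in and the tuple layout `(start, end, text, categories)` are assumptions for illustration only.

```python
def find_matches(line, templates):
    """Return (start, end, text, categories) for every template occurrence in the line."""
    matches = []
    for text, categories in templates.items():
        start = line.find(text)
        while start != -1:
            matches.append((start, start + len(text), text, categories))
            start = line.find(text, start + 1)
    return matches


def keep_longest(matches):
    """Drop any match that overlaps a longer one, then order the rest by position in the line."""
    kept = []
    for m in sorted(matches, key=lambda m: m[1] - m[0], reverse=True):
        # Keep m only if it lies entirely before or after every match already kept.
        if all(m[1] <= k[0] or m[0] >= k[1] for k in kept):
            kept.append(m)
    return sorted(kept, key=lambda m: m[0])
```

For example, when both "steel pipe" and "stainless steel pipe" match the same span, only the longer entity survives, and the surviving entities in line order form the text semantic block sequence.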
Further, in the device for analyzing long text information in a building drawing, the analysis module includes:
the analysis unit, configured to analyze how the long text information of the building drawing is organized, and to write a lexical and grammar analysis rule template supported by a preset tool;
and the conversion unit, configured to use the preset tool to automatically generate the corresponding lexer and parser, so as to convert the text semantic block information into a corresponding syntax tree.
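The preset tool is not named in this passage, so the sketch below uses a small hand-written parser as a stand-in for the generated one. The grammar it implements (any number of conditions, then attribute values each optionally followed by an attribute name) is an assumption inferred from the bracketed semantic block notation used in the first embodiment.

```python
def parse_blocks(blocks):
    """Parse [(text, category), ...] into a simple tree of conditions and value/name pairs."""
    tree = {"component_conditions": [], "attribute_conditions": [], "pairs": []}
    i = 0
    while i < len(blocks):
        text, category = blocks[i]
        if category == "component condition":
            tree["component_conditions"].append(text)
        elif category == "attribute condition":
            tree["attribute_conditions"].append(text)
        elif category == "attribute value":
            # An attribute name directly after the value belongs to the same pair.
            name = None
            if i + 1 < len(blocks) and blocks[i + 1][1] == "attribute name":
                name = blocks[i + 1][0]
                i += 1
            tree["pairs"].append({"value": text, "name": name})
        else:
            raise ValueError(f"unexpected block category: {category}")
        i += 1
    return tree
```

A pair whose `name` is `None` corresponds to a block sequence where only the attribute value appears in the text, so the attribute name has to be deduced afterwards from the configuration file.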
Further, in the device for analyzing long text information in a building drawing, the analysis module further includes:
the configuration unit, configured to determine the attribution of each attribute value according to the position, in the configuration file, of the template corresponding to that attribute value, so as to extract the component information and attribute name information in the text semantic block information from the syntax tree.
Further, the device for analyzing long text information in a building drawing further includes:
the detection module, configured to, if an attribute value is detected to have several possible attributions, determine its attribution according to both the position of the corresponding template in the configuration file and the component range defined by the title level.
Example three
Another aspect of the present invention also provides a readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method described in the first embodiment above.
Example four
In another aspect, the present invention also provides an electronic device, where the electronic device includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the program to implement the steps of the method described in the first embodiment.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
Those of skill in the art will appreciate that the logic and/or steps represented in the flow diagrams or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing examples present only a few embodiments of the invention; although they are described in some detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.
Claims (9)
1. A method for analyzing long text information in a building drawing, characterized by comprising the following steps:
collecting construction industry knowledge and writing it into a configuration file, wherein the configuration file stores a plurality of pieces of component information, attribute information and attribute value information;
acquiring a building drawing in a preset format, and analyzing the building drawing to obtain a plurality of text lines of the building drawing;
acquiring title level information of the building drawing from the plurality of text lines according to a preset rule, and determining a component extraction range in the long text information of the building drawing according to the title level information;
matching within the component extraction ranges of the plurality of text lines according to the component, attribute and attribute value templates in the configuration file to obtain entities and the candidate entity categories corresponding to the entities, and obtaining text semantic block information according to the matching results of the plurality of text lines;
analyzing the text semantic block information to obtain a corresponding syntax tree, and parsing the syntax tree to extract the component information and attribute name information in the text semantic block information, so as to convert the long text information in the building drawing into a preset structure;
wherein the step of matching within the component extraction ranges of the plurality of text lines according to the component, attribute and attribute value templates in the configuration file to obtain entities and the candidate entity categories corresponding to the entities, and obtaining text semantic block information according to the matching results of the plurality of text lines, comprises:
matching within the component extraction ranges of the plurality of text lines according to the component, attribute and attribute value templates in the configuration file to obtain entities and the candidate entity categories corresponding to the entities;
deleting the shorter matching results according to the principle of keeping the longer match and discarding the shorter one, so as to obtain final entity matching results, and arranging the entity matching results in the order in which they appear in the text line to obtain the text semantic block information.
2. The method for analyzing long text information in a building drawing according to claim 1, wherein the step of acquiring title level information of the building drawing from the plurality of text lines according to a preset rule comprises:
detecting whether the beginning of each text line carries a sequence number, so as to distinguish body text from titles, and obtaining the title level according to the consistency of the sequence number format;
and acquiring the other titles lying between adjacent titles of the same level, and obtaining the title level information of the building drawing through a recursive strategy.
3. The method for analyzing long text information in a building drawing according to claim 1, wherein the step of determining a component extraction range in the long text information of the building drawing according to the title level information comprises:
obtaining short titles from the title level information one by one, in order from the highest title level to the lowest;
and determining the target component limiting template that the short title matches, and storing the mapping between the target component and the text under the short title, so as to determine the component extraction range of the short title.
4. The method for analyzing long text information in a building drawing according to claim 1, wherein the step of analyzing the text semantic block information to obtain a corresponding syntax tree comprises:
analyzing how the long text information of the building drawing is organized, and writing a lexical and grammar analysis rule template supported by a preset tool;
and using the preset tool to automatically generate the corresponding lexer and parser, so as to convert the text semantic block information into a corresponding syntax tree.
5. The method for analyzing long text information in a building drawing according to claim 1, wherein the step of parsing the syntax tree to extract the component information and attribute name information in the text semantic block information, so as to convert the long text information in the building drawing into a preset structure, comprises:
determining the attribution of each attribute value according to the position, in the configuration file, of the template corresponding to that attribute value, so as to extract the component information and attribute name information in the text semantic block information from the syntax tree.
6. The method for analyzing long text information in a building drawing according to claim 5, wherein the method further comprises:
if an attribute value is detected to have several possible attributions, determining its attribution according to both the position of the corresponding template in the configuration file and the component range defined by the title level.
7. A device for analyzing long text information in a building drawing, the device comprising:
a collection module, configured to collect construction industry knowledge and write it into a configuration file, wherein the configuration file stores a plurality of pieces of component information, attribute information and attribute value information;
an acquisition module, configured to acquire a building drawing in a preset format and analyze the building drawing to obtain a plurality of text lines of the building drawing;
a determining module, configured to acquire title level information of the building drawing from the plurality of text lines according to a preset rule, and to determine a component extraction range in the long text information of the building drawing according to the title level information;
a matching module, configured to match within the component extraction ranges of the plurality of text lines according to the component, attribute and attribute value templates in the configuration file to obtain entities and the candidate entity categories corresponding to the entities, and to obtain text semantic block information according to the matching results of the plurality of text lines;
an analysis module, configured to analyze the text semantic block information to obtain a corresponding syntax tree, and to parse the syntax tree to extract the component information and attribute name information in the text semantic block information, so as to convert the long text information in the building drawing into a preset structure;
wherein the matching module comprises:
a matching unit, configured to match within the component extraction ranges of the plurality of text lines according to the component, attribute and attribute value templates in the configuration file to obtain entities and the candidate entity categories corresponding to the entities;
and an arrangement unit, configured to delete the shorter matching results according to the principle of keeping the longer match and discarding the shorter one, so as to obtain final entity matching results, and to arrange the entity matching results in the order in which they appear in the text line to obtain the text semantic block information.
8. A readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method according to any one of claims 1 to 6 when the program is executed.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310266783.7A CN115983245B (en) | 2023-03-20 | 2023-03-20 | Method and device for analyzing middle-length text information of building drawing |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN115983245A (en) | 2023-04-18 |
| CN115983245B (en) | 2023-06-06 |
Family
ID=85970892
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310266783.7A Active CN115983245B (en) | 2023-03-20 | 2023-03-20 | Method and device for analyzing middle-length text information of building drawing |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN115983245B (en) |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110162786A (en) * | 2019-04-23 | 2019-08-23 | 百度在线网络技术(北京)有限公司 | Construct the method, apparatus of configuration file and drawing-out structure information |
Family Cites Families (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7689557B2 (en) * | 2005-06-07 | 2010-03-30 | Madan Pandit | System and method of textual information analytics |
| RU2592396C1 (en) * | 2015-02-03 | 2016-07-20 | Общество с ограниченной ответственностью "Аби ИнфоПоиск" | Method and system for machine extraction and interpretation of text information |
| US11270105B2 (en) * | 2019-09-24 | 2022-03-08 | International Business Machines Corporation | Extracting and analyzing information from engineering drawings |
| CN111159453B (en) * | 2019-12-24 | 2023-06-20 | 清华大学 | Method and device for matching labels and components of CAD drawings |
| CN112651373B (en) * | 2021-01-04 | 2024-02-09 | 广联达科技股份有限公司 | Method and device for identifying text information of building drawing |
| CN113886930B (en) * | 2021-10-21 | 2024-04-30 | 上海品览数据科技有限公司 | Automatic generation method of building design description document |
| CN114462383B (en) * | 2022-04-12 | 2022-07-08 | 江西少科智能建造科技有限公司 | Method, system, storage medium and equipment for obtaining design specification of building drawing |
Non-Patent Citations (1)
| Title |
|---|
| Text/Graphics Segmentation in Architectural Floor Plans; Sheraz Ahmed et al.; 2011 International Conference on Document Analysis and Recognition; full text * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN115983245A (en) | 2023-04-18 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN111831794A (en) | Knowledge map-based construction method for knowledge question-answering system in comprehensive pipe gallery industry | |
| CN108052547A (en) | Natural language question-answering method and system based on question sentence and knowledge graph structural analysis | |
| CN101398858B (en) | Web service semantic extracting method based on noumenon learning | |
| CN107958068B (en) | Language model smoothing method based on entity knowledge base | |
| CN105335487A (en) | Agricultural specialist information retrieval system and method on basis of agricultural technology information ontology library | |
| CN110119510A (en) | A kind of Relation extraction method and device based on transmitting dependence and structural auxiliary word | |
| CN117909484B (en) | Construction method and question answering system of Term-BERT model for construction information query | |
| CN112579444B (en) | Automatic analysis modeling method, system, device and medium based on text cognition | |
| CN117573797A (en) | Test question retrieval method based on large language model | |
| CN119988600A (en) | Coal industry large model retrieval enhanced generation method and system based on knowledge graph | |
| CN102298552A (en) | Method for performing source code instrumentation on the basis of code inquiry | |
| CN112732969A (en) | Image semantic analysis method and device, storage medium and electronic equipment | |
| Pouliot et al. | Exploring schema matching to compare geospatial standards: application to underground utility networks | |
| CN119807447A (en) | A file retrieval method, system, product and readable storage medium | |
| CN115983245B (en) | Method and device for analyzing middle-length text information of building drawing | |
| CN118194865B (en) | Technology development track recognition method based on scientific-technology path multidimensional interaction | |
| CN119807232A (en) | Cypher query statement generation optimization method, device and system based on large language model | |
| CN119810846A (en) | Intelligent document review and traceability positioning method based on LLM natural language processing | |
| Van Der Haegen | Building a Legal Citation Network: The Influence of the Court of Cassation on the Lower Judiciary | |
| CN117875307A (en) | Text parsing method and device for intelligent question and answer | |
| CN114462383B (en) | Method, system, storage medium and equipment for obtaining design specification of building drawing | |
| CN111178771B (en) | System construction method and device | |
| CN112346711A (en) | A programming specification knowledge graph construction system and method for semantic recognition | |
| CN115438644B (en) | Method, storage medium and system for similarity analysis of informatization projects | |
| CN120561658B (en) | A method and system for automatically constructing a mechanism model based on a large model |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |