CN115983245B - Method and device for analyzing middle-length text information of building drawing - Google Patents

Info

Publication number
CN115983245B
Authority
CN
China
Prior art keywords
information
text
analyzing
building drawing
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310266783.7A
Other languages
Chinese (zh)
Other versions
CN115983245A (en)
Inventor
李一华
彭飞
周自强
刘玉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Shaoke Intelligent Construction Technology Co ltd
Jiangxi Zhongzhi Technology Co ltd
Original Assignee
Jiangxi Shaoke Intelligent Construction Technology Co ltd
Jiangxi Zhongzhi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Shaoke Intelligent Construction Technology Co ltd and Jiangxi Zhongzhi Technology Co ltd
Priority to CN202310266783.7A
Publication of CN115983245A
Application granted
Publication of CN115983245B
Active legal status (current)
Anticipated expiration

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a method and a device for analyzing long text information in a building drawing. The method comprises the following steps: collecting building industry knowledge and writing it into a configuration file; acquiring a building drawing in a preset format and parsing the building drawing to obtain a plurality of text lines; acquiring title level information of the building drawing according to the plurality of text lines and a preset rule, and determining a component extraction range of the building drawing according to the title level information; matching within the component extraction range of the plurality of text lines according to the component, attribute and attribute value templates in the configuration file to obtain entities and the candidate entity categories corresponding to the entities, thereby obtaining text semantic block information; and analyzing the text semantic block information to obtain a corresponding syntax tree, and parsing the syntax tree to extract the component information and attribute name information in the text semantic block information, so as to convert the long text information into a preset structure. The invention solves the prior-art problem of low efficiency in analyzing long text information.

Description

Method and device for analyzing middle-length text information of building drawing
Technical Field
The invention relates to the technical field of data processing, and in particular to a method and a device for analyzing long text information in a building drawing.
Background
With the rapid development of intelligent construction in recent years, information technology has come into wide use in many fields. At present, most existing buildings are designed on the basis of two-dimensional drawings such as CAD drawings. Structured analysis of the long text information in building drawings is an important part of building intelligence and a cornerstone of fields such as intelligent building design, intelligent drawing review and intelligent construction of BIM models.
Long text information is an important component of a building drawing. It comprises the design description and the drawing notes, serves as the outline of the construction drawings, describes matters that the drawings, quantity schedules and construction practices do not fully express, explains how the issues of concern to the review authority are addressed, and is one of the information sources for intelligently converting a building drawing into a BIM model.
For example, from long text information in a structural construction drawing such as "1. Slabs whose thickness is not noted are all 110 mm thick; slabs whose top elevation is not noted have a top elevation of 8.950 m.", it is necessary to extract "concrete slab-thickness-not noted-110 mm" and "concrete slab-top elevation-not noted-8.950 m". Likewise, from long text information in a water supply and drainage drawing such as "1. The indoor domestic water supply main pipe uses PSP steel-plastic composite pipe for building water supply and is connected by electromagnetic fusion.", it is necessary to extract "water pipe-indoor-connection mode-domestic water supply system, main pipe-electromagnetic fusion" and "water pipe-indoor-material-domestic water supply system, main pipe-PSP steel-plastic composite pipe". Long text information is expressed in natural language. Extracting from it the five types of entities, namely components, component conditions, attributes, attribute conditions and attribute values, and structurally analyzing these entities according to the professional knowledge of each discipline in the building industry to obtain the fixed structure "component-component conditions-attribute conditions-attribute values", is the basis for further developing intelligent BIM model construction, intelligent drawing review and intelligent design. In the prior art, however, the "component-component conditions-attribute conditions-attribute values" information is mostly extracted from the long text information by manual inspection and screening, which is inefficient.
Disclosure of Invention
In view of the above, the present invention aims to provide a method and a device for analyzing long text information in a building drawing, which aims to solve the problem of low efficiency in the prior art when analyzing long text information.
The invention is realized in the following way:
a method for analyzing long text information in a building drawing, the method comprising:
collecting construction industry knowledge and writing the construction industry knowledge into a configuration file, wherein the configuration file stores a plurality of component information, attribute information and attribute value information;
acquiring a building drawing in a preset format, and analyzing the building drawing to obtain a plurality of text lines of the building drawing;
acquiring title level information of the building drawing according to the text lines and a preset rule, and determining a component extraction range in long text information of the building drawing according to the title level information;
matching from the component extraction ranges of the plurality of text lines according to the components, the attributes and the attribute value templates in the configuration file to obtain entities and candidate entity categories corresponding to the entities, and obtaining text semantic block information according to the matching results of the plurality of text lines;
analyzing the text semantic block information to obtain a corresponding grammar tree, and analyzing and extracting component information and attribute name information in the text semantic block information by the grammar tree to convert long text information in the building drawing into a preset structure.
Further, in the method for analyzing long text information in a building drawing, the step of obtaining title level information of the building drawing according to the plurality of text lines and a preset rule includes:
detecting whether the initial segment of each text line has a sequence number to distinguish the text from the title, and acquiring a title level according to the consistency of the sequence number format;
and acquiring the other titles located between adjacent titles of the same title level, and acquiring the title level information of the building drawing through a recursion strategy.
Further, in the method for analyzing long text information in a building drawing, the step of determining the component extraction range in the long text information in the building drawing according to the title level information includes:
according to the title level information, sequentially obtaining short titles from the title level information in order from the highest title level to the lowest;
and determining a target component restriction template that the short title matches, and storing the mapping relation between the target component and the text under the short title to determine the component extraction range of the short title.
Further, in the method for analyzing long text information in a building drawing, the step of matching the plurality of text lines from the component extraction ranges according to the component, the attribute and the attribute value template in the configuration file to obtain the entity and the candidate entity category corresponding to the entity, and obtaining text semantic block information according to the matching results of the plurality of text lines includes:
matching from the component extraction ranges of the plurality of text lines according to the components, the attributes and the attribute value templates in the configuration file to obtain entities and candidate entity categories corresponding to the entities;
deleting shorter matching results on the principle of preferring longer matches over shorter ones to obtain the final entity matching results, and arranging the entity matching results in the order in which they occur in the text line to obtain text semantic block information.
Further, in the method for analyzing long text information in a building drawing, the step of analyzing the text semantic block information to obtain a corresponding syntax tree includes:
analyzing a long text information organization mode of the building drawing, and writing a lexical grammar analysis rule template supported by a preset tool;
and automatically generating a corresponding lexical grammar analyzer by adopting the preset tool so as to convert the text semantic block information into a corresponding grammar tree.
Further, in the method for analyzing long text information in a building drawing, the step of analyzing and extracting the component information and the attribute name information in the text semantic block information by using the syntax tree to convert the long text information in the building drawing into a preset structure includes:
determining attribution of the attribute values according to the positions of templates corresponding to the attribute values in the configuration file so as to extract component information and attribute name information in the text semantic block information from the grammar tree.
Further, in the method for analyzing long text information in a building drawing, the method further comprises the following step:
and if a plurality of attributions of the attribute values are detected, determining the attributions of the attribute values according to the positions of templates corresponding to the attribute values in the configuration file and the component range defined by the title level.
Another object of the present invention is to provide a device for analyzing long text information in a building drawing, the device comprising:
a collection module, used for collecting construction industry knowledge and writing the construction industry knowledge into a configuration file, wherein the configuration file stores a plurality of component information, attribute information and attribute value information;
an acquisition module, used for acquiring a building drawing in a preset format and analyzing the building drawing to obtain a plurality of text lines of the building drawing;
the determining module is used for acquiring the title level information of the building drawing according to the text lines and preset rules, and determining the component extraction range in the long text information of the building drawing according to the title level information;
the matching module is used for matching from the component extraction range of the plurality of text lines according to the components, the attributes and the attribute value templates in the configuration file to obtain entities and candidate entity categories corresponding to the entities, and obtaining text semantic block information according to the matching results of the plurality of text lines;
the analysis module is used for analyzing the text semantic block information to obtain a corresponding grammar tree, and analyzing and extracting component information and attribute name information in the text semantic block information to convert long text information in the building drawing into a preset structure.
Another object of the present invention is to provide a readable storage medium having stored thereon a computer program, characterized in that the program when executed by a processor realizes the steps of the method according to any of the above.
It is a further object of the invention to provide an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, which processor implements the steps of any of the methods described above when executing the program.
According to the invention, building industry knowledge is written into a configuration file and the long text in the building drawing is analyzed. The title level information is derived from the titles, and the component range and the sections governed by each subtitle are determined from the titles. Once the sentence semantic block information has been generated, a syntax tree is generated from it, and the syntax tree is parsed to extract the component information and attribute name information, yielding a fixed structure that contains the component information and attributes. Automatic analysis and structural conversion of the long text information in the building drawing are thereby realized, which solves the prior-art problem of low efficiency in analyzing long text information.
Drawings
FIG. 1 is a flow chart of a method for analyzing long text information in a building drawing according to a first embodiment of the present invention;
FIG. 2 is a block diagram showing the structure of a device for analyzing long text information in a building drawing according to a second embodiment of the present invention.
The invention will be further described in the following detailed description in conjunction with the above-described figures.
Detailed Description
In order that the invention may be readily understood, a more complete description of the invention will be rendered by reference to the appended drawings. Several embodiments of the invention are presented in the figures. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
It will be understood that when an element is referred to as being "mounted" on another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present. The terms "vertical," "horizontal," "left," "right," and the like are used herein for illustrative purposes only.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed types.
How the long text information in the construction drawing is analyzed will be described in detail with reference to specific embodiments and drawings.
Example 1
Referring to FIG. 1, a method for analyzing long text information in a building drawing according to a first embodiment of the present invention is shown. The method includes steps S10 to S14.
Step S10, building industry knowledge is collected and written into a configuration file, and a plurality of component information, attribute information and attribute value information are stored in the configuration file.
Specifically, building engineering knowledge is collected, including basic industry knowledge such as building codes and the construction standards for different types of buildings. A building designer writes this knowledge into a configuration file, which stores a plurality of component information, attribute information and attribute value information. In a specific implementation, the configuration file stores the component, attribute and attribute value data as a JSON tree whose hierarchy is component node, attribute node, value node. All nodes share a generic structure: key is a character string that stores the standard name; pattern is a list that stores other expressions of the key; value is a list of dictionaries that stores the information of the next level. That is, a single configuration file stores a plurality of components, a single component stores the attribute names it contains, and a single attribute stores the values it contains. The fields specific to attribute nodes include component condition and attribute condition identifiers.
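As an illustrative, non-limiting sketch, the key/pattern/value node layout described above might look as follows in Python. Only the three field names and the component, attribute, value hierarchy come from the description; the concrete component names, aliases and the `walk` helper are assumptions made for illustration, and the attribute-specific condition identifiers are omitted because their field names are not given.

```python
import json

# Hypothetical configuration content; only the key/pattern/value layout
# and the three-level hierarchy come from the description.
CONFIG = [
    {
        "key": "water pipe",                       # component node
        "pattern": ["water supply pipe"],          # other expressions of the key
        "value": [
            {
                "key": "material",                 # attribute node
                "pattern": ["pipe material"],
                "value": [
                    {"key": "thin-wall stainless steel pipe",  # value node
                     "pattern": ["thin-walled stainless steel tube"],
                     "value": []},
                ],
            },
            {
                "key": "connection mode",
                "pattern": ["connection"],
                "value": [
                    {"key": "flange", "pattern": [], "value": []},
                    {"key": "double-clamp press", "pattern": [], "value": []},
                ],
            },
        ],
    },
]

LEVELS = ("component", "attribute", "value")


def walk(nodes, depth=0):
    """Yield (level name, standard name) for every node, depth first."""
    for node in nodes:
        yield LEVELS[depth], node["key"]
        yield from walk(node["value"], depth + 1)


if __name__ == "__main__":
    # The structure serializes directly to a JSON tree.
    print(json.dumps(CONFIG, indent=2, ensure_ascii=False))
    for level, name in walk(CONFIG):
        print(level, name)
```

Because every level uses the same three fields, one recursive routine can traverse components, attributes and values alike.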
Step S11, a building drawing in a preset format is obtained, and the building drawing is analyzed to obtain a plurality of text lines of the building drawing.
Building drawings are generally stored in DWG format, that is, a building drawing is generally a DWG file. The DWG file is parsed with the ObjectARX tool, and the text blocks are read row by row from top to bottom to obtain text lines, so that the long text information in the building drawing is broken down into a plurality of text lines.
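The row-by-row reading can be sketched as follows. This is a simplified illustration, not the ObjectARX-based implementation: it assumes the text blocks have already been extracted from the DWG file as hypothetical `(x, y, text)` tuples in drawing coordinates (larger y is higher on the sheet), and the `tolerance` parameter for deciding that two blocks share a row is likewise an assumption.

```python
def blocks_to_lines(blocks, tolerance=1.0):
    """Group (x, y, text) blocks into text lines, top to bottom, left to right.

    Blocks whose y coordinates differ by at most `tolerance` from the first
    block of a row are treated as belonging to that row.
    """
    ordered = sorted(blocks, key=lambda b: (-b[1], b[0]))  # top first
    rows = []  # each entry: [row_y, [(x, text), ...]]
    for x, y, text in ordered:
        if rows and abs(rows[-1][0] - y) <= tolerance:
            rows[-1][1].append((x, text))
        else:
            rows.append([y, [(x, text)]])
    # Within a row, join the fragments from left to right.
    return ["".join(t for _, t in sorted(items)) for _, items in rows]


if __name__ == "__main__":
    sample = [
        (50, 100.2, "are all 110 mm thick"),
        (0, 100.0, "1. Slabs not noted "),
        (0, 90.0, "2. Top elevation not noted: 8.950 m"),
    ]
    for line in blocks_to_lines(sample):
        print(line)
```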
Step S12, title level information of the building drawing is obtained according to the text lines and preset rules, and a component extraction range in long text information of the building drawing is determined according to the title level information.
According to the professional knowledge of the building industry, the extraction range of component information is related to the titles. Therefore, the title level information of the whole long text is obtained first, and the component range is then determined from the title level information. Take the following long text as an example: "3. Pipe material: domestic water supply risers and horizontal pipes use plastic-lined hot-dip galvanized steel pipe with a pressure rating of 1.6 MPa, with the connection method (clamp connection is named) depending on whether the pipe is smaller than DN100 or is DN100 or larger. 4. Valves: 1) for domestic cold and hot water supply pipes with a diameter larger than DN50, all-copper gate valves are used; for the rest, all-copper ball valves or stop valves are used; the working pressure is the same as that of the pipe." According to the professional knowledge of the building industry, only water pipe component information is extracted from the title "3. Pipe material" and the sections within its scope, and only equipment component information, not water pipe information, is extracted from the title "4. Valves" and the sections within its scope. This avoids the redundant results that arise when one piece of information is assigned to several components at once.
Further, in some optional embodiments of the present invention, the step of obtaining header level information of the building drawing according to the plurality of text lines and a preset rule includes:
detecting whether the initial segment of each text line has a sequence number to distinguish body text from titles, and acquiring a title level according to the consistency of the sequence number format;
and acquiring the other titles located between adjacent titles of the same title level, and acquiring the title level information of the building drawing through a recursion strategy.
Specifically, body text and titles are distinguished according to whether the starting fragment of the text line is a sequence number, and the title level is acquired according to the consistency of the title sequence number format. If the first title has the format "one, …", all later titles of the same format, such as "two, …" and "three, …", are found and taken as first-level titles. The text between the first and the second first-level title is then processed in the same way to acquire all second-level titles between them, and the third-level, fourth-level and deeper titles are acquired recursively. Following this logic, the title level information of the whole long text is acquired through a recursion strategy.
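The recursion strategy can be sketched as follows. This is a minimal illustration under stated assumptions: the list of sequence number formats in `SERIAL_FORMATS` and the node layout (`title`, `text`, `children`) are hypothetical, and a real drawing would need many more formats (including the Chinese-numeral format referred to above).

```python
import re

# Hypothetical sequence-number formats; the first one that matches wins.
SERIAL_FORMATS = [
    ("num_dot", re.compile(r"^\d+[.、]")),      # "1." or "1、"
    ("num_paren", re.compile(r"^\d+\)")),       # "1)"
    ("paren_num", re.compile(r"^\(\d+\)")),     # "(1)"
]


def serial_format(line):
    """Return the name of the sequence-number format, or None for body text."""
    stripped = line.strip()
    for name, pattern in SERIAL_FORMATS:
        if pattern.match(stripped):
            return name
    return None


def build_hierarchy(lines):
    """Recursively split lines into titles of one format and their lower text."""
    first = next((f for f in map(serial_format, lines) if f), None)
    if first is None:
        return []  # only body text at this level
    starts = [i for i, line in enumerate(lines) if serial_format(line) == first]
    nodes = []
    for start, end in zip(starts, starts[1:] + [len(lines)]):
        lower = lines[start + 1:end]
        nodes.append({
            "title": lines[start],
            "text": lower,                       # everything under this title
            "children": build_hierarchy(lower),  # next title level
        })
    return nodes


if __name__ == "__main__":
    doc = ["1. Pipe material", "1) risers", "riser text",
           "2) horizontal pipes", "2. Valves", "valve text"]
    for node in build_hierarchy(doc):
        print(node["title"], [c["title"] for c in node["children"]])
```

Lines that precede the first title at a given level are ignored in this sketch.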
Still further, in some optional embodiments of the present invention, the step of determining the component extraction range in the long text information of the building drawing according to the header level information includes:
according to the title level information, sequentially obtaining short titles from the title level information in order from the highest title level to the lowest;
and determining a target component restriction template that the short title matches, and storing the mapping relation between the target component and the text under the short title to determine the component extraction range of the short title.
Short titles are examined in order from the highest title level of the long text to the lowest. If a short title matches the restriction template of some component, for example "4. Valve" matches the restriction template "valve|accessory|device" of the "equipment" component, the mapping relation between the "equipment" component and the text under the short title "4. Valve" is stored. In the subsequent structural analysis, the structured information of the "equipment" component is extracted only from the text under "4. Valve". If a short title does not match any component restriction template, the component extraction range of the text under it is not restricted.
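A minimal sketch of the restriction template check is given below. The "equipment" template "valve|accessory|device" is taken from the example above and is interpreted here as a regular expression alternation, which is an assumption; the "water pipe" template and the function name `component_scope` are hypothetical.

```python
import re

# Component restriction templates. Only the "equipment" entry comes from
# the description; the "water pipe" entry is a hypothetical addition.
RESTRICTION_TEMPLATES = {
    "equipment": r"valve|accessory|device",
    "water pipe": r"pipe material",
}


def component_scope(short_title):
    """Return the components whose restriction template matches the title.

    An empty list means the text under the title is not restricted.
    """
    return [
        component
        for component, template in RESTRICTION_TEMPLATES.items()
        if re.search(template, short_title, re.IGNORECASE)
    ]


if __name__ == "__main__":
    for title in ("3. Pipe material", "4. Valve", "5. Others"):
        print(title, "->", component_scope(title))
```

The returned components can then be stored alongside the text under the short title, so that later extraction steps consult the mapping before emitting a component.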
And step S13, matching from the component extraction range of the plurality of text lines according to the components, the attributes and the attribute value templates in the configuration file to obtain entities and candidate entity categories corresponding to the entities, and obtaining text semantic block information according to the matching results of the plurality of text lines.
According to the pattern templates of all nodes (components, attributes and attribute values) in the configuration file, entities and the candidate entity categories corresponding to the entities are matched from the long text, and shorter matching results are deleted on the principle of preferring longer matches over shorter ones, since a longer expression generally carries more complete semantics than a shorter one. The final entity matching results are then arranged in the order in which they occur in the sentence to obtain the text semantic block information. For example, the sentence "Indoor cold and hot water pipes all use thin-wall stainless steel pipe; DN < 80 uses double-clamp press connection, and DN ≥ 80 pipe diameters use flange connection" is converted into the semantic block information "[indoor/component condition] [domestic water supply system/attribute condition] [hot water return system/attribute condition] [all use/irrelevant text] [thin-wall stainless steel pipe/attribute value] [DN < 80/attribute condition] [use/irrelevant text] [double-clamp press/attribute value] [connection/attribute name] [irrelevant text] [and/irrelevant text] [DN ≥ 80 pipe diameter/attribute condition] [use/irrelevant text] [flange/attribute value]".
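The matching and longest-match filtering can be sketched as follows. The templates in `TEMPLATES` are hypothetical stand-ins for the pattern lists of the configuration file, and regular expressions are assumed as the matching mechanism; irrelevant text between entities is not emitted in this simplified version.

```python
import re

# Hypothetical (pattern, category) templates standing in for the
# pattern lists of the configuration file.
TEMPLATES = [
    (r"indoor", "component condition"),
    (r"stainless steel pipe", "attribute value"),
    (r"thin-wall stainless steel pipe", "attribute value"),
    (r"DN\s*<\s*\d+", "attribute condition"),
    (r"flange", "attribute value"),
]


def covers(outer, inner):
    """True if `outer` spans `inner` and is strictly longer."""
    return (outer[0] <= inner[0] and inner[1] <= outer[1]
            and outer[1] - outer[0] > inner[1] - inner[0])


def match_entities(text):
    """Return (start, end, entity, category) tuples in order of occurrence."""
    found = [
        (m.start(), m.end(), m.group(), category)
        for pattern, category in TEMPLATES
        for m in re.finditer(pattern, text)
    ]
    # Prefer longer matches: drop any match covered by a longer one.
    kept = [a for a in found if not any(covers(b, a) for b in found)]
    return sorted(kept)


if __name__ == "__main__":
    sentence = "indoor pipes use thin-wall stainless steel pipe, DN<80 uses flange"
    for start, end, entity, category in match_entities(sentence):
        print(f"[{entity}/{category}]")
```

Two matches with the same span but different categories are both kept, which corresponds to an entity having several candidate categories.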
Step S14, analyzing the text semantic block information to obtain a corresponding grammar tree, and analyzing and extracting component information and attribute name information in the text semantic block information to convert long text information in the building drawing into a preset structure.
For example, the above text semantic block information is converted into the syntax tree "(settree (prs [indoor/component condition] [domestic water supply system/attribute condition] [hot water supply system/attribute condition] [hot water return system/attribute condition] (pr (pair [thin-wall stainless steel pipe/attribute value])) (pr [DN < 80/attribute condition] (pair [double-clamp press/attribute value] [connection/attribute name])) (pr [DN ≥ 80 pipe diameter/attribute condition] (pair [flange/attribute value] [connection/attribute name]))))". Specifically, the way long text information is organized in CAD drawings is analyzed, a lexical and grammar analysis rule template supported by the ANTLR4 tool is written, and ANTLR4 is used to automatically generate a lexer and parser that process the sentence semantic block information to obtain the corresponding syntax tree.
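To make the shape of the prs/pr/pair tree concrete, the following sketch builds it with a small hand-written parser. This is a substitute for illustration only, not the ANTLR4-generated analyzer, and the grammar it assumes (leading conditions belong to prs; each pr is a run of attribute conditions followed by one pair; a pair is an attribute value with an optional attribute name) is inferred from the single example tree above.

```python
def parse_semantic_blocks(tokens):
    """Build a prs/pr/pair tree from (text, category) semantic blocks.

    Assumed grammar (inferred from one example, not from the real rules):
        prs  := condition* pr+
        pr   := attribute_condition* pair
        pair := attribute_value attribute_name?
    """
    toks = [t for t in tokens if t[1] != "irrelevant text"]
    i = 0
    tree = {"conditions": [], "prs": []}
    # Leading component and attribute conditions attach to the prs node.
    while i < len(toks) and toks[i][1] in ("component condition",
                                           "attribute condition"):
        tree["conditions"].append(toks[i])
        i += 1
    while i < len(toks):
        pr = {"conditions": [], "pair": {}}
        while i < len(toks) and toks[i][1] == "attribute condition":
            pr["conditions"].append(toks[i][0])
            i += 1
        if i >= len(toks) or toks[i][1] != "attribute value":
            raise ValueError(f"expected an attribute value at block {i}")
        pr["pair"]["value"] = toks[i][0]
        i += 1
        if i < len(toks) and toks[i][1] == "attribute name":
            pr["pair"]["name"] = toks[i][0]
            i += 1
        tree["prs"].append(pr)
    return tree


if __name__ == "__main__":
    blocks = [
        ("indoor", "component condition"),
        ("domestic water supply system", "attribute condition"),
        ("all use", "irrelevant text"),
        ("thin-wall stainless steel pipe", "attribute value"),
        ("DN < 80", "attribute condition"),
        ("use", "irrelevant text"),
        ("double-clamp press", "attribute value"),
        ("connection", "attribute name"),
    ]
    print(parse_semantic_blocks(blocks))
```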
Further, the syntax tree is parsed. Taking the above long text information as an example, the syntax tree is parsed as follows:
according to the professional knowledge of the building industry, part of the component condition inheritance strategy is as follows:
(1) The component condition contained in the prs node is inherited by all child nodes of the prs node;
according to the professional knowledge of the building industry, partial attribute condition inheritance strategies are as follows:
(1) The attribute condition contained in prs is inherited by all pr child nodes;
(2) The attribute condition contained in the first pr node in prs is inherited by the remaining pr nodes;
(3) The attribute condition contained in the pr node acts on the internal pair node of the pr node;
(4) If the attribute conditions are different values of the same attribute, the attribute conditions cannot be inherited;
the attribute condition associations of the pair nodes are obtained as follows:
Here, [domestic water supply system/attribute condition], [hot water supply system/attribute condition] and [hot water return system/attribute condition] are different values of the same attribute "system type" and are split into parallel relations. [DN ≥ 80 pipe diameter/attribute condition] and [DN < 80 pipe diameter/attribute condition] are different values of the same attribute "pipe diameter" and are split into parallel relations.
(1) [indoor/component condition] [domestic water supply system/attribute condition] (pair [thin-wall stainless steel pipe/attribute value]);
[indoor/component condition] [hot water supply system/attribute condition] (pair [thin-wall stainless steel pipe/attribute value]);
[indoor/component condition] [hot water return system/attribute condition] (pair [thin-wall stainless steel pipe/attribute value]).
(2) [indoor/component condition] [domestic water supply system/attribute condition] (pair [double-clamp press/attribute value] [connection/attribute name]);
[indoor/component condition] [hot water supply system/attribute condition] (pair [double-clamp press/attribute value] [connection/attribute name]);
[indoor/component condition] [hot water return system/attribute condition] (pair [double-clamp press/attribute value] [connection/attribute name]).
(3) [indoor/component condition] [domestic water supply system/attribute condition] [DN ≥ 80 pipe diameter/attribute condition] (pair [flange/attribute value] [connection/attribute name]);
[indoor/component condition] [hot water supply system/attribute condition] [DN ≥ 80 pipe diameter/attribute condition] (pair [flange/attribute value] [connection/attribute name]);
[indoor/component condition] [hot water return system/attribute condition] [DN ≥ 80 pipe diameter/attribute condition] (pair [flange/attribute value] [connection/attribute name]).
(4) [indoor/component condition] [domestic water supply system/attribute condition] [DN ≥ 80 pipe diameter/attribute condition] (pair [flange/attribute value] [connection/attribute name]);
[indoor/component condition] [hot water supply system/attribute condition] (pair [flange/attribute value] [connection/attribute name]);
[indoor/component condition] [hot water return system/attribute condition] [DN ≥ 80 pipe diameter/attribute condition] (pair [flange/attribute value] [connection/attribute name]).
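The inheritance strategies can be sketched as a routine that flattens the tree into one record per pair node and per combination of parallel condition values. This is a simplified reading of the rules, not the patented implementation: the tree layout, the function names and the hand-supplied attribute for each condition (for example "system type") are assumptions, and the output is not claimed to reproduce the listing above line for line.

```python
from itertools import product


def group_by_attribute(conditions):
    """Group (attribute, value) conditions by attribute, keeping order."""
    groups = {}
    for attribute, value in conditions:
        groups.setdefault(attribute, []).append(value)
    return groups


def flatten(tree):
    """Apply the condition inheritance rules and emit one record per pair."""
    records = []
    first_pr = tree["prs"][0]["conditions"] if tree["prs"] else []
    for index, pr in enumerate(tree["prs"]):
        own = group_by_attribute(pr["conditions"])
        inherited = list(tree["attribute_conditions"])   # prs -> every pr
        if index > 0:
            inherited += first_pr                        # first pr -> later prs
        merged = {}
        for attribute, values in group_by_attribute(inherited).items():
            if attribute not in own:   # different values of one attribute
                merged[attribute] = values               # are not inherited
        merged.update(own)             # a pr's conditions act on its pair
        # Different values of the same attribute become parallel records.
        for combo in product(*merged.values()):
            records.append({
                "component_conditions": tree["component_conditions"],
                "attribute_conditions": dict(zip(merged, combo)),
                "value": pr["pair"]["value"],
                "name": pr["pair"].get("name"),
            })
    return records


TREE = {  # hand-written equivalent of the example syntax tree
    "component_conditions": ["indoor"],
    "attribute_conditions": [
        ("system type", "domestic water supply system"),
        ("system type", "hot water supply system"),
        ("system type", "hot water return system"),
    ],
    "prs": [
        {"conditions": [], "pair": {"value": "thin-wall stainless steel pipe"}},
        {"conditions": [("pipe diameter", "DN < 80")],
         "pair": {"value": "double-clamp press", "name": "connection"}},
        {"conditions": [("pipe diameter", "DN >= 80")],
         "pair": {"value": "flange", "name": "connection"}},
    ],
}

if __name__ == "__main__":
    for record in flatten(TREE):
        print(record)
```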
Further, parsing the syntax tree to extract the component information and the attribute name information in the text semantic block information may be implemented as follows:
determining attribution of the attribute values according to the positions of templates corresponding to the attribute values in the configuration file so as to extract component information and attribute name information in the text semantic block information from the grammar tree;
and if a plurality of attributions of the attribute values are detected, determining the attributions of the attribute values according to the positions of templates corresponding to the attribute values in the configuration file and the component range defined by the title level.
Specifically, according to the position of the template corresponding to the attribute value in the configuration file, determining which attribute value of which component is the attribute value of which attribute, and indirectly deducing the component and the attribute information; if the possibility of a plurality of components or a plurality of attributes exists, the text is ambiguous, and the components are further determined according to the component range defined by the title; if the title does not limit the component range, the information cannot be judged, and the result can be completely output and further judged by intelligent BIM construction or intelligent examination.
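The attribution logic can be sketched as a reverse lookup in the key/pattern/value configuration tree. The configuration content, the function names and the fallback of returning every candidate when the title scope does not narrow the result are assumptions made for illustration.

```python
# Hypothetical configuration in which "flange" appears under two components.
CONFIG = [
    {"key": "water pipe", "pattern": [], "value": [
        {"key": "connection mode", "pattern": [], "value": [
            {"key": "flange", "pattern": ["flanged"], "value": []},
        ]},
    ]},
    {"key": "equipment", "pattern": [], "value": [
        {"key": "connection mode", "pattern": [], "value": [
            {"key": "flange", "pattern": ["flanged"], "value": []},
        ]},
    ]},
]


def candidates(config, value_text):
    """Return every (component, attribute) whose value node matches the text."""
    found = []
    for component in config:
        for attribute in component["value"]:
            for value in attribute["value"]:
                if value_text == value["key"] or value_text in value["pattern"]:
                    found.append((component["key"], attribute["key"]))
    return found


def resolve(config, value_text, title_scope=None):
    """Narrow ambiguous candidates by the component range of the title.

    When no title scope applies (or it rules out every candidate), all
    candidates are returned so that a later stage can decide.
    """
    found = candidates(config, value_text)
    if len(found) > 1 and title_scope:
        narrowed = [c for c in found if c[0] in title_scope]
        if narrowed:
            return narrowed
    return found


if __name__ == "__main__":
    print(resolve(CONFIG, "flange"))                  # ambiguous: both candidates
    print(resolve(CONFIG, "flange", {"equipment"}))   # narrowed by the title
```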
Finally, for the text "the indoor cold and hot water pipes are all made of thin-wall stainless steel pipes; pipes with DN < 80 are connected by double clamping and pressing, and pipes with DN ≥ 80 are connected by flanges", the fixed structure 'component-component condition-attribute condition-attribute value' is generated, which completes the analysis of the long text information.
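One possible way to hold the fixed structure is a small record type. The sketch below is only an illustration: the class name `ParsedRecord`, its field names, and the choice of "pipe" as the component are assumptions, and the two records are hand-written for the example sentence rather than produced by the method.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ParsedRecord:
    """One 'component - component condition - attribute condition - attribute value' entry."""

    component: str
    attribute_name: str
    attribute_value: str
    component_conditions: List[str] = field(default_factory=list)
    attribute_conditions: List[str] = field(default_factory=list)


# Illustrative records for the example sentence (assumed, not generated output).
records = [
    ParsedRecord(
        "pipe", "connection", "double clamping and pressing",
        component_conditions=["indoor"], attribute_conditions=["DN < 80"],
    ),
    ParsedRecord(
        "pipe", "connection", "flange",
        component_conditions=["indoor"], attribute_conditions=["DN >= 80"],
    ),
]

# Plain dictionaries are convenient for handing the result to a later stage.
as_dicts = [asdict(record) for record in records]
```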
In summary, in the method for analyzing long text information in a building drawing according to this embodiment of the invention, a configuration file is written from building industry knowledge and used to analyze the long text in the building drawing. The title level information is obtained by dividing the text according to its titles, and the component range and the sections affected by each subtitle are determined from the titles. After the sentence semantic block information is generated, a grammar tree is generated, and the grammar tree is parsed to extract the component information and attribute name information, yielding a structure that contains the component information and attributes. Automatic analysis and structural conversion of the long text information in the building drawing are thereby realized, which solves the problem of low efficiency in analyzing long text information in the prior art.
Example two
Referring to Fig. 2, a device for analyzing long text information in a building drawing according to a second embodiment of the present invention is shown. The device includes:
the collection module 100 is configured to collect construction industry knowledge and write the construction industry knowledge into a configuration file, where a plurality of component information, attribute information, and attribute value information are stored in the configuration file;
the acquisition module 200 is used for acquiring a building drawing in a preset format and analyzing the building drawing to obtain a plurality of text lines of the building drawing;
the determining module 300 is configured to obtain title level information of the building drawing according to the plurality of text lines and a preset rule, and determine a component extraction range in long text information of the building drawing according to the title level information;
the matching module 400 is configured to match the component extraction ranges of the plurality of text lines according to the component, the attribute and the attribute value template in the configuration file to obtain an entity and a candidate entity class corresponding to the entity, and obtain text semantic block information according to the matching results of the plurality of text lines;
the analysis module 500 is configured to analyze the text semantic block information to obtain a corresponding syntax tree, and parse the syntax tree to extract component information and attribute name information in the text semantic block information, so as to convert long text information in the building drawing into a preset structure.
Further, in some optional embodiments of the present invention, the determining module includes:
the detection unit is used for detecting whether the initial segment of each text line has a sequence number so as to distinguish body text from titles, and acquiring a title level according to the consistency of the sequence number format;
and the acquisition unit is used for acquiring the other title information located between adjacent titles of the same title level, and acquiring the title level information of the building drawing through a recursion strategy.
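The sequence-number detection and the recursion between adjacent same-level titles can be sketched as below. This is a simplified illustration under stated assumptions: the three sequence-number formats in `NUMBER_FORMATS`, the node layout, and the function names are hypothetical, and real drawings may use other numbering styles.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical sequence-number formats; lines sharing a format share a title level.
NUMBER_FORMATS = [
    ("decimal", re.compile(r"^\d+\.\d+")),      # e.g. "1.1 Scope"
    ("arabic_dot", re.compile(r"^\d+\.(?!\d)")),  # e.g. "1. Water supply"
    ("paren", re.compile(r"^\(\d+\)")),          # e.g. "(1) Indoor pipes"
]


def number_format(line: str) -> Optional[str]:
    """Return the name of the sequence-number format at the start of the line, if any."""
    stripped = line.strip()
    for name, pattern in NUMBER_FORMATS:
        if pattern.match(stripped):
            return name
    return None


def split_level(lines: List[str]) -> Tuple[List[str], List[Tuple[str, List[str]]]]:
    """Split lines into leading body text and (title, block) sections of one level.

    The first numbered line fixes the format of the current level; every line
    between two adjacent titles of that format belongs to the earlier title.
    """
    level_format = None
    leading: List[str] = []
    sections: List[Tuple[str, List[str]]] = []
    for line in lines:
        fmt = number_format(line)
        if level_format is None and fmt is not None:
            level_format = fmt
        if fmt is not None and fmt == level_format:
            sections.append((line, []))
        elif sections:
            sections[-1][1].append(line)
        else:
            leading.append(line)
    return leading, sections


def build_title_tree(lines: List[str]) -> Tuple[List[str], List[dict]]:
    """Recursively build the title hierarchy; returns (body text, child title nodes)."""
    leading, sections = split_level(lines)
    children = []
    for title, block in sections:
        body, sub_titles = build_title_tree(block)
        children.append({"title": title, "body": body, "children": sub_titles})
    return leading, children
```

Calling `build_title_tree` on the text lines of a drawing returns the top-level titles, each carrying its own body text and nested titles.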
Further, in the device for analyzing long text information in a building drawing, the determining module further includes:
the sorting unit is used for sequentially acquiring short titles from the title level information in order of title level, from the highest level to the lowest;
and the extraction unit is used for determining a target component limiting template that matches the short title, and storing the mapping relation between the target component and the text under the short title so as to determine the component extraction range of the short title.
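A minimal sketch of this top-down scoping step follows. The keyword-based `COMPONENT_TEMPLATES`, the `MAX_SHORT_TITLE_LEN` threshold, the node layout (`title`, `body`, `children`), and the rule that a sub-title inherits its parent's component are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical component-limiting templates: keyword -> component it restricts to.
COMPONENT_TEMPLATES: Dict[str, str] = {"pipe": "pipe", "valve": "valve"}
MAX_SHORT_TITLE_LEN = 30  # assumed threshold for what counts as a "short" title


def match_component(title: str) -> Optional[str]:
    """Return the component named by a short title, or None if no template matches."""
    if len(title) > MAX_SHORT_TITLE_LEN:
        return None
    lowered = title.lower()
    for keyword, component in COMPONENT_TEMPLATES.items():
        if keyword in lowered:
            return component
    return None


def collect_ranges(
    nodes: List[dict],
    inherited: Optional[str] = None,
    out: Optional[Dict[str, List[str]]] = None,
) -> Dict[str, List[str]]:
    """Walk titles from the highest level down and map each component to its text.

    A title that matches no template inherits the component of its parent title.
    """
    if out is None:
        out = {}
    for node in nodes:
        component = match_component(node["title"]) or inherited
        if component is not None:
            out.setdefault(component, []).extend(node["body"])
        collect_ranges(node["children"], component, out)
    return out
```

The resulting mapping gives, for each component, the text lines in which its attributes should be searched.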
Further, in the device for analyzing long text information in a building drawing, the matching module includes:
the matching unit is used for matching from the component extraction ranges of the plurality of text lines according to the components, the attributes and the attribute value templates in the configuration file to obtain entities and candidate entity categories corresponding to the entities;
and the arrangement unit is used for deleting the shorter matching results according to the principle of keeping the longer match and discarding the shorter one to obtain a final entity matching result, and arranging the entity matching results in the order in which they appear in the text line to obtain text semantic block information.
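The matching and longest-match filtering can be illustrated with the following sketch. The literal-phrase `TEMPLATES` table and the function names are assumptions; the configuration file may well hold richer templates than plain phrases.

```python
import re
from typing import Dict, List, Tuple

Match = Tuple[int, int, str, List[str]]  # (start, end, text, candidate categories)

# Hypothetical templates: literal phrase -> candidate entity categories.
TEMPLATES: Dict[str, List[str]] = {
    "indoor": ["component condition"],
    "water supply system": ["attribute condition"],
    "hot water supply system": ["attribute condition"],
    "flange": ["attribute value"],
    "connection": ["attribute name"],
}


def match_entities(line: str, templates: Dict[str, List[str]]) -> List[Match]:
    """Find every occurrence of every template phrase in the text line."""
    matches: List[Match] = []
    for phrase, categories in templates.items():
        for m in re.finditer(re.escape(phrase), line):
            matches.append((m.start(), m.end(), phrase, categories))
    return matches


def keep_longest(matches: List[Match]) -> List[Match]:
    """Drop matches that overlap a longer match, then order by position in the line."""
    kept: List[Match] = []
    # Visit longer matches first so that they win over shorter overlapping ones.
    for cand in sorted(matches, key=lambda m: (-(m[1] - m[0]), m[0])):
        if all(cand[1] <= k[0] or cand[0] >= k[1] for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda m: m[0])


def semantic_blocks(line: str, templates: Dict[str, List[str]]) -> List[Match]:
    """Text semantic blocks of one line: filtered matches in reading order."""
    return keep_longest(match_entities(line, templates))
```

In the sample line used in the test, "water supply system" is discarded because it lies inside the longer match "hot water supply system".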
Further, in the device for analyzing long text information in a building drawing, the analysis module includes:
the analysis unit is used for analyzing how long text information is organized in the building drawing and writing a lexical and grammar analysis rule template supported by a preset tool;
and the conversion unit is used for automatically generating a corresponding lexical and grammar analyzer by adopting the preset tool so as to convert the text semantic block information into a corresponding grammar tree.
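The text above relies on a preset tool that generates the lexical and grammar analyzer from a rule template; that tool is not named here. As a stand-in only, the sketch below uses a small hand-written parser to show what converting semantic blocks into a tree might look like. The grammar (component conditions, then attribute conditions, then one attribute value and attribute name pair), the category labels, and the tree layout are assumptions.

```python
from typing import Dict, List, Tuple

Block = Tuple[str, str]  # (text, category); the category labels are assumed


def parse_statement(blocks: List[Block]) -> Dict[str, object]:
    """Parse: component_condition* attribute_condition* (attribute value + attribute name)."""
    pos = 0

    def take_while(category: str) -> List[str]:
        # Consume consecutive blocks of the given category.
        nonlocal pos
        items: List[str] = []
        while pos < len(blocks) and blocks[pos][1] == category:
            items.append(blocks[pos][0])
            pos += 1
        return items

    component_conditions = take_while("component condition")
    attribute_conditions = take_while("attribute condition")

    # Exactly one value/name pair must remain, in either order.
    rest = blocks[pos:]
    categories = {category for _, category in rest}
    if len(rest) != 2 or categories != {"attribute value", "attribute name"}:
        raise ValueError(f"unexpected blocks at position {pos}: {rest}")
    pair = {category: text for text, category in rest}

    return {
        "type": "statement",
        "component_conditions": component_conditions,
        "attribute_conditions": attribute_conditions,
        "pair": {"name": pair["attribute name"], "value": pair["attribute value"]},
    }
```

A generated parser would replace `parse_statement`; the point of the sketch is only the shape of the input blocks and of the resulting tree.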
Further, in the device for analyzing long text information in a building drawing, the analysis module further includes:
and the configuration unit is used for determining attribution of the attribute values according to the positions of templates corresponding to the attribute values in the configuration file so as to extract the component information and the attribute name information in the text semantic block information from the grammar tree.
Further, the device for analyzing long text information in a building drawing further includes:
and the detection module is used for determining the attribution of the attribute values according to the positions of the templates corresponding to the attribute values in the configuration file and the component range defined by the title level if a plurality of attributions of the attribute values are detected.
Example three
Another aspect of the present invention also provides a readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method described in the first embodiment above.
Example four
In another aspect, the present invention also provides an electronic device, where the electronic device includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the program to implement the steps of the method described in the first embodiment.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination involves no contradiction, it should be considered to fall within the scope of this specification.
Those of skill in the art will appreciate that the logic and/or steps represented in the flow diagrams or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques, all of which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing examples illustrate only a few embodiments of the invention. Although they are described in detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the invention, and these all fall within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims (9)

1. A method for analyzing long text information in a building drawing, characterized by comprising the following steps:
collecting construction industry knowledge and writing the construction industry knowledge into a configuration file, wherein the configuration file stores a plurality of component information, attribute information and attribute value information;
acquiring a building drawing in a preset format, and analyzing the building drawing to obtain a plurality of text lines of the building drawing;
acquiring title level information of the building drawing according to the text lines and a preset rule, and determining a component extraction range in long text information of the building drawing according to the title level information;
matching from the component extraction ranges of the plurality of text lines according to the components, the attributes and the attribute value templates in the configuration file to obtain entities and candidate entity categories corresponding to the entities, and obtaining text semantic block information according to the matching results of the plurality of text lines;
analyzing the text semantic block information to obtain a corresponding grammar tree, and parsing the grammar tree to extract component information and attribute name information in the text semantic block information, so as to convert long text information in the building drawing into a preset structure;
wherein the step of matching from the component extraction ranges of the plurality of text lines according to the components, the attributes and the attribute value templates in the configuration file to obtain the entity and the candidate entity category corresponding to the entity, and obtaining text semantic block information according to the matching results of the plurality of text lines comprises the following steps:
matching from the component extraction ranges of the plurality of text lines according to the components, the attributes and the attribute value templates in the configuration file to obtain entities and candidate entity categories corresponding to the entities;
deleting the shorter matching results according to the principle of keeping the longer match and discarding the shorter one to obtain a final entity matching result, and arranging the entity matching results in the order in which they appear in the text line to obtain text semantic block information.
2. The method for analyzing long text information in a building drawing according to claim 1, wherein the step of obtaining title level information of the building drawing according to the plurality of text lines and a preset rule comprises:
detecting whether the initial segment of each text line has a sequence number to distinguish body text from titles, and acquiring a title level according to the consistency of the sequence number format;
and acquiring the other title information located between adjacent titles of the same title level, and acquiring the title level information of the building drawing through a recursion strategy.
3. The method for analyzing long text information in a building drawing according to claim 1, wherein the step of determining a component extraction range in the long text information of the building drawing according to the title level information comprises:
sequentially obtaining short titles from the title level information in order of title level, from the highest level to the lowest;
and determining a target component limiting template that matches the short title, and storing the mapping relation between the target component and the text under the short title to determine the component extraction range of the short title.
4. The method for analyzing long text information in a building drawing according to claim 1, wherein the step of analyzing the text semantic block information to obtain a corresponding grammar tree comprises:
analyzing how long text information is organized in the building drawing, and writing a lexical and grammar analysis rule template supported by a preset tool;
and automatically generating a corresponding lexical and grammar analyzer by adopting the preset tool so as to convert the text semantic block information into a corresponding grammar tree.
5. The method for analyzing long text information in a building drawing according to claim 1, wherein the step of parsing the grammar tree to extract component information and attribute name information in the text semantic block information to convert the long text information in the building drawing into a preset structure comprises:
determining attribution of the attribute values according to the positions of templates corresponding to the attribute values in the configuration file so as to extract component information and attribute name information in the text semantic block information from the grammar tree.
6. The method for analyzing long text information in a building drawing according to claim 5, wherein the method further comprises:
and if a plurality of attributions of the attribute values are detected, determining the attributions of the attribute values according to the positions of templates corresponding to the attribute values in the configuration file and the component range defined by the title level.
7. A device for analyzing long text information in a building drawing, the device comprising:
the collection module is used for collecting construction industry knowledge and writing the construction industry knowledge into a configuration file, wherein the configuration file stores a plurality of component information, attribute information and attribute value information;
the acquisition module is used for acquiring a building drawing in a preset format and analyzing the building drawing to obtain a plurality of text lines of the building drawing;
the determining module is used for acquiring the title level information of the building drawing according to the text lines and preset rules, and determining the component extraction range in the long text information of the building drawing according to the title level information;
the matching module is used for matching from the component extraction range of the plurality of text lines according to the components, the attributes and the attribute value templates in the configuration file to obtain entities and candidate entity categories corresponding to the entities, and obtaining text semantic block information according to the matching results of the plurality of text lines;
the analysis module is used for analyzing the text semantic block information to obtain a corresponding grammar tree, analyzing the grammar tree and extracting component information and attribute name information in the text semantic block information so as to convert long text information in the building drawing into a preset structure;
the matching module comprises:
the matching unit is used for matching from the component extraction ranges of the plurality of text lines according to the components, the attributes and the attribute value templates in the configuration file to obtain entities and candidate entity categories corresponding to the entities;
and the arrangement unit is used for deleting the shorter matching results according to the principle of keeping the longer match and discarding the shorter one to obtain a final entity matching result, and arranging the entity matching results in the order in which they appear in the text line to obtain text semantic block information.
8. A readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method according to any one of claims 1 to 6 when the program is executed.
CN202310266783.7A 2023-03-20 2023-03-20 Method and device for analyzing middle-length text information of building drawing Active CN115983245B (en)


Publications (2)

Publication Number Publication Date
CN115983245A (en) 2023-04-18
CN115983245B (en) 2023-06-06

Family

ID=85970892


Country Status (1)

Country Link
CN (1) CN115983245B (en)




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant