Rule-based Information Extraction for Mechanical-Electrical-Plumbing-Specific Semantic Web

Automation in Construction, 2022

Recommended citation: Wu, L.T., Lin, J.R., Leng, S., Li, J.L., Hu, Z.Z. (2022). Rule-based Information Extraction for Mechanical-Electrical-Plumbing-Specific Semantic Web. Automation in Construction, 135, 104108. doi: 10.1016/j.autcon.2021.104108 http://doi.org/10.1016/j.autcon.2021.104108

Abstract

Information extraction (IE), which aims to retrieve meaningful information from plain text, has been widely studied in general and professional domains to support downstream applications. However, owing to the lack of labeled data and the complexity of professional mechanical, electrical and plumbing (MEP) information, it is challenging to apply common deep learning IE methods to the MEP domain. To address this problem, this paper proposes a rule-based approach to the MEP IE task. The approach comprises a “snowball” strategy for collecting large-scale MEP corpora, a suffix-based matching algorithm on text segments for named entity recognition (NER), and a dependency-path-based matching algorithm on dependency trees for relationship extraction (RE). Two further ideas for RE, “meta linking” and “path filtering”, are also proposed to discover as many out-of-pattern entities and relationships as possible. To verify the feasibility of the approach, 65 MB of MEP corpora were collected as input, and an MEP semantic web consisting of 15,978 entities and 65,110 relationship triples was established, with an accuracy of 81% for entities and 75% for relationship triples. A comparison experiment between classical deep learning models and the proposed rule-based approach shows that our method outperforms the selected deep learning NER and RE models by 37% and 49%, respectively, in extraction precision.
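
The abstract names the two matching algorithms without detailing them. The sketch below illustrates the general idea under stated assumptions and is not the authors' implementation. English noun-phrase segments stand in for the paper's text segments. The suffix lexicon (`SUFFIX_TYPES`) and the path patterns (`PATH_PATTERNS`) are hypothetical. The dependency parse is written by hand with spaCy/Stanford-style labels such as `nsubj` and `dobj` rather than produced by a parser. A segment becomes an entity when it ends with a known suffix, with the longest match winning. A relationship triple is emitted when the dependency path between two entity heads matches a pattern.

```python
from typing import Dict, List, Optional, Tuple

# (word, index of head token or -1 for the root, dependency label)
Token = Tuple[str, int, str]

# Hypothetical suffix lexicon: trailing words of an MEP term -> entity category.
SUFFIX_TYPES: Dict[Tuple[str, ...], str] = {
    ("pump",): "Equipment",
    ("check", "valve"): "CheckValve",
    ("valve",): "Valve",
    ("pipe",): "Pipe",
    ("water",): "Medium",
}

# Hypothetical path pattern: subject -> verb <- object yields (e1, verb, e2).
PATH_PATTERNS = {"nsubj^ dobj"}


def match_entity(segment: List[str]) -> Optional[Tuple[str, str]]:
    """Type a text segment by its longest known suffix, or return None."""
    words = tuple(w.lower() for w in segment)
    for n in range(len(words), 0, -1):  # try the longest suffix first
        category = SUFFIX_TYPES.get(words[-n:])
        if category is not None:
            return " ".join(segment), category
    return None


def path_to_root(tokens: List[Token], i: int) -> List[int]:
    """Token indices from i up to the root of the dependency tree."""
    path = [i]
    while tokens[i][1] != -1:
        i = tokens[i][1]
        path.append(i)
    return path


def path_signature(tokens: List[Token], a: int, b: int) -> Tuple[str, int]:
    """Dependency labels on the path a -> LCA <- b, plus the LCA index."""
    up, down = path_to_root(tokens, a), path_to_root(tokens, b)
    lca = next(i for i in up if i in down)  # lowest common ancestor
    rising = [tokens[i][2] + "^" for i in up[: up.index(lca)]]
    falling = [tokens[i][2] for i in reversed(down[: down.index(lca)])]
    return " ".join(rising + falling), lca


def extract_triples(
    tokens: List[Token], spans: List[Tuple[int, int]]
) -> List[Tuple[str, str, str]]:
    """NER by suffix matching on segments, then RE by dependency-path matching."""
    entities = []  # (entity text, head index = last token of the segment)
    for start, end in spans:
        hit = match_entity([t[0] for t in tokens[start : end + 1]])
        if hit is not None:
            entities.append((hit[0], end))
    triples = []
    for k, (e1, h1) in enumerate(entities):
        for e2, h2 in entities[k + 1 :]:
            signature, lca = path_signature(tokens, h1, h2)
            if signature in PATH_PATTERNS:
                # Use the word at the lowest common ancestor as the relation.
                triples.append((e1, tokens[lca][0], e2))
    return triples


# "The pump supplies chilled water", parsed by hand for illustration.
TOKENS: List[Token] = [
    ("The", 1, "det"),
    ("pump", 2, "nsubj"),
    ("supplies", -1, "ROOT"),
    ("chilled", 4, "amod"),
    ("water", 2, "dobj"),
]
SPANS = [(1, 1), (3, 4)]  # noun-phrase segments, assumed to be given

print(extract_triples(TOKENS, SPANS))  # [('pump', 'supplies', 'chilled water')]
```

Naming the relation after the verb at the lowest common ancestor is only one simple design choice. The “meta linking” and “path filtering” ideas from the abstract are not modeled here.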

This research was supported by the National Natural Science Foundation of China (No. 51778336, No. 72091512), and the Tsinghua University – Glodon Joint Research Center for Building Information Modeling.

Financial Sources:

2017.10-2019.12: Spatial Data Analytics for BIM-based Facility Management

2018.1-2021.12: Research on information-driven multi-scale performance simulation and analysis technologies for existing buildings

2019.10-2020.12: Automatic Checking of BIM-based Design

2021.1-2025.12: Resilience Assessment and Management of City Infrastructures

Jia-Rui Lin

You May Also Enjoy

A Natural‐Language‐Based Approach to Intelligent Data Retrieval and Representation for Cloud BIM

Automatic MEP Knowledge Acquisition Based on Documents and Natural Language Processing

Linking Data Model and Formula to Automate KPI Calculation for Building Performance Benchmarking

Knowledge Extraction and Discovery Based on BIM: A Critical Review and Future Directions