ARCBERT: Largescale Domain Specific Dataset and Pretrained Language Model for AEC Industry

Published: April 02, 2022

Since early 2019, our group has devoted most of its effort to developing new methods to extract and learn complex domain knowledge from textual documents such as building codes and construction documents. To efficiently extract and transfer the prior knowledge hidden in domain documents, we developed the first large-scale domain-specific corpora and a pretrained language model based on BERT, which outperformed traditional methods in various NLP tasks with a maximum improvement of 8.1%. You can download the dataset, pretrained models, and algorithms here for research and exploration purposes. The latest updates to the dataset, pretrained models, and algorithms can be found on the GitHub page.

If you adopt or use our work, please cite the following article:

Zheng, Z., Lu, X.Z., Chen, K.Y., Zhou, Y.C., Lin, J.R.* (2022). Pretrained Domain-Specific Language Model for Natural Language Processing Tasks in the AEC Domain. Computers in Industry, 142, 103733.


SODA: Site Object Detection dAtaset for Deep Learning in Construction

Published: February 22, 2022

In collaboration with Dr. Yichuan Deng from South China University of Technology, we developed a large-scale image dataset for deep-learning-based site object detection. The dataset contains images taken from different view angles and under different ambient lighting conditions, and covers various objects such as workers, equipment, and materials.

ART(AutoRuleTransform): Opensource Dataset and Algorithms for Automated Rule Interpretation of Building Codes

Published: March 03, 2022

To support the digitalization and automated interpretation of regulatory documents, our team developed and open-sourced the first large-scale Chinese dataset for automated rule transformation, together with the corresponding algorithms proposed for rule interpretation. The dataset and algorithms cover various types of clauses, including simple clauses, complex clauses with multiple constraints, high-order constraints, and implicit properties, which lays a solid foundation for future exploration.

Jia-Rui Lin

