PACIS 2021 Proceedings






BERT has attained state-of-the-art performance on extractive summarization tasks on the CNN/Daily-Mail dataset. We discuss several variants of the BERT model and propose a novel approach to regulating sentence-level fine-tuning of pre-trained embeddings. This paper addresses the extractive text summarization task using the BERTSUM model. To improve performance, we extend BERTSUM in three directions. First, we use different summarization layers after BERT (a classifier or a transformer). Second, we feed the summarizer the output of the penultimate or antepenultimate layer rather than that of the final layer. Third, we freeze the first three BERT layers when fine-tuning the model, thereby guarding against catastrophic forgetting in the initial layers. Our proposed BERTSUM+Classifier and BERTSUM Penultimate+Transformer models outperform all baselines with respect to ROUGE-1, ROUGE-2, and ROUGE-L F1 scores.
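For readers unfamiliar with the metrics named above, the following is a minimal sketch of how ROUGE-1, ROUGE-2, and ROUGE-L F1 scores can be computed for a single candidate/reference pair. It is an illustration only, not the evaluation code used in the paper: it assumes lowercased whitespace tokenization, applies no stemming or stopword handling, and uses the plain sentence-level longest common subsequence for ROUGE-L, so its numbers may differ from those of the official ROUGE toolkit. The function names (`rouge_n_f1`, `rouge_l_f1`, and the helpers) are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when either is undefined or zero."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n_f1(candidate, reference, n):
    """ROUGE-N F1 from clipped n-gram overlap (assumed: whitespace tokens, lowercase)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipped matches).
    overlap = sum((cand & ref).values())
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Sentence-level ROUGE-L F1 based on the longest common subsequence."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    # Made-up example sentences, not taken from CNN/Daily-Mail.
    candidate = "the cat sat on the mat"
    reference = "the cat lay on the mat"
    print(rouge_n_f1(candidate, reference, 1))
    print(rouge_n_f1(candidate, reference, 2))
    print(rouge_l_f1(candidate, reference))
```

In this sketch, ROUGE-1 and ROUGE-2 count overlapping unigrams and bigrams, while ROUGE-L rewards in-order matches that need not be contiguous; each F1 score balances how much of the reference is recovered against how much of the candidate is relevant.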


