SCUNLP-2 at the NTCIR-18 FigArg-2 Task: Apply Repeat-Error-Correction Learning on Text Classification

Tong-Ru Wu; Jheng-Long Wu

https://doi.org/10.20736/0002002041

File

06-NTCIR18-FINARG-WuT.pdf (719.7 KB)

Item type

Default item type (full) (1)

Publication date

2025-06-06

Abstract

Large Language Models (LLMs) have shown promising capabilities for zero-shot text classification, yet they often fail to outperform traditional models such as BERT that have been fine-tuned on sufficient labeled data. However, acquiring large-scale human-labeled datasets can be challenging, particularly in specialized domains. To address this gap, we propose Repeat-Error-Correction Learning, a framework that iteratively identifies and rewrites misclassified samples to augment the training set. First, we train a base BERT model on the available text–label pairs. Next, the trained model infers labels on the same dataset, and we collect the misclassified samples. An LLM, such as GPT-4o-mini, then rewrites these misclassified texts while preserving their original labels. The rewritten texts are added to the training set, and the model is fine-tuned on this expanded corpus. By iteratively refining the training data through error correction and text rewriting, the proposed method aims to achieve robust classification performance despite limited initial annotations. Our results indicate that fine-tuning the base model with the rewritten misclassified texts added achieved the highest Micro-F1 score on the validation set (77.33%). These findings contribute to a deeper understanding of a cost-friendly and efficient way to generate data for augmenting text classification models.
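
The loop described in the abstract (train, infer on the training data, collect errors, rewrite them with their labels kept, retrain) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the names `repeat_error_correction`, `train_bow`, and `stub_rewrite` are hypothetical, a tiny bag-of-words voter stands in for BERT, a fixed-prefix stub stands in for the GPT-4o-mini rewriting call, and the model is retrained from scratch each round rather than fine-tuned. The number of rounds, the early stop when no errors remain, and the choice to re-check only the original samples are assumptions, since the abstract does not specify them.

```python
from collections import Counter, defaultdict
from typing import Callable, List, Tuple

Example = Tuple[str, str]            # (text, label)
Classifier = Callable[[str], str]    # text -> predicted label


def repeat_error_correction(
    train_set: List[Example],
    train_fn: Callable[[List[Example]], Classifier],
    rewrite_fn: Callable[[str, str], str],
    rounds: int = 3,                 # assumed; not stated in the abstract
) -> Tuple[Classifier, List[Example]]:
    """Augment the training data with rewritten misclassified samples."""
    data = list(train_set)
    model = train_fn(data)           # train the base model
    for _ in range(rounds):
        # Infer on the original training samples and collect the errors.
        errors = [(t, y) for t, y in train_set if model(t) != y]
        if not errors:
            break
        # Rewrite each misclassified text, keeping its original label.
        data.extend((rewrite_fn(t, y), y) for t, y in errors)
        # Retrain on the expanded corpus (stand-in for fine-tuning).
        model = train_fn(data)
    return model, data


def train_bow(data: List[Example]) -> Classifier:
    """Toy stand-in for BERT: each known word votes for the labels it was seen with."""
    counts = defaultdict(Counter)
    for text, label in data:
        for word in text.lower().split():
            counts[word][label] += 1
    labels = sorted({label for _, label in data})

    def predict(text: str) -> str:
        scores = Counter()
        for word in text.lower().split():
            scores.update(counts.get(word, {}))
        return max(labels, key=lambda lab: scores[lab])

    return predict


def stub_rewrite(text: str, label: str) -> str:
    """Placeholder for an LLM paraphrase; a real system would call a model here."""
    return f"in other words, {text}"


train = [
    ("profits rose sharply", "claim"),
    ("profits rose again", "claim"),
    ("because profits rose", "premise"),
]
model, augmented = repeat_error_correction(train, train_bow, stub_rewrite)
print(len(augmented), model("because profits rose"))
```

Passing the trainer and rewriter in as functions keeps the loop independent of any particular model or LLM client, so a fine-tuning routine and a real rewriting call could be substituted without changing the control flow.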

Publisher

NII Institutional Repository

Date

2025-06-06

Date type

Issued

Language

eng

Resource type identifier

http://purl.org/coar/resource_type/c_5794

Resource type

conference paper

ID registration

10.20736/0002002041

ID registration type

JaLC

Versions

Ver.1

2025-06-04 08:01:13.038651
