ISLab at the NTCIR-18 AEOLLM: An Evaluator for Machine-Generated Text based on Data Augmentation and ORPO

Chia-Hui Lin; Cen-Chieh Chen; Tao-Hsing Chang; Fu-Yuan Hsu


https://doi.org/10.20736/0002002028

Name / File	License	Action
04-NTCIR18-AEOLLM-LinC.pdf (772.4 KB)

Item type

Default item type (full) (1)

Publication date

2025-06-06

Title

ISLab at the NTCIR-18 AEOLLM: An Evaluator for Machine-Generated Text based on Data Augmentation and ORPO

Language

Creator

Chia-Hui Lin
Cen-Chieh Chen
Tao-Hsing Chang
Fu-Yuan Hsu

Description

Description type

Abstract

Description

In recent years, large language models (LLMs) have been widely applied to various natural language processing (NLP) tasks, demonstrating exceptional performance. To evaluate the output quality of these LLMs, numerous studies utilize one LLM as an evaluator to assess the quality of outputs from other LLMs, showing promising results on public benchmarks. However, the performance of LLMs as evaluators on many unpublished benchmarks still needs improvement. To achieve better evaluation performance, some studies have attempted to fine-tune evaluators based on large amounts of data, incurring significant manual costs and posing substantial limitations in practical applications. Therefore, this paper leverages data augmentation to increase the volume of training data and employs the odds ratio preference optimization (ORPO) algorithm for reinforcement learning to optimize the evaluator. This study uses the dataset provided by NTCIR-18’s Automatic Evaluation of LLMs (AEOLLM) task for training and testing. The proposed method achieves an accuracy of 0.7658 on the summary generation subtask of AEOLLM, the highest among all compared models. Additionally, it yields the second-highest performance in both Kendall’s tau and Spearman correlation coefficient on the summary generation and text expansion subtasks among all compared models.
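The abstract names odds ratio preference optimization (ORPO) as the training objective but does not spell it out. The following is a minimal sketch of the per-example ORPO loss as commonly formulated (a supervised term on the preferred response plus a weighted log-odds-ratio term), not the authors' implementation. The function names, the weight `lam`, and the use of average per-token log-probabilities as inputs are assumptions made for illustration.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp).

    avg_logp is assumed to be the average per-token log-probability of a
    response, so it must be strictly negative (p < 1).
    """
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(logp_chosen: float, logp_rejected: float, lam: float = 0.1) -> float:
    """Hypothetical per-example ORPO loss: L_SFT + lam * L_OR."""
    # Supervised term: negative log-likelihood of the preferred response.
    sft = -logp_chosen
    # Log odds ratio between the preferred and the rejected response.
    ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    # -log(sigmoid(ratio)) written as a numerically stable softplus(-ratio).
    or_term = max(-ratio, 0.0) + math.log1p(math.exp(-abs(ratio)))
    return sft + lam * or_term


if __name__ == "__main__":
    # The loss is lower when the preferred response is the more likely one.
    print(orpo_loss(math.log(0.8), math.log(0.2)))
    print(orpo_loss(math.log(0.2), math.log(0.8)))
```

In this formulation no separate reference model is needed: the odds-ratio term alone penalizes the rejected response relative to the preferred one, and `lam` controls how strongly that penalty is weighted against the supervised term.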

Language

Publisher

NII Institutional Repository

Language

Date

2025-06-06

Date type

Issued

Language

eng

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_5794

Resource type

conference paper

ID registration

10.20736/0002002028

ID registration type

JaLC

Versions

Ver.1

2025-06-04 08:00:46.603446
