From Divergent LLM Predictions to Reliable Lung Cancer Staging with Ensemble Fusion: CYUT at the NTCIR-18 RadNLP Main Task

Tsz-Yeung Lau; Shih-Hung Wu


https://doi.org/10.20736/0002002064

File: 04-NTCIR18-RADNLP-LauT.pdf (1.5 MB)

Item type

Default item type (full) (1)

Publication date

2025-06-06

Title

From Divergent LLM Predictions to Reliable Lung Cancer Staging with Ensemble Fusion: CYUT at the NTCIR-18 RadNLP Main Task

Creator

Tsz-Yeung Lau
Shih-Hung Wu

Description

Description type

Abstract

Description

This study investigates the application of Large Language Models (LLMs) to automated lung cancer staging from radiology reports, as part of the CYUT team’s participation in the NTCIR-18 RadNLP Main Task. Data analysis revealed a moderate correlation among the T, N, and M staging classes, and experiments indicated that prompting LLMs to predict all three classes jointly yields better performance. Standardizing measurement units to millimeters, rather than centimeters, also proved more effective. Based on these findings, we refined our prompting methodology and applied it to both LLMs and reasoning-augmented models, including OpenAI’s O-series and DeepSeek-R1. These reasoning models, enhanced through post-training with Chain-of-Thought (CoT) reasoning, demonstrated superior staging accuracy. Because LLMs are generative models, their outputs may vary across runs, which makes predictions inconsistent. To mitigate this variability, we adopted an ensemble learning strategy that consolidates divergent LLM outputs into a more stable and reliable lung cancer staging system. Experimental results show that ensemble methods consistently outperform individual models, improving both the robustness and the reliability of staging from radiology reports. Our approach achieved second place in the NTCIR-18 RadNLP Main Task (English), underscoring the effectiveness of LLM-based ensemble techniques for TNM classification. The implementation is available on GitHub: anson70242/NTCIR-18-RadNLP-CYUT.
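
The abstract names two concrete steps: normalizing measurements to millimeters and fusing divergent LLM outputs into one prediction. The following is a minimal Python sketch of both, under stated assumptions. The abstract does not say which fusion rule the authors use, so per-category majority voting is swapped in here as one common choice. The function names `cm_to_mm` and `majority_vote`, the regular expression, and the dict layout with keys `T`, `N`, and `M` are hypothetical and are not taken from the authors' repository.

```python
import re
from collections import Counter

# Hypothetical pattern: a number followed by "cm" (e.g. "3.2 cm", "1.5cm").
_CM_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*cm\b", flags=re.IGNORECASE)


def cm_to_mm(report: str) -> str:
    """Rewrite centimeter measurements in a report as millimeters."""
    def _convert(match: re.Match) -> str:
        millimeters = float(match.group(1)) * 10
        # ":g" drops trailing zeros and float noise (32.0 -> "32").
        return f"{millimeters:g} mm"

    return _CM_PATTERN.sub(_convert, report)


def majority_vote(runs: list[dict[str, str]]) -> dict[str, str]:
    """Fuse several T/N/M predictions by voting on each category separately.

    Assumption: ties go to the label that appeared first among the runs,
    which is how Counter.most_common orders equal counts.
    """
    fused = {}
    for category in ("T", "N", "M"):
        votes = Counter(run[category] for run in runs)
        fused[category] = votes.most_common(1)[0][0]
    return fused


if __name__ == "__main__":
    print(cm_to_mm("A 3.2 cm nodule in the right upper lobe."))
    # Three illustrative (made-up) predictions for the same report.
    runs = [
        {"T": "T2a", "N": "N1", "M": "M0"},
        {"T": "T2a", "N": "N0", "M": "M0"},
        {"T": "T1c", "N": "N1", "M": "M0"},
    ]
    print(majority_vote(runs))
```

Voting per category keeps the sketch simple, but it can combine T, N, and M labels that no single run produced together. Since the abstract reports a moderate correlation among the three classes, voting on the joint T/N/M triple is an alternative worth comparing.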

Publisher

NII Institutional Repository

Date

2025-06-06

Date type

Issued

Language

eng

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_5794

Resource type

conference paper

ID registration

10.20736/0002002064

ID registration type

JaLC

Versions

Ver.1

2025-06-04 08:01:53.729577
