|
Description |
This study investigates the application of Large Language Models (LLMs) to automated lung cancer staging from radiology reports, as part of the CYUT team’s participation in the NTCIR-18 RadNLP Main Task. Data analysis showed a moderate correlation among the T, N, and M staging classes, and experiments indicated that prompting LLMs to predict all three classes jointly improves performance. Standardizing measurement units to millimeters, rather than centimeters, also proved more effective. Based on these findings, we refined our prompting methodology and applied it to both LLMs and reasoning-augmented models, including OpenAI’s O-series and DeepSeek-R1. These reasoning models, enhanced through post-training with Chain-of-Thought (CoT) reasoning, demonstrated superior staging accuracy. Because LLMs are generative models, their outputs may vary across runs, which makes predictions inconsistent. To mitigate this variability, we adopted an ensemble learning strategy that consolidates divergent LLM outputs into a more stable and reliable lung cancer staging system. Experimental results show that ensemble methods consistently outperform individual models, improving both the robustness and the reliability of staging from radiology reports. Our approach achieved second place in the NTCIR-18 RadNLP Main Task (English), underscoring the effectiveness of LLM-based ensemble techniques for TNM classification. The implementation is available on GitHub: anson70242/NTCIR-18-RadNLP-CYUT. |
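The abstract does not say how the divergent LLM outputs are consolidated. As a minimal sketch, assuming simple per-class majority voting over repeated runs (an assumption, not necessarily the CYUT team's method), the idea could look like the following. The function names `majority_vote` and `ensemble_tnm` and the sample labels are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def ensemble_tnm(predictions):
    """Combine several (T, N, M) predictions by voting on each class separately.

    Assumption: per-class majority voting. The source only states that an
    ensemble strategy was used, not which one.
    """
    t_labels, n_labels, m_labels = zip(*predictions)
    return (
        majority_vote(t_labels),
        majority_vote(n_labels),
        majority_vote(m_labels),
    )


# Hypothetical outputs from three runs (or three models) on one report.
runs = [
    ("T2a", "N1", "M0"),
    ("T2a", "N2", "M0"),
    ("T2b", "N1", "M0"),
]
print(ensemble_tnm(runs))  # ('T2a', 'N1', 'M0')
```

Voting on each class separately keeps the sketch simple. Since the abstract reports a moderate correlation among T, N, and M, voting on the full (T, N, M) triple would be another plausible variant.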