UPxSocio at NTCIR-18 MedNLP-CHAT Task: Similarity-Based Few-Shot Example Selection for Prompt-Based Detection

Michael Van Supranes; Martin Augustine Borlongan; Joseph Ryan Lansangan; Genelyn Ma. Sarte; Shaowen Peng; Shoko Wakamiya; Eiji Aramaki

https://doi.org/10.20736/0002002055

Name / File	License	Action
05-NTCIR18-MEDNLP-SupranesM.pdf (1.1 MB)

Item type

Default item type (full) (1)

Publication date

2025-06-06

Title

UPxSocio at NTCIR-18 MedNLP-CHAT Task: Similarity-Based Few-Shot Example Selection for Prompt-Based Detection

Creators

Michael Van Supranes
Martin Augustine Borlongan
Joseph Ryan Lansangan
Genelyn Ma. Sarte
Shaowen Peng
Shoko Wakamiya
Eiji Aramaki

Description

Description type

Abstract

Description

This paper presents our submission to the MedNLP-CHAT Task at NTCIR-18, which focuses on detecting medical, ethical, and legal risks in chatbot-generated responses. We propose a two-step prompt-based classification framework using the Gemini-1.5-flash model. The method first generates support statements to guide reasoning, which are then integrated into a few-shot prompt for final classification. We evaluated our approach on the English versions of the Japanese and German subtasks, submitting two systems per subtask that varied in example selection strategy and label distribution. Our systems achieved strong performance in detecting medical risks, particularly in the German subtask, while ethical and legal risks were more challenging. To better understand the design factors influencing performance, we conducted ablation studies across 24 prompt variants. Logistic regression and CHAID analyses revealed that accuracy depends on complex interactions between subtask language, example similarity, actual label, and selection method. Higher similarity improves classification of risk-present cases but harms performance on risk-absent cases, indicating a trade-off between recall and false positives. The $k$-nearest method was more effective under high similarity, while $k$-spread offered balanced results across classes. Although the two-step prompting strategy did not show a statistically significant advantage overall, the best-performing configuration used five support statements, with diminishing gains beyond that. Our findings suggest that optimized prompt design, particularly with controlled support and example selection, can improve risk detection without requiring large-scale training or high computational resources.
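
As an illustration of the similarity-based example selection described in the abstract, the following is a minimal sketch of $k$-nearest selection followed by few-shot prompt assembly. It is not the authors' implementation: the `Example` type, the label names, the toy two-dimensional vectors, the use of cosine similarity, and the prompt wording are all assumptions made for this sketch. The abstract does not define $k$-spread or the support-statement step in enough detail to sketch them, so they are left out.

```python
import math
from typing import List, NamedTuple, Sequence


class Example(NamedTuple):
    """A labeled example from a hypothetical few-shot pool."""
    text: str                # chatbot response used as a few-shot example
    label: str               # assumed label names: "risk" / "no_risk"
    vector: Sequence[float]  # embedding of the text (toy values below)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_k_nearest(
    query_vector: Sequence[float], pool: Sequence[Example], k: int
) -> List[Example]:
    """Return the k pool examples most similar to the query, most similar first."""
    ranked = sorted(
        pool,
        key=lambda ex: cosine_similarity(query_vector, ex.vector),
        reverse=True,
    )
    return ranked[:k]


def build_few_shot_prompt(query_text: str, examples: Sequence[Example]) -> str:
    """Assemble a plain-text few-shot prompt (wording is an assumption)."""
    blocks = ["Decide whether the response carries a medical risk."]
    for ex in examples:
        blocks.append(f"Response: {ex.text}\nLabel: {ex.label}")
    blocks.append(f"Response: {query_text}\nLabel:")
    return "\n\n".join(blocks)


# Toy pool with hand-made vectors; a real system would embed the texts.
POOL = [
    Example("Take double the dose.", "risk", [1.0, 0.1]),
    Example("Please consult your doctor.", "no_risk", [0.0, 1.0]),
    Example("Stop the medication today.", "risk", [0.9, 0.5]),
    Example("Drink water regularly.", "no_risk", [-1.0, 0.0]),
]

if __name__ == "__main__":
    chosen = select_k_nearest([1.0, 0.0], POOL, k=2)
    print(build_few_shot_prompt("Skip your next dose.", chosen))
```

In this toy pool both nearest examples carry the "risk" label, which mirrors the trade-off the abstract reports: highly similar examples can favor risk-present predictions at the expense of risk-absent cases.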

Publisher

NII Institutional Repository

Date

2025-06-06

Date type

Issued

Language

eng

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_5794

Resource type

conference paper

ID registration

10.20736/0002002055

ID registration type

JaLC

Versions

Ver.1

2025-06-04 08:01:38.973975
