TY - JOUR
T1 - MeetMulti-X: A Benchmark Analysis of Scaling and Prompting Large Language Models on Automatic Minuting
AU - Sood, Ashima
AU - Singh, Muskaan
AU - Gardiner, Bryan
AU - Condell, Joan
N1 - © 2025 The Author(s). Published by Elsevier Ltd.
PY - 2025/11/29
Y1 - 2025/11/29
N2 - The task of automatic minuting, i.e., capturing all the points from transcripts of multi-party meetings, presents considerable challenges owing to the spontaneous and complex nature of discussions. As organisations increasingly depend on meetings for decision-making, the need for efficient and optimised minuting has intensified, underscoring the shortcomings of manual note-taking due to cognitive overload and diverted participant engagement. This study systematically analyses the impact of scaling Large Language Models (LLMs) on automatic minuting, emphasising key factors including pretrained dataset size, model size, context length, and prompt length. The benchmark evaluation includes both quantitative and qualitative analyses of 19 open-source models (from 77M to 70B parameters) and 4 closed-source models (over 1T parameters) across 4 meeting corpora and prompts. Our findings indicate that (1) models with fewer than 8B parameters offer a favourable trade-off between performance and efficiency, achieving results comparable to their larger counterparts; (2) scaling pretrained data size improves performance up to a threshold, beyond which gains diminish; (3) context length exhibits a non-linear effect, with optimal performance around 8K–16K tokens; and (4) longer prompts consistently degrade output quality, highlighting the need for concise and well-structured prompting. To the best of our knowledge, this is the first work exploring the scaling of LLMs on automatic minuting.
Code is available at https://anonymous.4open.science/r/MeetMultiX-9A36
AB - The task of automatic minuting, i.e., capturing all the points from transcripts of multi-party meetings, presents considerable challenges owing to the spontaneous and complex nature of discussions. As organisations increasingly depend on meetings for decision-making, the need for efficient and optimised minuting has intensified, underscoring the shortcomings of manual note-taking due to cognitive overload and diverted participant engagement. This study systematically analyses the impact of scaling Large Language Models (LLMs) on automatic minuting, emphasising key factors including pretrained dataset size, model size, context length, and prompt length. The benchmark evaluation includes both quantitative and qualitative analyses of 19 open-source models (from 77M to 70B parameters) and 4 closed-source models (over 1T parameters) across 4 meeting corpora and prompts. Our findings indicate that (1) models with fewer than 8B parameters offer a favourable trade-off between performance and efficiency, achieving results comparable to their larger counterparts; (2) scaling pretrained data size improves performance up to a threshold, beyond which gains diminish; (3) context length exhibits a non-linear effect, with optimal performance around 8K–16K tokens; and (4) longer prompts consistently degrade output quality, highlighting the need for concise and well-structured prompting. To the best of our knowledge, this is the first work exploring the scaling of LLMs on automatic minuting.
Code is available at https://anonymous.4open.science/r/MeetMultiX-9A36
KW - Automatic minuting
KW - Meeting summarisation
KW - Scaling laws
KW - Large language models
U2 - 10.1016/j.eswa.2025.130428
DO - 10.1016/j.eswa.2025.130428
M3 - Article
SN - 0957-4174
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 130428
ER -