Differences
This shows you the differences between two versions of the page.
pllumic [2025/04/24 13:46] – kkaczynski | pllumic [2025/04/25 08:25] (current) – kkaczynski |
---|---|
====Description==== | ====Description==== |
| |
We release the first representative subset of the PLLuM instruction corpus (PLLuMIC), which we believe to be useful in guiding and planning the development of similar datasets for other LLMs. It is a hand-crafted set of LLM fine-tuning instructions in Polish language, curated according to structured typology and thematic categorisation. It is an integral part of the upcoming scientific article "The PLLuM Instruction Corpus". The research was funded by the Polish Ministry of Digital Affairs in 2024, grant num. 1/WI/DBiI/2023. We plan to continue with the research and extend the dataset in future releases. | We release the first representative subset of the PLLuM Instruction Corpus (PLLuMIC), which we believe to be useful in guiding and planning the development of similar LLM datasets. PLLuMIC is a hand-crafted set of LLM fine-tuning Polish language instructions, developed in line with the annotation guidelines and covering a functional typology. The corpus is described in more detail in a forthcoming paper titled //The PLLuM Instruction Corpus// (Pęzik et al. 2025). We plan regular updates and significant extensions of the corpus. |
| |
---- | ---- |