spoken_offline_corpora
Differences
This shows you the differences between two versions of the page.
spoken_offline_corpora [2022/07/27 12:01] – pezik | spoken_offline_corpora [2023/10/08 11:22] (current) – pezik | ||
---|---|---|---|
Line 1: | Line 1: | ||
======PELCRA Spoken Offline Corpora====== | ======PELCRA Spoken Offline Corpora====== | ||
- | PELCRA Spoken Offline Corpora of conversational Polish | + | PELCRA Spoken Offline Corpora of conversational Polish |
- | + | ||
Each corpus consists of speech recordings (in WAV format) and word-by-word transcriptions, | Each corpus consists of speech recordings (in WAV format) and word-by-word transcriptions, | ||
- | Metadata are provided in XML files listing information about the recordings (titles, topics, dates, and URLs), the media available (audio, video, PDF), and annotation details (file, date, annotator, place, and duration, plus information about the speakers whenever such data were available). A Document Type Definition (DTD) specifying the elements and attributes of the XML documents is included in each of the corpora. | + | Metadata are provided in XML files listing information about the recordings (titles, topics, dates, and URLs), the media available (audio, video, PDF), and annotation details (file, date, annotator, place, and duration, plus information about the speakers whenever such data were available). A Document Type Definition (DTD) specifying the elements and attributes of the XML documents is included in each of the corpora. |
- | + | ||
- | Most of these offline corpora are indexed in [[http:// | + | |
- | + | ||
- | Altogether, the PELCRA spoken offline corpora contain about 1,220,400 words. | + | |
- | ^ corpus | + | Most of these offline corpora are indexed in [[http://spokes.clarin-pl.eu/|Spokes]] and [[http://pelcra.clarin-pl.eu/spokes2-web/|Spokes2]]. A subset |
- | | PELCRA_EMO |A corpus | + | |
- | | PELCRA_LUZ |A corpus of open interviews. | 21 | 42 | 213, | + | |
- | | PELCRA_EMI |A corpus of Polish emigrants to Scotland. | 22 | 44 | 96, | + | |
- | | PELCRA_PARL |Samples of spoken parliamentary data. | 48 | 241 | 99, | + | |
- | | PELCRA_YT_1 |Samples | + | |
- | | PELCRA_YT_2 |Second part of Polish YouTubers' | + | |
- | | MMW_1 |A corpus of Polish conversations recorded in Wrocław in the 1980s. | 14 | 65 | 60, | + | |
- | | MMW_2 |Second part of the conversations recorded in Wrocław in the 1980s. | 14 | 38 | 70, | + | |
- | | MMK |A corpus of Polish conversations recorded in Kraków in the 1980s. | 4 | 11 | 15, | + | |
- | | PELCRA_IDIO |A corpus of open interviews in Polish. | 146 | 148 | 327, | + | |
- | | | **total**| | + | |
+ | ^ corpus | ||
+ | | PELCRA_EMO |A corpus of focused interviews (people reflecting upon their emotions). | 40 | 80 | 252, | ||
+ | | PELCRA_LUZ |A corpus of open interviews. | 21 | 42 | 213, | ||
+ | | PELCRA_EMI |A corpus of Polish emigrants to Scotland. | 22 | 44 | 96, | ||
+ | | PELCRA_PARL |Samples of spoken parliamentary data. | 48 | 241 | 99, | ||
+ | | PELCRA_YT_1 |Samples of Polish YouTubers' | ||
+ | | PELCRA_YT_2 |Second part of Polish YouTubers' | ||
+ | | MMW_1 |A corpus of Polish conversations recorded in Wrocław in the 1980s. | 14 | 65 | 60,000 | 7:02 | 8:33 | | ||
+ | | MMW_2 |Second part of the conversations recorded in Wrocław in the 1980s. | 14 | 38 | 70,000 | 7:31 | 7:50 | | ||
+ | | MMK |A corpus of Polish conversations recorded in Kraków in the 1980s. | 4 | 11 | 15, | ||
+ | | PELCRA_IDIO |A corpus of open interviews in Polish. | 146 | 148 | 327, | ||
+ | | | **TOTAL**| | ||
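Because each corpus ships its speech recordings as WAV files, the total recording time of a locally unpacked corpus can be checked with the Python standard library alone. The following is a minimal sketch, not part of the PELCRA distribution: the function names, the directory layout (all `.wav` files somewhere below one corpus folder), and the `h:mm` output format are assumptions made for illustration.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of one WAV file in seconds (frames / frame rate)."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def format_hmm(seconds):
    """Format a duration in seconds as h:mm, rounding down to whole minutes."""
    total_minutes = int(seconds // 60)
    return f"{total_minutes // 60}:{total_minutes % 60:02d}"


def corpus_duration(corpus_dir):
    """Sum the durations of all .wav files found below corpus_dir.

    Assumption: recordings use a lowercase .wav extension; adjust the
    pattern if the unpacked corpus names its files differently.
    """
    wav_files = sorted(Path(corpus_dir).rglob("*.wav"))
    return sum(wav_duration_seconds(p) for p in wav_files)
```

With a corpus unpacked into a folder (the folder name here is hypothetical), `format_hmm(corpus_duration("PELCRA_EMO"))` returns the summed recording time as a string. The sketch reads only the WAV headers, so it stays fast even for many hours of audio.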
spoken_offline_corpora.txt · Last modified: 2023/10/08 11:22 by pezik