spoken_offline_corpora
Differences
This shows you the differences between two versions of the page.
spoken_offline_corpora [2023/06/13 17:13] – pezik | spoken_offline_corpora [2023/10/08 11:22] (current) – pezik | ||
---|---|---|---|
Line 1: | Line 1: | ||
======PELCRA Spoken Offline Corpora====== | ======PELCRA Spoken Offline Corpora====== | ||
- | PELCRA Spoken Offline Corpora of conversational Polish | + | PELCRA Spoken Offline Corpora of conversational Polish |
- | + | ||
Each corpus consists of speech recordings (in WAV format) and word-by-word transcriptions, | Each corpus consists of speech recordings (in WAV format) and word-by-word transcriptions, | ||
- | Metadata are provided in XML files that list information about the recordings (titles, topics, dates, and URLs), the media available (audio, video, PDF), and annotation details (file, date, annotator, place, and duration, plus additional information about the speakers whenever such data were available). Each corpus includes a Document Type Definition (DTD) specifying the structure of the elements and attributes of the XML documents. | + | Metadata are provided in XML files that list information about the recordings (titles, topics, dates, and URLs), the media available (audio, video, PDF), and annotation details (file, date, annotator, place, and duration, plus additional information about the speakers whenever such data were available). Each corpus includes a Document Type Definition (DTD) specifying the structure of the elements and attributes of the XML documents. |
- | Most of these offline corpora are indexed in [[http:// | + | Most of these offline corpora are indexed in [[http:// |
- | Altogether, the PELCRA spoken offline corpora contain about 1,220,400 words. | + | ^ corpus |
- | + | ||
- | + | ||
- | ^ corpus | + | |
| PELCRA_EMO |A corpus of focused interviews (people reflecting upon their emotions). | 40 | 80 | 252, | | PELCRA_EMO |A corpus of focused interviews (people reflecting upon their emotions). | 40 | 80 | 252, | ||
| PELCRA_LUZ |A corpus of open interviews. | 21 | 42 | 213, | | PELCRA_LUZ |A corpus of open interviews. | 21 | 42 | 213, | ||
Line 18: | Line 15: | ||
| PELCRA_PARL |Samples of spoken parliamentary data. | 48 | 241 | 99, | | PELCRA_PARL |Samples of spoken parliamentary data. | 48 | 241 | 99, | ||
| PELCRA_YT_1 |Samples of Polish YouTubers' | | PELCRA_YT_1 |Samples of Polish YouTubers' | ||
- | | PELCRA_YT_2 |Second part of Polish YouTubers' | + | | PELCRA_YT_2 |Second part of Polish YouTubers' |
- | | MMW_2 |Second part of the conversations recorded in Wrocław in the 1980s. | 14 | 38 | 70, | + | | MMW_1 |A corpus of Polish conversations recorded in Wrocław in the 1980s. | 14 | 65 | 60, |
- | | MMK |A corpus of Polish conversations recorded in Kraków in the 1980s. | 4 | 11 | 15, | + | | MMW_2 |Second part of the conversations recorded in Wrocław in the 1980s. | 14 | 38 | 70, |
- | | PELCRA_IDIO |A corpus of open interviews in Polish. | 146 | 148 | 327, | + | | MMK |A corpus of Polish conversations recorded in Kraków in the 1980s. | 4 | 11 | 15, |
- | | | **total**| 357 | 820 | 1, | + | | PELCRA_IDIO |A corpus of open interviews in Polish. | 146 | 148 | 327, |
+ | | | **TOTAL**| 357 | 820 | 1, | ||
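
The page describes per-corpus XML metadata covering recordings, media, annotation details, and speakers. As a rough illustration of how such a file might be read, here is a minimal Python sketch. The element and attribute names (`recording`, `title`, `topic`, `date`, `media`, `annotation`, `speaker`) and the sample values are assumptions made up for this example; the actual names are defined by the DTD shipped with each corpus and may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata sample: the real PELCRA element and attribute
# names are defined in each corpus's DTD and may differ from these.
SAMPLE_XML = """<corpus name="EXAMPLE">
  <recording id="rec001">
    <title>Sample interview</title>
    <topic>emotions</topic>
    <date>2010-05-01</date>
    <media type="audio" file="rec001.wav"/>
    <annotation annotator="A1" duration="125.5"/>
    <speaker id="spk1"/>
    <speaker id="spk2"/>
  </recording>
</corpus>"""


def read_recordings(xml_text):
    """Return one dict per <recording> element found in the metadata."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter("recording"):
        annotation = rec.find("annotation")
        # The duration is optional: keep None when the element or attribute is missing.
        raw_duration = annotation.get("duration") if annotation is not None else None
        records.append({
            "id": rec.get("id"),
            "title": rec.findtext("title", default=""),
            "topic": rec.findtext("topic", default=""),
            "date": rec.findtext("date", default=""),
            "media": [m.get("file") for m in rec.findall("media")],
            "duration": float(raw_duration) if raw_duration else None,
            "speakers": [s.get("id") for s in rec.findall("speaker")],
        })
    return records


if __name__ == "__main__":
    for record in read_recordings(SAMPLE_XML):
        print(record["id"], record["title"], len(record["speakers"]))
```

For a real corpus, the same function could be pointed at the contents of a metadata file once the tag names are adjusted to match the DTD included with that corpus.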
spoken_offline_corpora.1686669189.txt.gz · Last modified: 2023/06/13 17:13 by pezik