spoken_offline_corpora
Differences
This shows you the differences between two versions of the page.
spoken_offline_corpora [2023/06/13 17:17] – pezik | spoken_offline_corpora [2023/10/08 11:22] (current) – pezik | ||
---|---|---|---|
Line 1: | Line 1: | ||
======PELCRA Spoken Offline Corpora====== | ======PELCRA Spoken Offline Corpora====== | ||
- | PELCRA Spoken Offline Corpora of conversational Polish | + | PELCRA Spoken Offline Corpora of conversational Polish |
- | + | ||
Each corpus consists of speech recordings (in WAV format) and word-by-word transcriptions, | Each corpus consists of speech recordings (in WAV format) and word-by-word transcriptions, | ||
- | Metadata are provided in XML files listing information about the recordings (titles, topics, dates, and URLs), the media available (audio, video, PDF), and annotation details (file, date, annotator, place, duration, with additional information about the speakers whenever such data were available). A Document Type Definition (DTD) specifying the structure of the elements and attributes of the XML documents is included in each of the corpora. | + | Metadata are provided in XML files listing information about the recordings (titles, topics, dates, and URLs), the media available (audio, video, PDF), and annotation details (file, date, annotator, place, duration, with additional information about the speakers whenever such data were available). A Document Type Definition (DTD) specifying the structure of the elements and attributes of the XML documents is included in each of the corpora. |
- | Most of these offline corpora are indexed in [[http:// | + | Most of these offline corpora are indexed in [[http:// |
^ corpus | ^ corpus | ||
Line 16: | Line 16: | ||
| PELCRA_YT_1 |Samples of Polish YouTubers' | | PELCRA_YT_1 |Samples of Polish YouTubers' | ||
| PELCRA_YT_2 |Second part of Polish YouTubers' | | PELCRA_YT_2 |Second part of Polish YouTubers' | ||
- | | MMW_1 |A corpus of Polish conversations recorded in Wrocław in the 1980s. | 14 | 65 | 60,000 | | | + | | MMW_1 |A corpus of Polish conversations recorded in Wrocław in the 1980s. | 14 | 65 | 60,000 | 7:02 | 8: |
- | | MMW_2 |Second part of the conversations recorded in Wrocław in the 1980s. | 14 | 38 | 70,000 | | | + | | MMW_2 |Second part of the conversations recorded in Wrocław in the 1980s. | 14 | 38 | 70,000 | 7:31 | 7: |
| MMK |A corpus of Polish conversations recorded in Kraków in the 1980s. | 4 | 11 | 15, | | MMK |A corpus of Polish conversations recorded in Kraków in the 1980s. | 4 | 11 | 15, | ||
| PELCRA_IDIO |A corpus of open interviews in Polish. | 146 | 148 | 327, | | PELCRA_IDIO |A corpus of open interviews in Polish. | 146 | 148 | 327, | ||
- | + | | | **TOTAL**| 357 | 820 | 1, | |
- | | | **total**| 357 | 820 | 1, | + | |
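As described above, each corpus pairs WAV recordings with XML metadata. The following is a minimal Python sketch of how such files might be inspected using only the standard library. The element names `recording` and `title` are hypothetical placeholders, not the actual PELCRA schema; the real names should be taken from the DTD shipped with each corpus. The helper `make_silent_wav` exists only to make the example self-contained. Note that `xml.etree.ElementTree` parses documents but does not validate them against a DTD.

```python
import io
import wave
import xml.etree.ElementTree as ET


def wav_duration_seconds(source):
    """Return the duration in seconds of a WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def recording_titles(xml_text):
    """Collect recording titles from a metadata document.

    ASSUMPTION: <recording> elements with a <title> child are hypothetical
    names used for illustration; consult the corpus DTD for the real ones.
    """
    root = ET.fromstring(xml_text)
    return [rec.findtext("title", default="") for rec in root.iter("recording")]


def make_silent_wav(seconds, rate=16000):
    """Build an in-memory mono 16-bit WAV of silence (demo helper only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * (rate * seconds))
    buf.seek(0)
    return buf
```

With a real corpus, `wav_duration_seconds` would be called with the path to a recording, and summing its results over all files gives a figure that can be checked against the duration columns in the table.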
spoken_offline_corpora.1686669474.txt.gz · Last modified: 2023/06/13 17:17 by pezik