spoken_offline_corpora
Differences
This shows you the differences between two versions of the page.
spoken_offline_corpora [2023/06/13 17:20] – pezik | spoken_offline_corpora [2023/07/19 13:26] – pezik | ||
---|---|---|---|
Line 1: | Line 1: | ||
======PELCRA Spoken Offline Corpora====== | ======PELCRA Spoken Offline Corpora====== | ||
- | PELCRA Spoken Offline Corpora of conversational Polish | + | PELCRA Spoken Offline Corpora of conversational Polish |
- | + | ||
Each corpus consists of speech recordings (in WAV format) and word-by-word transcriptions, | Each corpus consists of speech recordings (in WAV format) and word-by-word transcriptions, | ||
Metadata are provided in XML files listing information about the recordings (titles, topics, dates, and URLs), media available (audio, video, PDF), and annotation details (file, date, annotator, place, duration, with additional information about the speakers whenever such data was available). A Document Type Definition specifying the structure of the elements and attributes of an XML document is included in each of the corpora. An SQLite database with all the corpora metadata is also available for [[https:// | Metadata are provided in XML files listing information about the recordings (titles, topics, dates, and URLs), media available (audio, video, PDF), and annotation details (file, date, annotator, place, duration, with additional information about the speakers whenever such data was available). A Document Type Definition specifying the structure of the elements and attributes of an XML document is included in each of the corpora. An SQLite database with all the corpora metadata is also available for [[https:// | ||
- | Most of these offline corpora are indexed in [[http:// | + | Most of these offline corpora are indexed in [[http:// |
^ corpus | ^ corpus | ||
Line 20: | Line 20: | ||
| MMK |A corpus of Polish conversations recorded in Kraków in the 1980s. | 4 | 11 | 15, | | MMK |A corpus of Polish conversations recorded in Kraków in the 1980s. | 4 | 11 | 15, | ||
| PELCRA_IDIO |A corpus of open interviews in Polish. | 146 | 148 | 327, | | PELCRA_IDIO |A corpus of open interviews in Polish. | 146 | 148 | 327, | ||
- | + | | | **TOTAL**| 357 | 820 | 1, | |
- | | | **total**| 357 | 820 | 1, | + | |
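The page says that the metadata for all corpora is also distributed as an SQLite database, but it does not describe the schema. The following is a minimal sketch of how such a file could be inspected without assuming any table or column names. The function name `describe_database` and the file name in the usage comment are hypothetical and do not come from the PELCRA distribution.

```python
import sqlite3


def describe_database(db_path):
    """Return {table_name: [column names]} for every table in an SQLite file.

    Nothing about the PELCRA schema is assumed here: the table list is read
    from SQLite's own catalogue (sqlite_master).
    """
    conn = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        schema = {}
        for table in tables:
            # Quote the identifier so unusual table names do not break the PRAGMA.
            quoted = '"' + table.replace('"', '""') + '"'
            # PRAGMA table_info yields one row per column; index 1 is the column name.
            columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
            schema[table] = [column[1] for column in columns]
        return schema
    finally:
        conn.close()


# Usage (the file name is a placeholder, not the real download name):
#     for table, columns in describe_database("corpora_metadata.sqlite").items():
#         print(table, columns)
```

Listing the tables and columns first is a cautious starting point, since any query against the recording or speaker metadata depends on names that only the downloaded file can confirm.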
spoken_offline_corpora.txt · Last modified: 2023/10/08 11:22 by pezik