spoken_offline_corpora
Differences
Revisions compared: 2023/07/19 13:26 (pezik) and 2023/10/08 11:22 (current, pezik). Changed passage at line 5:
Each corpus consists of speech recordings (in WAV format) and word-by-word transcriptions,
Metadata are provided in XML files listing information about the recordings (titles, topics, dates, and URLs), the media available (audio, video, PDF), and annotation details (file, date, annotator, place, and duration, with additional information about the speakers whenever such data were available). A Document Type Definition (DTD) specifying the structure of the elements and attributes of the XML documents is included in each corpus.
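As a rough illustration of how such a metadata file might be read, the sketch below parses a small XML record with Python's standard library. The element and attribute names (`recording`, `title`, `media`, `annotation`, and so on) and all sample values are hypothetical; they are not taken from the DTDs shipped with the corpora, which should be consulted for the actual structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata record: the element names, attribute names and
# values below are illustrative only, NOT the corpora's actual schema.
SAMPLE_XML = """\
<recording>
  <title>Sample lecture</title>
  <topic>linguistics</topic>
  <date>2012-05-14</date>
  <url>http://example.org/lecture</url>
  <media audio="yes" video="no" pdf="no"/>
  <annotation file="lecture.wav" annotator="A01" duration="00:12:30"/>
</recording>
"""


def read_metadata(xml_text):
    """Return recording fields, media flags and annotation details as dicts."""
    root = ET.fromstring(xml_text)
    # Simple text fields describing the recording itself.
    fields = {
        tag: root.findtext(tag, default="")
        for tag in ("title", "topic", "date", "url")
    }
    # Attribute-based elements are copied into plain dictionaries.
    media = dict(root.find("media").attrib)
    annotation = dict(root.find("annotation").attrib)
    return {"fields": fields, "media": media, "annotation": annotation}


metadata = read_metadata(SAMPLE_XML)
print(metadata["fields"]["title"])
print(metadata["annotation"]["duration"])
```

Note that `xml.etree.ElementTree` does not validate a document against a DTD; checking files against the DTD included in a corpus would require a validating parser.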
Most of these offline corpora are indexed in [[http://
spoken_offline_corpora.txt · Last modified: 2023/10/08 11:22 by pezik