spoken_offline_corpora

Differences

This shows you the differences between two versions of the page.

spoken_offline_corpora [2023/06/14 10:18] pezik
spoken_offline_corpora [2023/07/19 12:56] pezik
Line 7:
 Metadata are provided in XML files listing information about the recordings (titles, topics, dates, and URLs), the media available (audio, video, pdf), and annotation details (file, date, annotator, place, and duration, with additional information about the speakers whenever such data were available). A Document Type Definition specifying the structure of the elements and attributes of the XML documents is included in each of the corpora. An SQLite database with the metadata of all the corpora is also available for [[https://uniwersytetlodzki-my.sharepoint.com/:u:/g/personal/pelcra_uni_lodz_pl/Ec0r-4hNLalIrTnsHl4BQQgBJUdhAK7OMmKp28vh7_Ze2w?e=bwxGsp|download.]]
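
Since the layout of the SQLite metadata database is not described here, a reasonable first step after downloading it is to list its tables and columns. The following is a minimal sketch using only Python's standard ''sqlite3'' module; the function name ''describe_sqlite'' and the file name in the usage note are hypothetical, and no table or column names are assumed.

```python
import sqlite3


def describe_sqlite(db_path):
    """Return {table_name: [column names]} for every table in the database."""
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's built-in catalogue of schema objects.
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        schema = {}
        for table in tables:
            # Quote the identifier so unusual table names are handled safely.
            quoted = '"' + table.replace('"', '""') + '"'
            # PRAGMA table_info yields one row per column; index 1 is the name.
            columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
            schema[table] = [column[1] for column in columns]
        return schema
    finally:
        conn.close()
```

For example, ''describe_sqlite("corpora_metadata.sqlite")'' (a placeholder file name) returns a dictionary that can be printed to see which tables hold the recording, media, and annotation metadata. Note that ''sqlite3.connect'' creates an empty file if the path does not exist, so check the path first.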
  
-Most of these offline corpora are indexed in [[http://spokes.clarin-pl.eu/|Spokes]] and [[http://pelcra.clarin-pl.eu/spokes2-web/|Spokes2]]. A subset of them can be obtained by [[https://forms.office.com/e/TMAA36FRwf|filling out this form]]. Once the form is submitted you will get a password necessary to download the corpora from: [[https://uniwersytetlodzki-my.sharepoint.com/:f:/g/personal/pelcra_uni_lodz_pl/ElAHooiU6MFBsyhoNsrWnSgBxxyO3lRmw-6z8waYHSw2BQ|this location]]
+Most of these offline corpora are indexed in [[http://spokes.clarin-pl.eu/|Spokes]] and [[http://pelcra.clarin-pl.eu/spokes2-web/|Spokes2]]. A subset of them can be obtained by [[https://forms.office.com/e/TMAA36FRwf|filling out this form]]. Once the form is submitted you will get a password necessary to download the corpora.
  
 ^  corpus  ^  description  ^  recordings  ^  speakers  ^  word count  ^  voice activity time (hh:mm)  ^  total duration (hh:mm)  ^
spoken_offline_corpora.txt · Last modified: 2023/10/08 11:22 by pezik