paralela

Differences

This shows you the differences between two versions of the page.

paralela [2018/03/12 13:11] – [Bilingual query] mdeckert
paralela [2020/10/17 12:32] (current) pezik
Line 7: Line 7:
 Paralela is to be quoted by referencing the following paper:
 
-[[http://rownolegle.blog.ils.uw.edu.pl/files/2016/03/04_P%C4%99zik.pdf|Pęzik, Piotr. "Exploring Phraseological Equivalence with Paralela." In Polish-Language Parallel Corpora, edited by Ewa Gruszczyńska and Agnieszka Leńko-Szymańska, 67–81. Warsaw: Instytut Lingwistyki Stosowanej UW, 2016.]]
+[[https://depot.ceon.pl/handle/123456789/13396|Pęzik, Piotr. "Exploring Phraseological Equivalence with Paralela." In Polish-Language Parallel Corpora, edited by Ewa Gruszczyńska and Agnieszka Leńko-Szymańska, 67–81. Warsaw: Instytut Lingwistyki Stosowanej UW, 2016.]]
 
 Here is a BibTeX record:
Line 67: Line 67:
  
  
-Queries of this kind can include a sequence of terms (positions in the query). A series of words is put in query box and the results will show occurrences of the whole sequence, e.g.:
+Queries of this kind can include a sequence of terms (positions in the query). A series of words is put in the query box and the results will show occurrences of the whole sequence, e.g.:
 ;#;
 ''[[http://paralela.clarin-pl.eu/#search/pl/-1/popular%20with/-1/0/20/0/true/0/true/-1/-1/-1/-1/source|popular with]]''
Line 250: Line 250:
  
  
-The following query can be specified to find sequences of a verb, followed by and adjective and followed by any form of the lemma "discovery" with up to two word tokens in between:
+The following query can be specified to find sequences of a verb, followed by an adjective and followed by any form of the lemma "discovery" with up to two word tokens in between:
 
 ;#;
Line 348: Line 348:
  
  
-=== Metadata queries===
+==== Metadata queries ====
 
 By using a conjunction of a span query and a logical metadata query the results can be filtered not to include a particular source or to include only certain types of translation relationship between segments. Metadata queries use the ''[[https://lucene.apache.org/solr/guide/6_6/the-extended-dismax-query-parser.html|Apache Solr DisMax syntax]]'' and they are appended as a logical conjunction to the obligatory span query.
Line 368: Line 368:
 === Query facets ===
 
-For every query the search engine computes a summary of matches from different metadata categories in the entire corpus. These summaries are called 'facets' and are divided into: source, genre, medium type.
+For every query the search engine computes a summary of matches from different metadata categories in the entire corpus. These summaries are called 'facets' and are divided into: source (a large number of individual texts), genre (seven subcategories, e.g. literary prose), medium (five subcategories, e.g. internet, book), type (nine subcategories of translation relationships between segments, e.g. paraphrase).
 
 Facets are visualized as pie charts in the **Statistics** section of the results screen and are also presented in the form of **interactive tables**. Users can use those tables to narrow down the results to particular facets.
 +
 +
 +=== Equivalence ===
 +This is another functionality that can be accessed through a tab in the menu of the results screen. 
 +
 +In the case of a simple query such as ''[[http://paralela.clarin-pl.eu/#search/pl/zatem/-1/-1/0/20/0/true/0/0/-1/-1/-1/-1/source|zatem]]'' we obtain a list of occurrences featuring lexemes that are likely equivalents of the search item, such as "therefore", "thus" and "so" as in the examples below:
 +
 +
 +^  #  ^  Polish  ^  English  ^
 +|1|  Jest **zatem** całkowicie niewłaściwym , aby próbować regulować prawnie godziny pracy w całej UE , bo nie ma też jakiekolwiek ku temu powodu .  |  It is **therefore** entirely inappropriate to attempt to regulate the working hours of the whole of the EU , nor is there any reason to do so .  |
 +|2|  Głosuję **zatem** za przyjęciem przedmiotowego sprawozdania i chciałabym pogratulować sprawozdawcy .  |  I am **therefore** voting for this report , and would congratulate the rapporteur .  |
 +|3|  A **zatem** odpowiedź na pana pytanie brzmi : tak .  |  **So** the answer to your question is yes .  |
 +|4|  Potrzebujemy **zatem** kompromisowych rozwiązań dla dzieł osieroconych , co wymaga przeprowadzenia dokładnych badań mających na celu identyfikację prawowitych właścicieli praw autorskich .  |  **Thus** , we need consensual solutions for orphan works and a very thorough search to find out who the rightful copyright holders are .  |
 +|5|  Mój czas dobiega końca , **zatem** chciałbym państwa jedynie poprosić o zapoznanie się z tymi informacjami i udzielenie poparcia .  |  My time is up , **so** I would just like to ask you to read it and to support it .  |
 +
 +
 +==== Place holders ====
 +
 +
 +;#;
 +''[[http://paralela.clarin-pl.eu/#search/pl/-1/it%20is%20<pos=.*>%20clear%20that/-1/0/500/0/true/0/true/-1/-1/-1/-1/source|it is <pos=.*> clear that]]''
 +;#;
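
The example links on this page all follow the same URL pattern, so a link for a new query can be sketched by swapping the query text into an existing link. The snippet below is a minimal illustration in Python, not part of Paralela itself: the function name ''build_search_url'' is hypothetical, and the meaning of the other URL segments is not documented here, so they are copied verbatim from the placeholder example and treated as opaque.

```python
from urllib.parse import quote

# Everything except the query is copied verbatim from the example link
# "it is <pos=.*> clear that"; the meaning of these segments is an
# unknown here, so they are treated as opaque constants.
BASE = "http://paralela.clarin-pl.eu/#search"
PREFIX = "pl/-1"
TAIL = "-1/0/500/0/true/0/true/-1/-1/-1/-1/source"


def build_search_url(query: str) -> str:
    """Return a search link for `query` (hypothetical helper).

    Spaces become %20, while the placeholder characters < > = * are
    left unescaped, mirroring the example link on this page.
    """
    encoded = quote(query, safe="<>=*")
    return f"{BASE}/{PREFIX}/{encoded}/{TAIL}"


if __name__ == "__main__":
    print(build_search_url("it is <pos=.*> clear that"))
```

Called with ''it is <pos=.*> clear that'', the helper reproduces the link shown above. Other queries reuse the same opaque segments, which may or may not be appropriate for them.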
 +
 +
 +
 +
  
  
paralela.1520856675.txt.gz · Last modified: 2018/03/12 13:11 by mdeckert