slopeq_for_bnc
Differences
This shows you the differences between two versions of the page.
slopeq_for_bnc [2015/07/08 17:56] – [Wild card and quantifiers] gaszewski | slopeq_for_bnc [2017/02/03 01:13] – [Grammatical queries] mmolenda | ||
---|---|---|---|
Line 5: | Line 5: | ||
===== Surface queries ===== | ===== Surface queries ===== | ||
- | This is the simplest type of query. Words are written in the query box in their plain orthographic form. The results are occurrences of the particular forms submitted in the query. Compare the query and example | + | This is the simplest type of query. Words are written in the query box in their plain orthographic form. The results are occurrences of the particular forms submitted in the query. Compare the query and the results: |
;#; | ;#; | ||
- | [[http:// | + | [[http:// |
;#; | ;#; | ||
Line 21: | Line 21: | ||
;#; | ;#; | ||
- | [[http:// | + | [[http:// |
;#; | ;#; | ||
Line 40: | Line 40: | ||
;#; | ;#; | ||
- | [[http:// | + | [[http:// |
;#; | ;#; | ||
Line 55: | Line 55: | ||
;#; | ;#; | ||
- | [[http:// | + | [[http:// |
;#; | ;#; | ||
Line 74: | Line 74: | ||
;#; | ;#; | ||
- | [[http:// | + | [[http:// |
;#; | ;#; | ||
Line 87: | Line 87: | ||
;#; | ;#; | ||
- | [[http:// | + | [[http:// |
;#; | ;#; | ||
Line 104: | Line 104: | ||
;#; | ;#; | ||
- | [[http:// | + | [[http:// |
;#; | ;#; | ||
Line 119: | Line 119: | ||
==== Slop factor ==== | ==== Slop factor ==== | ||
- | This important functionality allows you to search for a discontinuous string of words. The query specifies how many words may intervene between the terms of the query. The search words are enclosed in round brackets and the allowed number of intervening words is given after the equals sign, e.g.: | + | This important functionality allows you to search for a discontinuous string of words. The query specifies how many words may intervene between the terms of the query. The allowed number of intervening words is set with the slider located below the search box. In this Wiki, the value of the Slop factor is indicated by the following expression: (Slop factor = 1,2,3... etc.). |
;#; | ;#; | ||
- | [[http:// | + | [[http:// |
;#; | ;#; | ||
Line 137: | Line 137: | ||
;#; | ;#; | ||
- | [[http:// | + | [[http:// |
;#; | ;#; | ||
Line 151: | Line 151: | ||
==== Slop factor with relaxed order ==== | ==== Slop factor with relaxed order ==== | ||
- | These queries allow intervening words up to the specified number and the query terms may appear in any order. | + | These queries allow intervening words up to the specified number and the query terms may appear in any order. |
;#; | ;#; | ||
- | [[http:// | + | [[http:// |
;#; | ;#; | ||
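The slop factor described above can be sketched in a few lines of Python. This is only an illustration of the idea as worded on this page (a maximum number of intervening words, optionally in any order); it is not SlopeQ's implementation, and the engine may count distance differently. The function name `slop_matches`, the `any_order` flag and the sample sentence are all hypothetical.

```python
from typing import List, Tuple


def slop_matches(tokens: List[str], first: str, second: str,
                 slop: int, any_order: bool = False) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs where the two terms co-occur
    with at most `slop` words between them.

    Assumption: "slop" is the number of intervening words, as the page
    describes it. Only two-term queries are handled in this sketch.
    """
    hits = []
    for i, tok in enumerate(tokens):
        if tok != first:
            continue
        # Strict order: look for `second` to the right of `first`.
        for j in range(i + 1, min(len(tokens), i + slop + 2)):
            if tokens[j] == second:
                hits.append((i, j))
        # Relaxed order: also accept `second` to the left of `first`.
        if any_order:
            for j in range(max(0, i - slop - 1), i):
                if tokens[j] == second:
                    hits.append((j, i))
    return hits


# Hand-made sentence, not taken from the corpus.
SENTENCE = "she gave the old book away".split()
```

With this sample, `slop_matches(SENTENCE, "gave", "away", 3)` finds the pair because three words intervene, while a slop of 2 finds nothing. Querying the terms in reverse order only succeeds when `any_order=True`.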
Line 164: | Line 163: | ||
|5 | What was| wrong with that approach | |5 | What was| wrong with that approach | ||
- | The relaxed order in pure form is available | + | The relaxed order in pure form is available |
;#; | ;#; | ||
- | [[http:// | + | [[http:// |
;#; | ;#; | ||
Line 180: | Line 179: | ||
;#; | ;#; | ||
- | [[http:// | + | [[http:// |
;#; | ;#; | ||
Line 197: | Line 196: | ||
;#; | ;#; | ||
- | [[http:// | + | [[http:// |
;#; | ;#; | ||
Line 230: | Line 229: | ||
;#; | ;#; | ||
- | [[http:// | + | [[http:// |
;#; | ;#; | ||
Line 244: | Line 243: | ||
;#; | ;#; | ||
- | [[http:// | + | [[http:// |
;#; | ;#; | ||
Line 257: | Line 256: | ||
;#; | ;#; | ||
- | [[http:// | + | [[http:// |
;#; | ;#; | ||
Line 273: | Line 272: | ||
;#; | ;#; | ||
- | [[http:// | + | [[http:// |
;#; | ;#; | ||
Line 289: | Line 288: | ||
;#; | ;#; | ||
- | [[http:// | + | [[http:// |
;#; | ;#; | ||
Line 307: | Line 306: | ||
SlopeQ for the BNC makes use of the BNC tagset. We will not present the tagset in full here, but only discuss its application for our search engine. A list of all the tags is available [[http:// | SlopeQ for the BNC makes use of the BNC tagset. We will not present the tagset in full here, but only discuss its application for our search engine. A list of all the tags is available [[http:// | ||
- | The tags are three-symbol codes, which classify the exact grammatical form of the given form in the corpus. For example, “NN1” marks a singular common noun, “AJC” marks a comparative adjective and “VVG” marks an –ing form of a lexical verb. You can use exact tags like these, but it is even better to make underspecified queries by means of the wild card (and quantifiers). This is possible because the tags form a neatly ordered system. Thus, all tags starting with “N” mark nouns, all tags starting with “V” mark verbs (lexical verbs in particular are marked by tags beginning with “VV”), and all tags starting with “AJ” mark adjectives, etc. | + | The tags are three-symbol codes, which classify the exact grammatical form of the given word in the corpus. For example, “NN1” marks a singular common noun, “AJC” marks a comparative adjective and “VVG” marks an –ing form of a lexical verb. You can use exact tags like these, but it is even better to make underspecified queries by means of the wild card (and quantifiers). This is possible because the tags form a neatly ordered system. Thus, all tags starting with “N” mark nouns, all tags starting with “V” mark verbs (lexical verbs in particular are marked by tags beginning with “VV”), and all tags starting with “AJ” mark adjectives, etc. |
All grammatical queries have the same formula. They are written in angle brackets in the form “pos=” (pos stands for part-of-speech). The tag or regex tag is put immediately after the equals sign. Our first example will use an exact tag, the one for the third person singular present tense of lexical verbs. | All grammatical queries have the same formula. They are written in angle brackets in the form “pos=” (pos stands for part-of-speech). The tag or regex tag is put immediately after the equals sign. Our first example will use an exact tag, the one for the third person singular present tense of lexical verbs. | ||
+ | |||
+ | ;#; | ||
+ | [[http:// | ||
+ | ;#; | ||
+ | |||
+ | ^ # ^ Left ^ Match ^ | ||
+ | |1 | I 'm sure if ever the occasion| | ||
+ | |2 | they can move on to full doctor status and for many students the chance to experience life in another country more than| makes | up for the extra years of study .| | ||
+ | |3 | It also| requires | ||
+ | |4 | The trust has now drawn up detailed plans and| claims | ||
+ | |5 | I do n't suppose anyone really| | ||
+ | |||
+ | A formula covering a series of tags can be obtained by using the wild card(s). For example, in order to search for nouns in general you can input < | ||
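Because the tags form an ordered system, a pattern over the tag string selects a whole family of tags. The sketch below shows this on a small, hand-picked subset of the BNC tagset. It assumes Python regular-expression syntax; SlopeQ's own wild card notation may differ, and the helper `tags_matching` is hypothetical.

```python
import re
from typing import List

# A small subset of BNC tags with short glosses (not the full tagset).
SAMPLE_TAGS = {
    "NN0": "common noun, neutral for number",
    "NN1": "singular common noun",
    "NN2": "plural common noun",
    "NP0": "proper noun",
    "VVG": "-ing form of a lexical verb",
    "VVZ": "-s form of a lexical verb",
    "VBZ": "-s form of the verb BE",
    "AJ0": "general adjective",
    "AJC": "comparative adjective",
    "AJS": "superlative adjective",
}


def tags_matching(pattern: str) -> List[str]:
    """Return the sample tags that the pattern matches in full, sorted."""
    return sorted(tag for tag in SAMPLE_TAGS if re.fullmatch(pattern, tag))


print(tags_matching("N.*"))  # every noun tag in the sample
print(tags_matching("VV."))  # lexical verbs only, so VBZ is excluded
print(tags_matching("AJ."))  # every adjective tag in the sample
```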
+ | |||
+ | It is possible to combine grammatical search with other functionalities. The following query yields sequences of the word // | ||
+ | |||
+ | ;#; | ||
+ | [[http:// | ||
+ | ;#; | ||
+ | |||
+ | ^ # ^ Left ^ Match ^ | ||
+ | |1 | A paint-effect wall makes a| beautiful backdrop | ||
+ | |2 | HOMES on a north Belfast street have rooms with a| beautiful view | today . | | ||
+ | |3 | Miss Harker removed her bonnet , a| beautiful item | with long blue ribbons , and looked round for somewhere to hang it .| | ||
+ | |4 | the woman could only hope that moving her right away from the influence of the people she went around with into these| | ||
+ | |5 | And you gave me everything , my| beautiful Maggie | ||
+ | |||
+ | This kind of query is well suited to finding collocates of a given word that belong to a specific grammatical class.
+ | |||
+ | The next query involves the slop factor and a base form. The results are sequences of any form of the word //derive// followed by a preposition, with at most one intervening word.
+ | |||
+ | ;#; | ||
+ | [[http:// | ||
+ | ;#; | ||
+ | |||
+ | ^ # ^ Left ^ Match ^ | ||
+ | |1 | More frequently these| | ||
+ | |2 | The payments pursuant to the discretionary power| | ||
+ | |3 | As a result , it is impossible to| derive egalitarianism in | the Marxist sense from a Biblical foundation .| | ||
+ | |4 | the development of local government reflected economic organization and the political processes which| | ||
+ | |5 | A cosmopolitan group| | ||
+ | |6 | The model is| derived by | the processes of data analysis| | ||
+ | |||
+ | It is also possible to apply a base form query and a grammatical query to a single term. The labels “lemma=” and “pos=” must then be placed in the same pair of brackets. The following query yields occurrences of all forms of the verb // |
+ | |||
+ | ;#; | ||
+ | [[http:// | ||
+ | ;#; | ||
+ | |||
+ | ^ # ^ Left ^ Match ^ | ||
+ | |1 | As they| approach | ||
+ | |2 | she added : ‘ A customer is a customer , I| approach | ||
+ | |3 | When The Art Newspaper| | ||
+ | |4 | When CDC 's intention became clear , it was| approached | ||
+ | |5 | We think now , as Christmas| | ||
+ | |||
+ | In general, it is possible to add the grammatical specification to any other kind of query term by writing it immediately after the term (without a space). Below we have a regex query for words ending in //-fish//, but they must be tagged as nouns. In this way we exclude adjectives like // | ||
+ | |||
+ | ;#; | ||
+ | [[http:// | ||
+ | ;#; | ||
+ | |||
+ | ^ # ^ Left ^ Match ^ | ||
+ | |1 | Experienced , mature| | ||
+ | |2 | Over-exploitation has led to a collapse in numbers of| bluefish | ||
+ | |3 | A popular aquarium fish , the range of the Redfin| | ||
+ | |4 | We have a 10 gallon tank with an undergravel filter containing two common goldfish , one fancy| | ||
+ | |5 | Ian Lucas spots some new opportunities with mid-price| | ||
+ | |6 | Echinoderms , like| starfish | ||
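The effect of adding a grammatical specification to another kind of query term can be pictured as several conditions applied to the same token. The following is a minimal sketch under stated assumptions: the `Token` type, the `select` helper and the five sample tokens (including their tags) are made up for illustration and are not SlopeQ's data model or query syntax.

```python
import re
from typing import List, NamedTuple, Optional


class Token(NamedTuple):
    word: str   # surface form
    lemma: str  # base form
    pos: str    # BNC-style tag (assigned by hand for this sketch)


# Hand-made sample, not corpus output.
SAMPLE = [
    Token("approach", "approach", "NN1"),
    Token("approached", "approach", "VVD"),
    Token("approaches", "approach", "VVZ"),
    Token("starfish", "starfish", "NN1"),
    Token("selfish", "selfish", "AJ0"),
]


def select(tokens: List[Token], word_re: Optional[str] = None,
           lemma: Optional[str] = None,
           pos_re: Optional[str] = None) -> List[str]:
    """Return the words of tokens that satisfy every condition given."""
    out = []
    for t in tokens:
        if word_re is not None and not re.fullmatch(word_re, t.word):
            continue
        if lemma is not None and t.lemma != lemma:
            continue
        if pos_re is not None and not re.fullmatch(pos_re, t.pos):
            continue
        out.append(t.word)
    return out


# Base form plus verb tag: the noun use of "approach" is filtered out.
print(select(SAMPLE, lemma="approach", pos_re="V.*"))
# Words ending in -fish restricted to nouns: the adjective is filtered out.
print(select(SAMPLE, word_re=".*fish", pos_re="N.*"))
```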
+ | |||
slopeq_for_bnc.txt · Last modified: 2017/02/03 01:15 by mmolenda