org.dbpedia.extraction

mappings

package mappings

Type Members

  1. class AbstractExtractorWikipedia extends NifExtractor

    User: Dimitris Kontokostas Description Created: 5/19/14 9:21 AM

  2. class AnchorTextExtractor extends PageNodeExtractor

    Extracts the link texts used to refer to a page by means of internal links. This data provides one part of the input for the surface forms dataset.

  3. class ArticleCategoriesExtractor extends PageNodeExtractor

    Extracts links from concepts to categories using the SKOS vocabulary.

  4. class ArticlePageExtractor extends PageNodeExtractor

    Extracts links to corresponding Articles in Wikipedia.

  5. class ArticleTemplatesExtractor extends PageNodeExtractor

    This extractor extracts all templates that exist in an article. This data can be used for Wikipedia administrative tasks.

  6. class BadQuadException extends Exception

    Thrown by WriterDestination to signal failed quads (i.e. a bad IRI).

  7. class CalculateMapping extends PropertyMapping

  8. class CategoryLabelExtractor extends WikiPageExtractor

    Extracts labels for Categories.

  9. class CitationExtractor extends WikiPageExtractor

    Extracts citation data from articles. To bootstrap this, it is based on the InfoboxExtractor.

  10. class CitedFactsExtractor extends WikiPageExtractor

    Experimental extractor that extracts infobox facts that have a citation on the same line and places the citation in the context.

  11. class CombineDateMapping extends PropertyMapping

    TODO: change the syntax on the mappings wiki to allow an arbitrary number of template properties.

  12. class CommonsKMLExtractor extends WikiPageExtractor

    Extracts KML files from the Commons and links them to the original document. There are only 160 KML files on the Commons right now, but some day there might be more.

    These are currently used as overlays, documented at https://commons.wikimedia.org/wiki/Commons:Geocoding/Overlay

  13. class CommonsResourceExtractor extends PageNodeExtractor

    Links non-commons DBpedia resources to their DBpedia Commons counterpart using owl:sameAs. This requires the Wikipedia page to contain a {{Commons}} template.

    Example http://en.wikipedia.org/wiki/Eurasian_blue_tit: Page contains node: {{Commons|Cyanistes caeruleus}}

    Produces triple: <dbr:Eurasian_blue_tit> <owl:sameAs> <dbpedia-commons:Cyanistes_caeruleus>.
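The template-to-triple step in the example above can be sketched in a few lines. This is a minimal illustration, not the extractor's actual implementation: the object name, the regular expression and the helper functions are assumptions, and real template parsing in the framework works on parsed nodes rather than raw text.

```scala
object CommonsSameAsSketch {
  // Matches {{Commons|Target}} and captures the first positional argument.
  private val CommonsTemplate =
    """\{\{\s*Commons\s*\|\s*([^|}]+?)\s*(?:\|[^}]*)?\}\}""".r

  // Wiki titles use underscores in place of spaces.
  private def toResourceName(title: String): String =
    title.trim.replace(' ', '_')

  /** Returns one owl:sameAs triple per {{Commons|...}} template found in the wikitext. */
  def sameAsTriples(pageTitle: String, wikiText: String): List[String] =
    CommonsTemplate.findAllMatchIn(wikiText).map { m =>
      val subject = toResourceName(pageTitle)
      val target = toResourceName(m.group(1))
      s"<dbr:$subject> <owl:sameAs> <dbpedia-commons:$target> ."
    }.toList
}
```

Calling `CommonsSameAsSketch.sameAsTriples("Eurasian blue tit", "{{Commons|Cyanistes caeruleus}}")` yields a single triple linking the two resources; a page without the template yields an empty list.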

  14. class CompositeExtractor[N] extends Extractor[N]

    TODO: generic type may not be optimal.

  15. class CompositeJsonNodeExtractor extends CompositeExtractor[JsonNode] with JsonNodeExtractor

  16. class CompositePageNodeExtractor extends CompositeExtractor[PageNode] with PageNodeExtractor

  17. class CompositeParseExtractor extends WikiPageExtractor

    TODO: generic type may not be optimal.

  18. class CompositeWikiPageExtractor extends CompositeExtractor[WikiPage] with WikiPageExtractor

  19. class ConditionMapping extends Extractor[TemplateNode]

  20. class ConditionalMapping extends Extractor[TemplateNode]

  21. class ConstantMapping extends PropertyMapping

    Used to map information that is only contained in the infobox template name, for example en:Infobox_Australian_Road:

    {{TemplateMapping
    | mapToClass = Road
    | mappings = {{ConstantMapping | ontologyProperty = country | value = Australia }}
    ...
    }}

  22. class ContributorExtractor extends WikiPageExtractor

    User: Mohamed Morsey Date: 11/29/11 Time: 5:49 PM

    Extracts the information that describes the contributor (editor) of a Wikipedia page, such as the username and ID.

  23. class DBpediaResourceExtractor extends PageNodeExtractor

    Links DBpedia Commons resources to their counterparts in other DBpedia languages (only en, de and fr) using owl:sameAs. This requires the Wikimedia page to contain a {{VN}} template.

    Example http://commons.wikimedia.org/wiki/Cyanistes_caeruleus: Page contains node: {{VN |de=Blaumeise |en=Blue Tit |fr=Mésange bleue }}

    Produces triples:

    <dbpedia-commons:Cyanistes_caeruleus> <owl:sameAs> <dbr:Eurasian_blue_tit>.
    <dbpedia-commons:Cyanistes_caeruleus> <owl:sameAs> <dbpedia-de:Blaumeise>.
    <dbpedia-commons:Cyanistes_caeruleus> <owl:sameAs> <dbpedia-fr:Mésange_bleue>.

  24. class DateIntervalMapping extends PropertyMapping

  25. class DisambiguationExtractor extends PageNodeExtractor

    Extracts disambiguation links.

  26. class Disambiguations extends Serializable

  27. class ExternalLinksExtractor extends PageNodeExtractor

    Extracts links to external web pages.

  28. class ExtractionMonitor extends AnyRef

  29. trait Extractor[-N] extends Serializable

    TODO: generic type may not be optimal.
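The minus sign in Extractor[-N] makes the trait contravariant in its input type. A minimal sketch of what that allows, using simplified stand-ins: Quad, Node, PageNode and the extract signature below are assumptions for illustration, not the framework's actual definitions.

```scala
// Hypothetical, simplified stand-ins for the framework types.
final case class Quad(subject: String, predicate: String, value: String)

trait Extractor[-N] {
  def extract(input: N): Seq[Quad]
}

// Runs every child extractor on the same input and concatenates the results.
class CompositeExtractor[N](children: Extractor[N]*) extends Extractor[N] {
  def extract(input: N): Seq[Quad] = children.flatMap(_.extract(input))
}

// Toy input types: a PageNode is a more specific kind of Node.
class Node(val title: String)
class PageNode(title: String, val length: Int) extends Node(title)

object ExtractorSketch {
  val labels: Extractor[Node] = new Extractor[Node] {
    def extract(n: Node): Seq[Quad] =
      Seq(Quad(n.title, "rdfs:label", n.title))
  }

  val lengths: Extractor[PageNode] = new Extractor[PageNode] {
    def extract(p: PageNode): Seq[Quad] =
      Seq(Quad(p.title, "dbo:wikiPageLength", p.length.toString))
  }

  // Contravariance: an Extractor[Node] is accepted where an
  // Extractor[PageNode] is expected, so both fit in one composite.
  val composite: Extractor[PageNode] =
    new CompositeExtractor[PageNode](labels, lengths)
}
```

An extractor written against a general input type can therefore be reused in a composite over a more specific type without any adapter.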

  30. class FileTypeExtractor extends WikiPageExtractor

    Identifies the type of a File page.

  31. class GalleryExtractor extends WikiPageExtractor

    Extracts images from galleries. I'm not sure what the best RDF representation of this will be, but for now we'll start with:

    • <Main:Gallery page> <dbo:galleryItem> <File:Image>

    The gallery tag is documented at https://en.wikipedia.org/wiki/Help:Gallery_tag

  32. class GenderExtractor extends MappingExtractor

    Extracts the grammatical gender of people using a heuristic.

  33. class GeoCoordinatesMapping extends PropertyMapping

    Extracts geo-coordinates.

  34. class GeoExtractor extends PageNodeExtractor

    Extracts geo-coordinates.

  35. class HomepageExtractor extends PageNodeExtractor

    Extracts links to the official homepage of an instance.

  36. class HybridRawAndMappingExtractor extends PageNodeExtractor

    Combines the raw infobox and mappings extractors and tries to split the triples of the raw infobox extractor into triples that were mapped by the mappings extractor and triples that were not mapped.

  37. class ImageAnnotationExtractor extends PageNodeExtractor

    Extracts image annotations created using the Image Annotator gadget (https://commons.wikimedia.org/wiki/Help:Gadget-ImageAnnotator)

    The RDF produced uses the W3C Media Fragments 1.0 to identify parts of an image: http://www.w3.org/TR/2012/REC-media-frags-20120925/
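In Media Fragments 1.0, a rectangular region of an image is addressed by appending an `xywh` fragment (pixel coordinates by default) to the image URI. A minimal sketch of building such an identifier; the object and function names are hypothetical, and how the extractor itself assembles these URIs is not shown in this listing.

```scala
object MediaFragmentSketch {
  /** Appends a spatial fragment (x, y, width, height in pixels) to an image URI. */
  def region(imageUri: String, x: Int, y: Int, width: Int, height: Int): String = {
    require(x >= 0 && y >= 0 && width > 0 && height > 0, "invalid region")
    s"$imageUri#xywh=$x,$y,$width,$height"
  }
}
```

For example, `MediaFragmentSketch.region("http://example.org/image.jpg", 160, 120, 320, 240)` identifies a 320 by 240 pixel box whose top-left corner is at (160, 120).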

  38. class ImageExtractorNew extends PageNodeExtractor

    Reworked Image Extractor

  39. class InfoboxExtractor extends PageNodeExtractor

    This extractor extracts all properties from all infoboxes. Extracted information is represented using properties in the http://xx.dbpedia.org/property/ namespace (where xx is the language code). The names of these properties directly reflect the names of the Wikipedia infobox properties. Property names are not cleaned or merged. Property types are not part of a subsumption hierarchy and there is no consistent ontology for the infobox dataset. The infobox extractor performs only a minimal amount of property value clean-up, e.g., by converting a value like “June 2009” to the XML Schema format “2009-06”. You should therefore use the infobox dataset only if your application requires complete coverage of all Wikipedia properties and you are prepared to accept relatively noisy data.
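The kind of value clean-up mentioned above (“June 2009” to “2009-06”) can be illustrated with java.time. This is only a sketch of the idea under the assumption of English month names; the object and function names are hypothetical and the framework uses its own parsers.

```scala
import java.time.YearMonth
import java.time.format.{DateTimeFormatter, DateTimeParseException}
import java.util.Locale

object InfoboxDateSketch {
  // "MMMM uuuu" matches a full English month name followed by a year.
  private val monthYear =
    DateTimeFormatter.ofPattern("MMMM uuuu", Locale.ENGLISH)

  /** Returns the xsd:gYearMonth lexical form, or None if the value does not parse. */
  def toGYearMonth(raw: String): Option[String] =
    try Some(YearMonth.parse(raw.trim, monthYear).toString)
    catch { case _: DateTimeParseException => None }
}
```

Values that do not match the pattern are left for other rules (or kept as plain strings), which is one reason the resulting dataset stays relatively noisy.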

  40. class InfoboxMappingsExtractor extends PageNodeExtractor

    Extracts template variables from template pages (see http://en.wikipedia.org/wiki/Help:Template#Handling_parameters)

  41. class InfoboxMappingsTemplateExtractor extends WikiPageExtractor

  42. class InterLanguageLinksExtractor extends PageNodeExtractor

    Extracts interwiki links

  43. class IntermediateNodeMapping extends PropertyMapping

  44. trait JsonNodeExtractor extends Extractor[JsonNode]

    Extractors are mappings that extract data from a JsonNode. Necessary to get some type safety in CompositeExtractor: Class[_ <: Extractor] can be checked at runtime, but Class[_ <: Mapping[PageNode]] can not.

  45. class JsonParseExtractor extends WikiPageExtractor

    User: hadyelsahar Date: 11/19/13 Time: 12:43 PM

    JsonParseExtractor as explained in the design: https://f.cloud.github.com/assets/607468/363286/1f8da62c-a1ff-11e2-99c3-bb5136accc07.png

    Sends the page to JsonParser. If JsonParser returns None, nothing is done; if the page is parsed correctly, the JsonNode is sent to the next-level extractors.

  46. class LabelExtractor extends WikiPageExtractor

    Extracts labels for articles based on their title.

  47. class MappingExtractor extends PageNodeExtractor

    Extracts structured data based on hand-generated mappings of Wikipedia infoboxes to the DBpedia ontology.

  48. class Mappings extends Serializable

  49. class MediaExtractor extends PageNodeExtractor

    Extracts all media files of a Wikipedia page. Constructs a thumbnail image from it, and links to the resources in DBpedia Commons.

    FIXME: we're sometimes dealing with encoded links, sometimes with decoded links. It's quite a mess.

  50. class MetaInformationExtractor extends WikiPageExtractor

    User: Mohamed Morsey Date: 9/13/11 Time: 9:03 PM

    Extracts a page's meta-information, e.g. edit link, revision link, ...

  51. class MissingAbstractsExtractor extends PageNodeExtractor

    Extracts page abstracts which are not yet extracted. For each page which is a candidate for extraction

    From now on we use MobileFrontend for MW <2.21 and TextExtracts for MW > 2.22. The patched MW instance is no longer needed, apart from minor customizations in LocalSettings.php.

    TODO: we need to adapt the TextExtracts extension to accept custom wikicode syntax.

    TextExtracts now uses the article entry and extracts the abstract. The rationale for the new extension is that we will not need to load all articles in MySQL, just the templates. At the moment, setting up the patched MW takes longer than loading all articles in MySQL, so even this way it is better and cleaner. We leave the old code commented out since we might re-use it soon.

  52. class NifExtractor extends WikiPageExtractor

    Extracts page html.

    Based on AbstractExtractor, major difference is the parameter apiParametersFormat = "action=parse&prop=text&section=0&format=xml&page=%s"

    This class produces all NIF-related datasets for the abstract as well as the short and long abstracts datasets. The long abstract is the nif:isString attribute of the NIF instance representing the abstract section of a wiki page.

    We are going to use this method for generating the abstracts from release 2016-10 onwards. It will be expanded to cover the whole wiki page in the future.

    Annotations
    @ExtractorAnnotation( name = "nif extractor" )
  53. class PageIdExtractor extends WikiPageExtractor

    Extracts page ids of articles, e.g. <http://dbpedia.org/resource/Foo> <http://dbpedia.org/ontology/wikiPageID> "123456"^^<xsd:integer> .

  54. class PageLinksExtractor extends PageNodeExtractor

    Extracts internal links between DBpedia instances from the internal page links between Wikipedia articles. The page links might be useful for structural analysis, data mining or for ranking DBpedia instances using Page Rank or similar algorithms.

  55. trait PageNodeExtractor extends Extractor[PageNode]

    Extractors are mappings that extract data from a PageNode. Necessary to get some type safety in CompositeExtractor: Class[_ <: Extractor] can be checked at runtime, but Class[_ <: Mapping[PageNode]] can not.

  56. class PersondataExtractor extends PageNodeExtractor

    Extracts information about persons (date and place of birth etc.) from the English and German Wikipedia, represented using the FOAF vocabulary.

  57. class PndExtractor extends PageNodeExtractor

    Extracts PND (Personennamendatei) data about a person. PND is published by the German National Library. For each person there is a record with name, birth and occupation connected with a unique identifier, the PND number. TODO: also use http://en.wikipedia.org/wiki/Template:Authority_control and other templates.

  58. trait PropertyMapping extends Extractor[TemplateNode]

    Marker trait for mappings which map one or more properties of a specific class. Necessary to make PropertyMappings distinguishable from other Mapping[TemplateNode] types.

  59. class ProvenanceExtractor extends WikiPageExtractor

    Extracts links to the article revision that the data was extracted from, e.g. <http://dbpedia.org/resource/Foo> <http://www.w3.org/ns/prov#wasDerivedFrom> <http://en.wikipedia.org/wiki/Foo?oldid=123456> .

  60. class RedirectExtractor extends WikiPageExtractor

    Extracts redirect links between Articles in Wikipedia.

  61. class Redirects extends Serializable

    Holds the redirects between wiki pages. At the moment, only redirects between Templates are considered.

  62. class RevisionIdExtractor extends WikiPageExtractor

    Extracts revision ids of articles, e.g. <http://dbpedia.org/resource/Foo> <http://dbpedia.org/ontology/wikiPageRevisionID> "123456"^^<xsd:integer> .

  63. class SimplePropertyMapping extends PropertyMapping

  64. class SkosCategoriesExtractor extends PageNodeExtractor

    Extracts information about which concept is a category and how categories are related using the SKOS Vocabulary.

  65. class TableMapping extends Extractor[TableNode]

  66. class TemplateMapping extends Extractor[TemplateNode]

  67. class TemplateParameterExtractor extends PageNodeExtractor

    Extracts template variables from template pages (see http://en.wikipedia.org/wiki/Help:Template#Handling_parameters)

  68. class TopicalConceptsExtractor extends PageNodeExtractor

    Relies on Cat main templates. Goes over all categories and extracts DBpedia Resources that are the main subject of that category. We are using this to infer that a resource is a Topical Concept.

    TODO only do that for resources that have no other ontology type, in post-processing

    TODO check if templates Cat_exp, Cat_main_section, and Cat_more also apply

  69. class UriSameAsIriExtractor extends PageNodeExtractor

    Extracts sameAs links for resources with themselves. Only makes sense when serialization is configured such that subjects are IRIs and objects are URIs (or vice versa).
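The difference between the IRI and URI form of a resource only shows up when the identifier contains non-ASCII characters. A minimal sketch using java.net.URI, whose toASCIIString percent-encodes such characters as UTF-8; the object name is hypothetical, and emitting a triple only when the two forms differ is an assumption of this sketch rather than documented behaviour of the extractor.

```scala
import java.net.URI

object UriSameAsIriSketch {
  /** Percent-encodes non-ASCII characters, turning an IRI into its URI form. */
  def toUri(iri: String): String = new URI(iri).toASCIIString

  /** Assumption: a self-link is only useful when the two forms differ. */
  def sameAs(iri: String): Option[String] = {
    val uri = toUri(iri)
    if (uri == iri) None else Some(s"<$iri> <owl:sameAs> <$uri> .")
  }
}
```

For a purely ASCII identifier such as http://dbpedia.org/resource/Foo both forms coincide and no link is produced.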

  70. trait WikiPageExtractor extends Extractor[WikiPage]

    Extractors are mappings that extract data from a WikiPage. Necessary to get some type safety in CompositeExtractor: Class[_ <: Extractor] can be checked at runtime, but Class[_ <: Mapping[PageNode]] can not.

  71. class WikiPageLengthExtractor extends WikiPageExtractor

    Extracts the number of characters in a Wikipedia page.

  72. class WikiPageOutDegreeExtractor extends PageNodeExtractor

    Extracts the number of outgoing links to DBpedia instances from the internal page links between Wikipedia articles. The Out Degree might be useful for structural analysis, data mining or for ranking DBpedia instances using PageRank or similar algorithms. In Degree cannot be calculated at extraction time; it requires a post-processing step over the PageLinks dataset.
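Why the out-degree is available at extraction time while the in-degree is not can be seen in a small sketch. The names are hypothetical, and counting each distinct target once is an assumption made here for illustration.

```scala
object LinkDegreeSketch {
  /** Out-degree: computable per page, since all of its links are at hand. */
  def outDegree(linksOnPage: Seq[String]): Int =
    linksOnPage.distinct.size

  /** In-degree: needs the complete set of (source, target) pairs, hence post-processing. */
  def inDegrees(pageLinks: Seq[(String, String)]): Map[String, Int] =
    pageLinks.distinct
      .groupBy { case (_, target) => target }
      .map { case (target, links) => target -> links.size }
}
```

The first function sees one page at a time, which matches how extractors are invoked; the second must aggregate over every page, which is only possible once the full PageLinks dataset exists.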

  73. class WikiParseExtractor extends WikiPageExtractor

    User: hadyelsahar Date: 11/19/13 Time: 12:43 PM

    ParseExtractors as explained in the design: https://f.cloud.github.com/assets/607468/363286/1f8da62c-a1ff-11e2-99c3-bb5136accc07.png

    Sends the page to SimpleWikiParser. If it returns None, nothing is done; if the page is parsed correctly, the PageNode is sent to the next-level extractors.

  74. class WikidataAliasExtractor extends JsonNodeExtractor

    Created by ali on 7/29/14.

    Extracts alias triples from Wikidata sources in the form of <http://wikidata.dbpedia.org/resource/Q446> <http://dbpedia.org/ontology/alias> "alias"@lang .

  75. class WikidataDescriptionExtractor extends JsonNodeExtractor

    Created by ali on 7/29/14.

    Extracts description triples from Wikidata sources in the form of <http://wikidata.dbpedia.org/resource/Q139> <http://dbpedia.org/ontology/description> "description"@lang .

  76. class WikidataLLExtractor extends JsonNodeExtractor

  77. class WikidataLabelExtractor extends JsonNodeExtractor

    Extracts label triples from Wikidata sources in the form of:

    http://data.dbpedia.org/Q64 rdfs:label "new York"@fr
    http://data.dbpedia.org/Q64 rdfs:label "new York City"@en

  78. class WikidataNameSpaceSameAsExtractor extends JsonNodeExtractor

    Extracts mappings between Wikidata URIs and Wikidata URIs inside DBpedia, in the form of: <http://wikidata.dbpedia.org/resource/Q18> <owl:sameAs> <http://www.wikidata.org/entity/Q18>

  79. class WikidataPropertyExtractor extends JsonNodeExtractor

    Created by ali on 2/28/15.

    Extracts the aliases, descriptions, labels and statements of Wikidata property pages. The wikidata namespace is used for property pages.

    Aliases are extracted in the form of wikidata:P102 dbo:alias "political party, party"@en .

    Descriptions are extracted in the form of wikidata:P102 dbo:description "the political party of which this politician is or has been a member"@en .

    Labels are extracted in the form of wikidata:P102 rdfs:label "member of political party"@en .

    Statements are extracted in the form of wikidata:P102 wikidata:P1646 wikidata:P580 .

  80. class WikidataR2RExtractor extends JsonNodeExtractor

    Created by ali on 10/26/14.

    This extractor maps Wikidata statements to the DBpedia ontology, in the form of wd:Q64 dbo:primeMinister wd:Q8863 .

    To represent n-ary relations, mapped statements are reified. For reification, a unique statement URI is created. Mapped statements are reified in the form of:

    wd:Q64_P6_Q8863 rdf:type rdf:Statement .
    wd:Q64_P6_Q8863 rdf:subject wd:Q64 .
    wd:Q64_P6_Q8863 rdf:predicate dbo:primeMinister .
    wd:Q64_P6_Q8863 rdf:object wd:Q8863 .

    Qualifiers use the same statement URI and are mapped in the form of:

    wd:Q64_P6_Q8863 dbo:startDate "2001-6-16"^^xsd:date .
    wd:Q64_P6_Q8863 dbo:endDate "2014-12-11"^^xsd:date .
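The reification pattern described for this extractor can be sketched as plain string construction. The object and function names are hypothetical, and the subject_property_object naming of the statement URI is inferred from the example identifiers in this entry rather than taken from the implementation.

```scala
object ReificationSketch {
  /** Builds the statement URI from the subject, property and object identifiers. */
  def statementUri(subjectId: String, propertyId: String, objectId: String): String =
    s"wd:${subjectId}_${propertyId}_${objectId}"

  /** Returns the four reification triples for one mapped statement. */
  def reify(subjectId: String, propertyId: String, objectId: String,
            mappedPredicate: String): List[String] = {
    val stmt = statementUri(subjectId, propertyId, objectId)
    List(
      s"$stmt rdf:type rdf:Statement .",
      s"$stmt rdf:subject wd:$subjectId .",
      s"$stmt rdf:predicate $mappedPredicate .",
      s"$stmt rdf:object wd:$objectId ."
    )
  }

  /** Attaches a typed-literal qualifier (e.g. a start date) to a statement URI. */
  def qualifier(stmt: String, predicate: String, literal: String, datatype: String): String =
    s"""$stmt $predicate "$literal"^^$datatype ."""
}
```

Because qualifiers hang off the statement URI rather than the subject, several qualified statements about the same subject and property stay distinguishable.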

  81. class WikidataRawExtractor extends JsonNodeExtractor

    Created by ali on 10/26/14.

    Raw Wikidata statements are extracted in the form of wd:Q64 wikidata:P6 wd:Q8863 .

    To represent n-ary relations, statements are reified. For reification, a unique statement URI is created. Statements are reified in the form of:

    wd:Q64_P6_Q8863 rdf:type rdf:Statement .
    wd:Q64_P6_Q8863 rdf:subject wd:Q64 .
    wd:Q64_P6_Q8863 rdf:predicate wikidata:P6 .
    wd:Q64_P6_Q8863 rdf:object wd:Q8863 .

    Qualifiers use the same statement URI and are extracted in the form of:

    wd:Q64_P6_Q8863 wikidata:P580 "2001-6-16"^^xsd:date .
    wd:Q64_P6_Q8863 wikidata:P582 "2014-12-11"^^xsd:date .

  82. class WikidataReferenceExtractor extends JsonNodeExtractor

    Created by ali on 10/26/14.

    References of Wikidata statements are extracted in the form of wd:Q76_P140_V39759 dbo:reference "http://www.christianitytoday.com/ct/2008/januaryweb-only/104-32.0.html?start=2"^^xsd:string .

  83. class WikidataSameAsExtractor extends JsonNodeExtractor

    Extracts sameAs data from DBpedia-Wikidata in the form of:

    <http://wikidata.dbpedia.org/resource/Q18> owl:sameAs <http://dbpedia.org/resource/London>
    <http://wikidata.dbpedia.org/resource/Q18> owl:sameAs <http://fr.dbpedia.org/resource/London>
    <http://wikidata.dbpedia.org/resource/Q18> owl:sameAs <http://co.dbpedia.org/resource/London>

  84. class AbstractExtractor extends WikiPageExtractor

    Extracts wiki texts like abstracts or sections in HTML.

    NOTE: This class is used not only for abstract extraction but also for extracting the wiki text of the whole page. The NifAbstract Extractor extends this class. All configuration is now kept in //extraction-framework/core/src/main/resources/mediawikiconfig.json; change the 'publicParams' entries to tweak endpoint and time parameters.

    From now on we use MobileFrontend for MW <2.21 and TextExtracts for MW > 2.22. The patched MW instance is no longer needed, apart from minor customizations in LocalSettings.php. TextExtracts now uses the article entry and extracts the abstract. The rationale for the new extension is that we will not need to load all articles in MySQL, just the templates. At the moment, setting up the patched MW takes longer than loading all articles in MySQL, so even this way it is better and cleaner. We leave the old code commented out since we might re-use it soon.

    Annotations
    @deprecated @ExtractorAnnotation( name = "abstract extractor" )
    Deprecated

    (Since version 2016-10) replaced by NifExtractor.scala: which will extract the whole page content including the abstract

  85. class ImageExtractor extends PageNodeExtractor

    Extracts the first image of a Wikipedia page. Constructs a thumbnail from it, and the full size image.

    FIXME: we're sometimes dealing with encoded links, sometimes with decoded links. It's quite a mess.

    Annotations
    @deprecated
    Deprecated

    (Since version 2017-08) replaced by ImageExtractorNew

Value Members

  1. object AbstractExtractor extends Serializable

  2. object AnchorTextExtractor extends Serializable

  3. object CompositeJsonNodeExtractor extends Serializable

    Creates new extractors.

  4. object CompositePageNodeExtractor extends Serializable

    Creates new extractors.

  5. object CompositeParseExtractor extends Serializable

    Creates new extractors.

  6. object Disambiguations extends Serializable

  7. object ExtractorState extends Enumeration

  8. object InfoboxExtractor extends Serializable

  9. object MappingsLoader extends Serializable

    Loads the mappings from the configuration and builds a MappingExtractor instance. This should be replaced by a general loader later on, which loads the mapping objects based on the grammar (which can be defined using annotations).

  10. object MissingAbstractsExtractor extends Serializable

  11. object NifExtractor extends Serializable

  12. object Redirects extends Serializable

    Loads redirects from a cache file or source of Wiki pages. At the moment, only redirects between Templates are considered.

  13. package fr

  14. package rml
