org.dbpedia.extraction.mappings

MissingAbstractsExtractor

class MissingAbstractsExtractor extends PageNodeExtractor

Extracts page abstracts which are not yet extracted. For each page which is a candidate for extraction

From now on we use MobileFrontend for MW <2.21 and TextExtracts for MW > 2.22. The patched MW instance is no longer needed, apart from minor customizations in LocalSettings.php.

TODO: we need to adapt the TextExtracts extension to accept custom wikicode syntax.

TextExtracts now uses the article entry and extracts the abstract. The rationale for the new extension is that we no longer need to load all articles into MySQL, just the templates. At the moment, setting up the patched MW takes longer than loading all articles into MySQL, so even this way it is much better and cleaner. We leave the old code commented out since we might reuse it soon.

Linear Supertypes
PageNodeExtractor, Extractor[PageNode], Serializable, AnyRef, Any

Instance Constructors

  1. new MissingAbstractsExtractor(context: AnyRef { ... /* 2 definitions in type refinement */ })

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def apiUrl: String

    Attributes
    protected
  5. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  6. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  7. val datasets: Set[Dataset]

    Datasets generated by this extractor. Used for serialization. If a mapping implementation does not return all datasets it produces, serialization may fail.

    Definition Classes
    MissingAbstractsExtractor → Extractor
  8. def decodeHtml(text: String): String

    Get the wiki text that contains the abstract text.

  9. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  10. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  11. def extract(pageNode: PageNode, subjectUri: String): Seq[Quad]

    subjectUri

    The subject URI of the generated triples

    returns

    A graph holding the extracted data

    Definition Classes
    MissingAbstractsExtractor → Extractor
  12. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  13. def finalizeExtractor(): Unit

    Called when the extractor needs some finalization.

    Definition Classes
    Extractor
  14. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  15. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  16. def initializeExtractor(): Unit

    Called when the extractor has a pre-phase.

    Definition Classes
    Extractor
  17. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  18. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  19. final def notify(): Unit

    Definition Classes
    AnyRef
  20. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  21. def retrievePage(pageTitle: WikiTitle): String

    Retrieves a Wikipedia page.

    pageTitle

    The encoded title of the page

    returns

    The page as an Option
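
Since the class notes say abstracts now come from the TextExtracts extension, a request for a page's introduction might be assembled roughly as below. This is only a sketch: `TextExtractsUrlSketch`, `buildQuery`, the plain `String` title (the real method takes a `WikiTitle`), and the example API address are all hypothetical and not taken from the DBpedia source. The query parameters (`prop=extracts`, `exintro`, `explaintext`) are those of the TextExtracts API.

```scala
import java.net.URLEncoder
import java.nio.charset.StandardCharsets

// Hypothetical helper for illustration; not part of MissingAbstractsExtractor.
object TextExtractsUrlSketch {

  // Builds a MediaWiki API query URL that asks TextExtracts for the
  // plain-text introduction of a single page.
  def buildQuery(apiUrl: String, title: String): String = {
    // Form-encode the title so spaces and special characters are safe in a query string.
    val encoded = URLEncoder.encode(title, StandardCharsets.UTF_8.name())
    s"$apiUrl?action=query&prop=extracts&exintro=&explaintext=&format=xml&titles=$encoded"
  }
}
```

For example, `TextExtractsUrlSketch.buildQuery("http://localhost/mediawiki/api.php", "Berlin Wall")` yields a URL ending in `titles=Berlin+Wall`; the host and path here are placeholders for whatever `apiUrl` returns in a given setup.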

  22. def short(text: String, max: Int = 500): String

    Returns the leading sentences of the given text whose combined length stays within max characters (500 by default). A sentence ends with a dot followed by whitespace. TODO: probably doesn't work for most non-European languages. TODO: analyse ActiveAbstractExtractor; this seems to work quite well there, because it takes the first two or three sentences.

    text

    The text to shorten

    max

    max length

    returns

    result string
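
The sentence-based truncation described for `short` can be sketched as follows. This is an illustrative re-implementation based only on the description above, not the actual DBpedia code; the object name `ShortAbstractSketch` is hypothetical, and returning an over-long first sentence whole is an assumption made here.

```scala
// Hypothetical sketch of the documented behaviour; not the DBpedia source.
object ShortAbstractSketch {

  def short(text: String, max: Int = 500): String = {
    if (text.length < max) text
    else {
      // Split after every "dot followed by whitespace", keeping the delimiter
      // with the sentence it ends (fixed-length lookbehind).
      val sentences = text.split("""(?<=\.\s)""")
      val builder = new StringBuilder
      var i = 0
      // Append whole sentences while the running length stays within max.
      while (i < sentences.length && builder.length + sentences(i).length <= max) {
        builder.append(sentences(i))
        i += 1
      }
      // Assumption: if even the first sentence is too long, return it whole
      // rather than an empty string.
      if (builder.isEmpty) sentences.head else builder.toString.trim
    }
  }
}
```

With these rules, `ShortAbstractSketch.short("One. Two. Three.", 10)` keeps the first two sentences and returns `"One. Two."`. As the TODO notes, splitting on a dot plus whitespace will not find sentence boundaries in scripts that use other terminators.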

  23. var state: ExtractorState.Value

    Definition Classes
    Extractor
  24. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  25. def toString(): String

    Definition Classes
    AnyRef → Any
  26. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  27. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  28. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
