AbstractExtractor

Extracts wiki text, such as abstracts or sections, in HTML. NOTE: This class is used not only for abstract extraction but also for extracting the wiki text of the whole page; the NifAbstract Extractor extends this class. All configuration now lives in //extraction-framework/core/src/main/resources/mediawikiconfig.json. Change the 'publicParams' entries there to adjust the endpoint and timing parameters.

From now on we use MobileFrontend for MW <2.21 and TextExtracts for MW > 2.22. The patched MW instance is no longer needed, apart from minor customizations in LocalSettings.php. TextExtracts now uses the article entry and extracts the abstract. The rationale for the new extension is that we no longer need to load all articles into MySQL, just the templates. At the moment, setting up the patched MW takes longer than loading all articles into MySQL, so this approach is both better and cleaner. We leave the old code commented out since we might re-use it soon.

Annotations: @deprecated @ExtractorAnnotation( name = "abstract extractor" )
Deprecated: (Since version 2016-10) Replaced by NifExtractor.scala, which extracts the whole page content including the abstract.

Linear Supertypes

WikiPageExtractor, Extractor[WikiPage], Serializable, AnyRef, Any

Instance Constructors

new AbstractExtractor(context: AnyRef { ... /* 3 definitions in type refinement */ })

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
val apiParametersFormat: String

Attributes
protected
final def asInstanceOf[T0]: T0

Definition Classes
Any
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
val datasets: Set[Dataset]

Datasets generated by this extractor. Used for serialization. If a mapping implementation does not return all datasets it produces, serialization may fail.

Definition Classes
AbstractExtractor → Extractor
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def extract(pageNode: WikiPage, subjectUri: String): Seq[Quad]

subjectUri
The subject URI of the generated triples
returns
A graph holding the extracted data

Definition Classes
AbstractExtractor → Extractor
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
def finalizeExtractor(): Unit

Called when the extractor needs some finalization.

Definition Classes
Extractor
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
def hashCode(): Int

Definition Classes
AnyRef → Any
def initializeExtractor(): Unit

Called when the extractor has a pre-phase.

Definition Classes
Extractor
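
The two hooks initializeExtractor and finalizeExtractor bracket the calls to extract. The following is only a minimal sketch of how a caller might sequence them. SketchExtractor and runAll are hypothetical stand-ins written for this illustration, not types or methods of the framework; only the three method names come from this page, and the call order (pre-phase once, extract per page, finalization at the end even on failure) is an assumption.

```scala
object LifecycleSketch {
  // Hypothetical stand-in for an extractor; NOT the framework's Extractor trait.
  // Results are plain strings here instead of Quad values.
  trait SketchExtractor[N] {
    def initializeExtractor(): Unit = ()   // optional pre-phase
    def extract(node: N, subjectUri: String): Seq[String]
    def finalizeExtractor(): Unit = ()     // optional finalization
  }

  // Assumed driver: run the pre-phase once, extract every page,
  // and run the finalization even if an extraction throws.
  def runAll[N](extractor: SketchExtractor[N], pages: Seq[(N, String)]): Seq[String] = {
    extractor.initializeExtractor()
    try pages.flatMap { case (node, uri) => extractor.extract(node, uri) }
    finally extractor.finalizeExtractor()
  }
}
```

The try/finally is a design choice of the sketch: it keeps any resources opened in the pre-phase from leaking when a single page fails.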
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
val language: String

protected params ...

Attributes
protected
val logger: Logger

Attributes
protected
lazy val longProperty: OntologyProperty

Attributes
protected
lazy val longQuad: (String, String, String) ⇒ Quad

Attributes
protected
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
def short(text: String, max: Int = 500): String

Returns the first sentences of the given text that together have fewer than max characters (500 by default). A sentence ends with a dot followed by whitespace. TODO: probably doesn't work for most non-European languages. TODO: analyse ActiveAbstractExtractor; I think this works quite well there, because it takes the first two or three sentences.
text
the given text
max
max length
returns
result string
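
The truncation rule described for short can be sketched as follows. This is a hypothetical re-implementation for illustration only, not the framework's code: the names ShortAbstractSketch and shortSketch are invented here, and the fallback of returning the whole first sentence when even that sentence exceeds max is an assumption.

```scala
object ShortAbstractSketch {
  // Hypothetical sketch of the rule "keep leading sentences while the
  // total stays below max"; not the actual AbstractExtractor.short.
  def shortSketch(text: String, max: Int = 500): String = {
    if (text.length < max) text
    else {
      // Split after every dot that is followed by whitespace; the
      // lookbehind keeps the dot and the whitespace with the sentence.
      val sentences = text.split("""(?<=\.\s)""").toList
      // Running total of characters after each sentence.
      val totals = sentences.scanLeft(0)(_ + _.length).tail
      val kept = sentences.zip(totals)
        .takeWhile { case (_, total) => total < max }
        .map(_._1)
      // Assumption: if not even the first sentence fits, return it whole.
      if (kept.isEmpty) sentences.head.trim else kept.mkString.trim
    }
  }
}
```

For example, ShortAbstractSketch.shortSketch("One. Two. Three.", 11) keeps the first two sentences and returns "One. Two.". Because the split relies on a dot followed by whitespace, text in scripts that use other sentence terminators is treated as a single sentence, which matches the TODO above.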
lazy val shortProperty: OntologyProperty

Attributes
protected
lazy val shortQuad: (String, String, String) ⇒ Quad

Attributes
protected
var state: ExtractorState.Value

Definition Classes
Extractor
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
AnyRef → Any
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )

class AbstractExtractor extends WikiPageExtractor
