MissingAbstractsExtractor

Extracts page abstracts which are not yet extracted. For each page which is a candidate for extraction

From now on we use MobileFrontend for MW < 2.21 and TextExtracts for MW > 2.22. The patched MW instance is no longer needed, apart from minor customizations in LocalSettings.php.

TODO: we need to adapt the TextExtracts extension to accept custom wikicode syntax.

TextExtracts now uses the article entry and extracts the abstract. The rationale for the new extension is that we no longer need to load all articles into MySQL, just the templates. At the moment, setting up the patched MW takes longer than loading all articles into MySQL, so even this way it is better and cleaner. We leave the old code commented out since we might reuse it soon.

Linear Supertypes

PageNodeExtractor, Extractor[PageNode], Serializable, AnyRef, Any

Instance Constructors

new MissingAbstractsExtractor(context: AnyRef { ... /* 2 definitions in type refinement */ })

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def apiUrl: String

Attributes
protected
final def asInstanceOf[T0]: T0

Definition Classes
Any
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
val datasets: Set[Dataset]

Datasets generated by this extractor. Used for serialization. If a mapping implementation does not return all datasets it produces, serialization may fail.

Definition Classes
MissingAbstractsExtractor → Extractor
def decodeHtml(text: String): String

Get the wiki text that contains the abstract text.
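The page does not document how decodeHtml is implemented, so the following is only a minimal sketch of what decoding HTML character references in an abstract could look like. The object name DecodeHtmlSketch and the small table of named entities are assumptions made for illustration; the real method may rely on a library and cover far more entities.

```scala
import scala.util.Try
import scala.util.matching.Regex

// Hypothetical helper, not the extractor's actual implementation.
object DecodeHtmlSketch {
  // Assumption: only a handful of named entities are handled here.
  private val named = Map("amp" -> "&", "lt" -> "<", "gt" -> ">", "quot" -> "\"", "apos" -> "'")

  // Matches hexadecimal (&#xE9;), decimal (&#233;) and named (&amp;) references.
  private val entity: Regex = "&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z]+);".r

  private def fromCodePoint(value: String, radix: Int): Option[String] =
    Try(new String(Character.toChars(Integer.parseInt(value, radix)))).toOption

  def decodeHtml(text: String): String =
    entity.replaceAllIn(text, m => {
      val body = m.group(1)
      val decoded =
        if (body.startsWith("#x")) fromCodePoint(body.drop(2), 16)
        else if (body.startsWith("#")) fromCodePoint(body.drop(1), 10)
        else named.get(body)
      // Unknown or invalid references are left untouched.
      Regex.quoteReplacement(decoded.getOrElse(m.matched))
    })
}
```

The replacement runs in a single pass, so an input such as `&amp;lt;` becomes `&lt;` rather than being decoded twice.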
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def extract(pageNode: PageNode, subjectUri: String): Seq[Quad]

subjectUri
The subject URI of the generated triples
returns
A graph holding the extracted data

Definition Classes
MissingAbstractsExtractor → Extractor
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
def finalizeExtractor(): Unit

Called when the extractor needs some finalization.

Definition Classes
Extractor
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
def hashCode(): Int

Definition Classes
AnyRef → Any
def initializeExtractor(): Unit

Called when the extractor has a pre-phase.

Definition Classes
Extractor
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
def retrievePage(pageTitle: WikiTitle): String

Retrieves a Wikipedia page.
pageTitle
The encoded title of the page
returns
The page as an Option
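The request that retrievePage sends is not described on this page. As a rough sketch, assuming the page is fetched from a MediaWiki API endpoint with the TextExtracts extension enabled, the query URL could be assembled as below. The object AbstractQuerySketch, the function queryUrl, and the choice of parameters are assumptions for illustration; the parameter names themselves (prop=extracts, exintro, explaintext) are those of the TextExtracts API.

```scala
import java.net.URLEncoder

// Hypothetical helper: builds a TextExtracts query URL, does not perform the request.
object AbstractQuerySketch {
  def queryUrl(apiUrl: String, pageTitle: String): String = {
    val params = Seq(
      "action"      -> "query",
      "prop"        -> "extracts", // provided by the TextExtracts extension
      "exintro"     -> "1",        // only the content before the first section
      "explaintext" -> "1",        // plain text instead of limited HTML
      "format"      -> "xml",
      "titles"      -> pageTitle
    )
    // URL-encode every value so that titles with spaces or special characters are safe.
    val query = params
      .map { case (key, value) => s"$key=${URLEncoder.encode(value, "UTF-8")}" }
      .mkString("&")
    s"$apiUrl?$query"
  }
}
```

For example, `AbstractQuerySketch.queryUrl("http://localhost/mediawiki/api.php", "Albert Einstein")` yields a URL ending in `titles=Albert+Einstein`; the host and path here are placeholders for whatever apiUrl returns.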
def short(text: String, max: Int = 500): String

Returns the first sentences of the given text that together have fewer than max characters (500 by default). A sentence ends with a dot followed by whitespace. TODO: probably doesn't work for most non-European languages. TODO: analyse ActiveAbstractExtractor; it seems to work quite well there because it takes the first two or three sentences.
text
max
max length
returns
result string
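The rule described for short (keep leading sentences while the total stays below max, where a sentence ends with a dot followed by whitespace) can be sketched as follows. This is an illustration under stated assumptions, not the extractor's code: the object name ShortAbstractSketch is hypothetical, sentences are rejoined with a single space, and the sketch returns an empty string when even the first sentence is too long, which the original may handle differently.

```scala
// Hypothetical re-implementation of the documented behaviour of `short`.
object ShortAbstractSketch {
  def short(text: String, max: Int = 500): String = {
    // Split at whitespace that follows a dot; the dot stays with its sentence.
    val sentences = text.trim.split("(?<=\\.)\\s+")
    val result = new StringBuilder
    val remaining = sentences.iterator
    var fits = true
    while (fits && remaining.hasNext) {
      val sentence = remaining.next()
      val separator = if (result.isEmpty) "" else " "
      // Append the sentence only if the total stays strictly below max.
      if (result.length + separator.length + sentence.length < max)
        result.append(separator).append(sentence)
      else
        fits = false
    }
    result.toString
  }
}
```

With a limit of 10, `ShortAbstractSketch.short("One. Two. Three.", 10)` keeps the first two sentences (9 characters) and drops the third. As the TODO above notes, splitting on a dot plus whitespace is unlikely to work for scripts that mark sentence ends differently.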
var state: ExtractorState.Value

Definition Classes
Extractor
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
AnyRef → Any
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )

class MissingAbstractsExtractor extends PageNodeExtractor
