Datasets generated by this extractor. Used for serialization. If a mapping implementation does not return all datasets it produces, serialization may fail.
Get the wiki text that contains the abstract text.
The subject URI of the generated triples
A graph holding the extracted data
Called when the extractor needs some finalization.
Called when the extractor has a pre-phase.
Retrieves a Wikipedia page.
The encoded title of the page
The page as an Option
Returns the first sentences of the given text that together have fewer than 500 characters. A sentence ends with a dot followed by whitespace. TODO: this probably does not work for most non-European languages. TODO: analyse ActiveAbstractExtractor; the approach seems to work quite well there because it takes the first two or three sentences.
The maximum length of the result
The resulting string
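The sentence-based truncation described above can be sketched as follows. This is a minimal illustration in Python, not the extractor's actual code: the name `short_abstract`, the default of 500, and the choice to return an overlong first sentence whole are all assumptions made for the example. It relies on `re.split` accepting zero-width matches, which requires Python 3.7 or later.

```python
import re

# Assumption: a sentence ends with a dot followed by one whitespace character.
# The zero-width lookbehind keeps the dot and the whitespace with the sentence.
_SENTENCE_END = re.compile(r"(?<=\.\s)")


def short_abstract(text: str, max_length: int = 500) -> str:
    """Return the leading sentences of `text` that fit within `max_length`."""
    if len(text) < max_length:
        return text

    sentences = _SENTENCE_END.split(text)
    kept = []
    size = 0
    for sentence in sentences:
        # Stop before the sentence that would push us past the limit.
        if size + len(sentence) > max_length:
            break
        kept.append(sentence)
        size += len(sentence)

    if not kept:
        # Hypothetical choice: if even the first sentence is too long,
        # return it whole rather than cutting it mid-sentence.
        return sentences[0].strip()
    return "".join(kept).strip()
```

As the TODO notes, splitting on a dot followed by whitespace is unlikely to work for languages that mark sentence boundaries differently; a different boundary pattern would be needed there.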
Extracts page abstracts that have not yet been extracted. For each page which is a candidate for extraction
From now on we use MobileFrontend for MW <2.21 and TextExtracts for MW > 2.22. The patched MW instance is no longer needed, apart from minor customizations in LocalSettings.php. TODO: we need to adapt the TextExtracts extension to accept custom wikicode syntax. TextExtracts now uses the article entry and extracts the abstract. The rationale for the new extension is that we will not need to load all articles into MySQL, just the templates. At the moment, setting up the patched MW takes longer than loading all articles into MySQL :) so even this way it is better and cleaner ;) We leave the old code commented out since we might reuse it soon.
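For orientation, a request for an abstract through the stock TextExtracts API might look like the sketch below. It is an illustration only: the endpoint URL and the helper names `build_extract_url` and `parse_extract` are hypothetical, and the customizations mentioned in the TODO are not reflected. The query parameters (`prop=extracts`, `exintro`, `explaintext`) are the standard ones offered by the TextExtracts extension.

```python
import json
from typing import Optional
from urllib.parse import urlencode


def build_extract_url(api_endpoint: str, title: str) -> str:
    """Build a MediaWiki API URL asking TextExtracts for a page's intro."""
    params = {
        "action": "query",
        "prop": "extracts",
        "format": "json",
        "exintro": "1",      # only the text before the first section heading
        "explaintext": "1",  # plain text instead of limited HTML
        "titles": title,
    }
    return api_endpoint + "?" + urlencode(params)


def parse_extract(response_body: str) -> Optional[str]:
    """Pull the extract out of a query response, or None if there is none."""
    pages = json.loads(response_body).get("query", {}).get("pages", {})
    for page in pages.values():
        extract = page.get("extract")
        if extract:
            return extract
    return None


# Assumed local mirror; replace with the endpoint of your own MW instance.
url = build_extract_url("http://localhost/mediawiki/api.php", "Albert Einstein")
```

Returning `None` for a missing extract mirrors the Option-style result used for page retrieval elsewhere in this documentation.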