Datasets generated by this extractor. Used for serialization. If a mapping implementation does not return all datasets it produces, serialization may fail.
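To illustrate why an incomplete dataset list can break serialization, here is a minimal Scala sketch. It assumes a serializer that opens one output per declared dataset and rejects anything else. The `Dataset`, `Quad`, and `Serializer` types are hypothetical stand-ins, not the extraction framework's actual classes.

```scala
// Hypothetical types for illustration only; not the extraction framework's API.
final case class Dataset(name: String)
final case class Quad(subject: String, predicate: String, obj: String, dataset: Dataset)

// A minimal serializer that only opens outputs for the datasets it was told about.
final class Serializer(declared: Set[Dataset]) {
  private val outputs: Map[Dataset, StringBuilder] =
    declared.iterator.map(d => d -> new StringBuilder).toMap

  def write(q: Quad): Unit = outputs.get(q.dataset) match {
    case Some(out) =>
      out.append(s"<${q.subject}> <${q.predicate}> <${q.obj}> .\n")
    case None =>
      // The extractor produced a quad for a dataset it never declared.
      throw new IllegalStateException(s"no output for undeclared dataset '${q.dataset.name}'")
  }

  def contents(d: Dataset): String = outputs(d).toString
}
```

An extractor that declares only `abstracts` but also emits quads for another dataset would fail at the first such `write` call in this sketch.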
The subject URI of the generated triples
A graph holding the extracted data
For extractors that need a finalization step.
For extractors that have a pre-phase.
protected params ...
Returns as many leading sentences of the given text as fit in fewer than 500 characters. A sentence ends with a dot followed by whitespace. TODO: this probably doesn't work for most non-European languages. TODO: analyse ActiveAbstractExtractor; I think it works quite well there because it takes the first two or three sentences.
Maximum length in characters.
The shortened text.
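The sentence-based shortening described above can be sketched in Scala as follows. This is a reconstruction under stated assumptions, not the framework's code: the name `shortenAbstract` is hypothetical, sentences are split with a regex lookbehind on a dot followed by whitespace, and when even the first sentence is too long it is returned whole (an assumed fallback).

```scala
// Hypothetical sketch: keep leading sentences while the total stays under `max`.
def shortenAbstract(text: String, max: Int = 500): String = {
  if (text.length < max) text
  else {
    // Zero-width split after ". " so each sentence keeps its dot and trailing space.
    val sentences = text.split("""(?<=\.\s)""")
    val builder = new StringBuilder
    var i = 0
    while (i < sentences.length && builder.length + sentences(i).length < max) {
      builder.append(sentences(i))
      i += 1
    }
    // Assumed fallback: if no sentence fits, return the first one untruncated.
    if (builder.isEmpty) sentences.head.trim else builder.toString.trim
  }
}
```

Because the boundary is only a dot followed by whitespace, abbreviations such as "e.g. " also end a sentence, and text that uses a different full stop (for example the ideographic "。") is never split, which is consistent with the first TODO.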
Extracts wiki text, such as abstracts or sections, as HTML. NOTE: This class is used not only for abstract extraction but also for extracting the wiki text of the whole page; the NifAbstract Extractor extends this class. All configuration now lives in //extraction-framework/core/src/main/resources/mediawikiconfig.json; change its 'publicParams' entries to tweak the endpoint and timing parameters.
From now on we use MobileFrontend for MW <2.21 and TextExtracts for MW > 2.22. The patched MW instance is no longer needed, except for minor customizations in LocalSettings.php. TextExtracts now uses the article entry and extracts the abstract. The rationale for the new extension is that we no longer need to load all articles into MySQL, only the templates. At the moment, setting up the patched MW takes longer than loading all articles into MySQL :) so even this way it's much better and cleaner ;) We leave the old code commented out since we might reuse it soon.
(Since version 2016-10) Replaced by NifExtractor.scala, which extracts the whole page content, including the abstract.
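For reference, a TextExtracts request is an ordinary MediaWiki API query with `prop=extracts`. The sketch below builds such a URL. The helper name `textExtractsUrl`, the example endpoint, and the chosen parameters (`exintro`, `explaintext`, `format=json`) are illustrative assumptions and are not read from mediawikiconfig.json.

```scala
import java.net.URLEncoder
import java.nio.charset.StandardCharsets

// Hypothetical helper: builds a MediaWiki API query URL for the TextExtracts extension.
def textExtractsUrl(apiEndpoint: String, title: String, plainText: Boolean = false): String = {
  val params = Seq(
    "action" -> "query",
    "prop"   -> "extracts", // provided by the TextExtracts extension
    "exintro" -> "1",       // only the content before the first section heading
    "format" -> "json",
    "titles" -> title
  ) ++ (if (plainText) Seq("explaintext" -> "1") else Seq.empty) // omitted: limited HTML

  val query = params
    .map { case (k, v) => s"$k=${URLEncoder.encode(v, StandardCharsets.UTF_8.name)}" }
    .mkString("&")
  s"$apiEndpoint?$query"
}
```

Leaving `explaintext` off by default keeps the HTML output that the class description refers to; `URLEncoder` encodes spaces in titles as `+`.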