gnu.xml.pipeline
Class LinkFilter
All Implemented Interfaces: ContentHandler, ContentHandler2, DeclHandler, DTDHandler, EventConsumer, LexicalHandler
Pipeline filter to remember XHTML links found in a document,
so they can later be crawled. Fragments are not counted, and duplicates
are ignored. Callers are responsible for filtering out URLs they aren't
interested in. Events are passed through unmodified.
Input MUST include a setDocumentLocator() call, as it's used to
resolve relative links in the absence of a "base" element. Input MUST
also include namespace identifiers, since it is the XHTML namespace
identifier which is used to identify the relevant elements.
FIXME: handle xml:base attribute ... in association with
a stack of base URIs. Similarly, recognize/support XLink data.
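The behavior described above can be illustrated with a minimal sketch built only on the JDK's SAX classes. This is not the gnu.xml.pipeline implementation: the class name LinkCollectorSketch, the helper collect(), and the choice of elements (a, link, img, base) are assumptions made for illustration, as is the reading of "fragments are not counted" as "strip the fragment, and skip references that are then empty".

```java
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for illustration; NOT gnu.xml.pipeline.LinkFilter.
class LinkCollectorSketch extends DefaultHandler {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // A set keeps insertion order and silently ignores duplicates.
    private final Set<String> links = new LinkedHashSet<>();
    private Locator locator;
    private URI base;

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void startDocument() throws SAXException {
        // Without a Locator there is nothing to resolve relative links against.
        if (locator == null || locator.getSystemId() == null) {
            throw new SAXException("no Locator with a system ID was provided");
        }
        base = URI.create(locator.getSystemId());
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Only elements in the XHTML namespace are considered.
        if (!XHTML_NS.equals(uri)) {
            return;
        }
        if ("base".equals(localName)) {
            String href = atts.getValue("href");
            if (href != null) {
                try {
                    base = base.resolve(href); // a "base" element overrides the Locator
                } catch (IllegalArgumentException e) {
                    // malformed base: keep the previous one
                }
            }
        } else if ("a".equals(localName) || "link".equals(localName)) {
            record(atts.getValue("href"));
        } else if ("img".equals(localName)) {
            record(atts.getValue("src"));
        }
    }

    @Override
    public void endDocument() {
        base = null; // forget base URI information, keep the collected links
    }

    private void record(String ref) {
        if (ref == null) {
            return;
        }
        int hash = ref.indexOf('#');
        if (hash >= 0) {
            ref = ref.substring(0, hash); // drop the fragment
        }
        if (ref.isEmpty()) {
            return; // fragment-only reference into the same document
        }
        try {
            links.add(base.resolve(ref).toString());
        } catch (IllegalArgumentException e) {
            // malformed reference: skipped in this sketch
        }
    }

    Enumeration<String> getLinks() {
        return Collections.enumeration(new ArrayList<>(links));
    }

    void removeAllLinks() {
        links.clear();
    }

    // Convenience for trying the sketch: parse a string with a given system ID.
    static List<String> collect(String xml, String systemId) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // namespace identifiers are required
            LinkCollectorSketch handler = new LinkCollectorSketch();
            InputSource in = new InputSource(new StringReader(xml));
            in.setSystemId(systemId);
            factory.newSAXParser().parse(in, handler);
            return Collections.list(handler.getLinks());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The parser is made namespace-aware because element matching relies on the XHTML namespace identifier, and the system ID set on the InputSource is what the Locator later reports as the base for relative links.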
LinkFilter() - Constructs a new event filter, which collects links in a private data structure for later enumeration.
LinkFilter(EventConsumer next) - Constructs a new event filter, which collects links in a private data structure for later enumeration and passes all events, unmodified, to the next consumer.
void endDocument() - Forgets about any base URI information that may be recorded.
Enumeration getLinks() - Returns an enumeration of the links found since the filter was constructed, or since removeAllLinks() was called.
void removeAllLinks() - Removes records about all links reported to the event stream, as if the filter were newly created.
void startDocument() - Reports an error if no Locator has been made available.
void startElement(String uri, String localName, String qName, Attributes atts) - Collects URIs for (X)HTML content from elements which hold them.
Methods inherited from class EventFilter: attributeDecl, bind, chainTo, characters, comment, elementDecl, endCDATA, endDTD, endDocument, endElement, endEntity, endPrefixMapping, externalEntityDecl, getContentHandler, getDTDHandler, getDocumentLocator, getErrorHandler, getNext, getProperty, ignorableWhitespace, internalEntityDecl, notationDecl, processingInstruction, setContentHandler, setDTDHandler, setDocumentLocator, setErrorHandler, setProperty, skippedEntity, startCDATA, startDTD, startDocument, startElement, startEntity, startPrefixMapping, unparsedEntityDecl, xmlDecl
LinkFilter
public LinkFilter()
Constructs a new event filter, which collects links in a private data
structure for later enumeration.
LinkFilter
public LinkFilter(EventConsumer next)
Constructs a new event filter, which collects links in a private data
structure for later enumeration and passes all events, unmodified,
to the next consumer.
endDocument
public void endDocument()
throws SAXException
Forgets about any base URI information that may be recorded.
Applications will often want to call removeAllLinks(), likely
after examining the links which were reported.
Specified by: endDocument in interface ContentHandler
Overrides: endDocument in class EventFilter
getLinks
public Enumeration getLinks()
Returns an enumeration of the links found since the filter
was constructed, or since removeAllLinks() was called.
Returns: an enumeration of strings.
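Since callers are responsible for filtering out URLs they aren't interested in, a crawler would typically post-process the enumeration. The following is a minimal caller-side sketch, assuming the enumeration yields absolute URI strings; the class LinkSelection and the method sameHostHttpLinks are hypothetical names, not part of gnu.xml.pipeline, and the "same host, HTTP(S) only" policy is just one example of a filter.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical caller-side helper; not part of gnu.xml.pipeline.
class LinkSelection {
    // Keeps only http/https links whose host matches the given host name.
    static List<String> sameHostHttpLinks(Enumeration<?> links, String host) {
        List<String> kept = new ArrayList<>();
        while (links.hasMoreElements()) {
            // The enumeration is documented to contain strings.
            String link = (String) links.nextElement();
            try {
                URI uri = new URI(link);
                String scheme = uri.getScheme();
                boolean http = "http".equalsIgnoreCase(scheme)
                        || "https".equalsIgnoreCase(scheme);
                // getHost() is null for opaque URIs such as mailto:
                if (http && host.equalsIgnoreCase(uri.getHost())) {
                    kept.add(link);
                }
            } catch (URISyntaxException e) {
                // Unparseable link: skipped in this sketch.
            }
        }
        return kept;
    }
}
```

With a real filter, the argument would be the result of getLinks(), for example sameHostHttpLinks(filter.getLinks(), "example.org"), followed by removeAllLinks() once the links have been examined.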
removeAllLinks
public void removeAllLinks()
Removes records about all links reported to the event
stream, as if the filter were newly created.