java.lang.Object
  org.apache.xml.dtm.ref.IncrementalSAXSource_Filter
IncrementalSAXSource_Filter implements IncrementalSAXSource, using a standard SAX2 event source as its input and parcelling out those events gradually in response to deliverMoreNodes() requests. Output from the filter is passed along to a SAX handler registered as our listener, but those callbacks pass through a counting stage which periodically yields control back to the controller coroutine.
%REVIEW%: This filter is not currently intended to be reusable for parsing additional streams/documents. We may want to consider making it resettable at some point in the future. But it's a small object, so that would be mostly a convenience issue; the cost of allocating each time is trivial compared to the cost of processing any nontrivial stream.
For a brief usage example, see the unit-test main() method.
This is a simplification of the old CoroutineSAXParser, focusing specifically on filtering. The resulting controller protocol is _far_ simpler and less error-prone; the only controller operation is deliverMoreNodes(), and the only requirement is that deliverMoreNodes(false) be called if you want to discard the rest of the stream and the previous deliverMoreNodes() didn't return false.
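The controller protocol described above can be illustrated with a small stand-in. This is a sketch under stated assumptions, not the Xalan implementation: ChunkedSaxDemo, deliverMore() and elementsAfterEachChunk() are hypothetical names, two semaphores on an ordinary thread replace the CoroutineManager, and a plain boolean stands in for the Object that the real deliverMoreNodes() returns. Only startElement and endElement are counted here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stand-in for the filter: counts events and yields every N of them. */
class ChunkedSaxDemo extends DefaultHandler {
    private final Semaphore resume = new Semaphore(0);   // controller -> parser thread
    private final Semaphore yielded = new Semaphore(0);  // parser thread -> controller
    private final int frequency;                         // counted events per chunk
    private int sinceYield = 0;
    private volatile boolean done = false;
    private volatile boolean stopRequested = false;
    int elementsSeen = 0;

    ChunkedSaxDemo(int frequency) { this.frequency = frequency; }

    /** Starts a parser thread that waits for the first deliverMore() call. */
    void startParse(String xml) {
        Thread parser = new Thread(() -> {
            try {
                resume.acquire();
                if (!stopRequested) {
                    XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
                    reader.setContentHandler(this);
                    reader.parse(new InputSource(new StringReader(xml)));
                }
            } catch (Exception e) {
                // Reached on a parse error or when the controller asked us to stop.
            } finally {
                done = true;
                yielded.release();
            }
        });
        parser.setDaemon(true);
        parser.start();
    }

    /** Controller side: true means a chunk was delivered and more may follow. */
    boolean deliverMore(boolean parseMore) {
        if (done) return false;
        stopRequested = !parseMore;
        resume.release();                  // wake the parser thread
        yielded.acquireUninterruptibly();  // wait until it yields or finishes
        return !done;
    }

    /** The counting stage: every 'frequency' events, hand control back. */
    private void countEvent() throws SAXException {
        if (++sinceYield < frequency) return;
        sinceYield = 0;
        yielded.release();
        resume.acquireUninterruptibly();
        if (stopRequested) throw new SAXException("parse abandoned by controller");
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) throws SAXException {
        elementsSeen++;
        countEvent();
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        countEvent();
    }

    /** Pulls the whole document chunk by chunk, recording progress after each request. */
    static List<Integer> elementsAfterEachChunk(String xml, int frequency) {
        ChunkedSaxDemo filter = new ChunkedSaxDemo(frequency);
        filter.startParse(xml);
        List<Integer> seen = new ArrayList<>();
        boolean more;
        do {
            more = filter.deliverMore(true);
            seen.add(filter.elementsSeen);
        } while (more);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(elementsAfterEachChunk("<a><b/><c/><d/></a>", 3)); // [2, 4, 4]
    }
}
```

A controller that wants to abandon the document early would call deliverMore(false) after a request that returned true, which mirrors the deliverMoreNodes(false) requirement stated above.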
Constructor Summary
IncrementalSAXSource_Filter()
IncrementalSAXSource_Filter(CoroutineManager co, int controllerCoroutineID) | Create an IncrementalSAXSource_Filter which is not yet bound to a specific SAX event source.
Method Summary
Return type | Method | Description
void | characters(char[] ch, int start, int length) | Receive notification of character data.
void | comment(char[] ch, int start, int length) | Report an XML comment anywhere in the document.
static IncrementalSAXSource | createIncrementalSAXSource(CoroutineManager co, int controllerCoroutineID)
Object | deliverMoreNodes(boolean parsemore) | deliverMoreNodes() is a simple API which tells the coroutine parser that we need more nodes.
void | endCDATA() | Report the end of a CDATA section.
void | endDocument() | Receive notification of the end of a document.
void | endDTD() | Report the end of DTD declarations.
void | endElement(String namespaceURI, String localName, String qName) | Receive notification of the end of an element.
void | endEntity(String name) | Report the end of an entity.
void | endPrefixMapping(String prefix) | End the scope of a prefix-URI mapping.
void | error(SAXParseException exception) | Receive notification of a recoverable error.
void | fatalError(SAXParseException exception) | Receive notification of a non-recoverable error.
int | getControllerCoroutineID()
CoroutineManager | getCoroutineManager()
int | getSourceCoroutineID()
void | ignorableWhitespace(char[] ch, int start, int length) | Receive notification of ignorable whitespace in element content.
void | init(CoroutineManager co, int controllerCoroutineID, int sourceCoroutineID)
void | notationDecl(String a, String b, String c) | Receive notification of a notation declaration event.
void | processingInstruction(String target, String data) | Receive notification of a processing instruction.
void | run()
void | setContentHandler(ContentHandler handler) | Register a SAX-style content handler for us to output to.
void | setDocumentLocator(Locator locator) | Receive an object for locating the origin of SAX document events.
void | setDTDHandler(DTDHandler handler) | Register a SAX-style DTD handler for us to output to.
void | setErrHandler(ErrorHandler handler)
void | setLexicalHandler(LexicalHandler handler) | Register a SAX-style lexical handler for us to output to.
void | setReturnFrequency(int events)
void | setXMLReader(XMLReader eventsource) | Bind our input streams to an XMLReader.
void | skippedEntity(String name) | Receive notification of a skipped entity.
void | startCDATA() | Report the start of a CDATA section.
void | startDocument() | Receive notification of the beginning of a document.
void | startDTD(String name, String publicId, String systemId) | Report the start of DTD declarations, if any.
void | startElement(String namespaceURI, String localName, String qName, Attributes atts) | Receive notification of the beginning of an element.
void | startEntity(String name) | Report the beginning of some internal and external XML entities.
void | startParse(InputSource source) | Launch a thread that runs an XMLReader's parse() operation, feeding events to this IncrementalSAXSource_Filter.
void | startPrefixMapping(String prefix, String uri) | Begin the scope of a prefix-URI Namespace mapping.
void | unparsedEntityDecl(String a, String b, String c, String d) | Receive notification of an unparsed entity declaration event.
void | warning(SAXParseException exception) | Receive notification of a warning.
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail
public IncrementalSAXSource_Filter()
public IncrementalSAXSource_Filter(CoroutineManager co, int controllerCoroutineID)
Method Detail
public static IncrementalSAXSource createIncrementalSAXSource(CoroutineManager co, int controllerCoroutineID)
public void init(CoroutineManager co, int controllerCoroutineID, int sourceCoroutineID)
public void setXMLReader(XMLReader eventsource)
public void setContentHandler(ContentHandler handler)
IncrementalSAXSource
setContentHandler
in interface IncrementalSAXSource
public void setDTDHandler(DTDHandler handler)
IncrementalSAXSource
setDTDHandler
in interface IncrementalSAXSource
public void setLexicalHandler(LexicalHandler handler)
IncrementalSAXSource
setLexicalHandler
in interface IncrementalSAXSource
public void setErrHandler(ErrorHandler handler)
public void setReturnFrequency(int events)
public void characters(char[] ch, int start, int length) throws SAXException
ContentHandler
The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity so that the Locator provides useful information.
The application must not attempt to read from the array outside of the specified range.
Individual characters may consist of more than one Java char value. There are two important cases where this happens, because characters can't be represented in just sixteen bits. In one case, characters are represented in a Surrogate Pair, using two special Unicode values. Such characters are in the so-called "Astral Planes", with a code point above U+FFFF. A second case involves composite characters, such as a base character combining with one or more accent characters.
Your code should not assume that algorithms using char-at-a-time idioms will be working in character units; in some cases they will split characters. This is relevant wherever XML permits arbitrary characters, such as attribute values, processing instruction data, and comments as well as in data reported from this method. It's also generally relevant whenever Java code manipulates internationalized text; the issue isn't unique to XML.
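To make the surrogate-pair case concrete, the following minimal sketch compares a char-at-a-time count with a code-point count over the same (ch, start, length) range a characters() callback receives. CodePointDemo and its method names are hypothetical; Character.codePointCount is standard Java. The sketch addresses only surrogate pairs: a base character followed by a combining accent still counts as two code points.

```java
class CodePointDemo {
    /** Counts UTF-16 code units, which is what a char-at-a-time loop sees. */
    static int countCodeUnits(char[] ch, int start, int length) {
        return length;
    }

    /** Counts Unicode code points, treating each surrogate pair as one character. */
    static int countCodePoints(char[] ch, int start, int length) {
        return Character.codePointCount(ch, start, length);
    }

    public static void main(String[] args) {
        // 'a', then U+1F600 encoded as a surrogate pair, then 'b'
        char[] text = "a\uD83D\uDE00b".toCharArray();
        System.out.println(countCodeUnits(text, 0, text.length));   // 4
        System.out.println(countCodePoints(text, 0, text.length));  // 3
    }
}
```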
Note that some parsers will report whitespace in element content using the ignorableWhitespace method rather than this one (validating parsers must do so).
characters
in interface ContentHandler
ch
- The characters from the XML document.
start
- The start position in the array.
length
- The number of characters to read from the array.
SAXException
- Any SAX exception, possibly wrapping another exception.
See also: ContentHandler.ignorableWhitespace(char[], int, int), Locator
public void endDocument() throws SAXException
ContentHandler
The SAX parser will invoke this method only once, and it will be the last method invoked during the parse. The parser shall not invoke this method until it has either abandoned parsing (because of an unrecoverable error) or reached the end of input.
endDocument
in interface ContentHandler
SAXException
- Any SAX exception, possibly wrapping another exception.
See also: ContentHandler.startDocument()
public void endElement(String namespaceURI, String localName, String qName) throws SAXException
ContentHandler
The SAX parser will invoke this method at the end of every element in the XML document; there will be a corresponding startElement event for every endElement event (even when the element is empty).
For information on the names, see startElement.
endElement
in interface ContentHandler
namespaceURI
- The Namespace URI, or the empty string if the element has no Namespace URI or if Namespace processing is not being performed.
localName
- The local name (without prefix), or the empty string if Namespace processing is not being performed.
qName
- The qualified XML 1.0 name (with prefix), or the empty string if qualified names are not available.
SAXException
- Any SAX exception, possibly wrapping another exception.
public void endPrefixMapping(String prefix) throws SAXException
ContentHandler
See startPrefixMapping for details. These events will always occur immediately after the corresponding endElement event, but the order of endPrefixMapping events is not otherwise guaranteed.
endPrefixMapping
in interface ContentHandler
prefix
- The prefix that was being mapped. This is the empty string when a default mapping scope ends.
SAXException
- The client may throw an exception during processing.
See also: ContentHandler.startPrefixMapping(java.lang.String, java.lang.String), ContentHandler.endElement(java.lang.String, java.lang.String, java.lang.String)
public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException
ContentHandler
Validating Parsers must use this method to report each chunk of whitespace in element content (see the W3C XML 1.0 recommendation, section 2.10): non-validating parsers may also use this method if they are capable of parsing and using content models.
SAX parsers may return all contiguous whitespace in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity, so that the Locator provides useful information.
The application must not attempt to read from the array outside of the specified range.
ignorableWhitespace
in interface ContentHandler
ch
- The characters from the XML document.
start
- The start position in the array.
length
- The number of characters to read from the array.
SAXException
- Any SAX exception, possibly wrapping another exception.
See also: ContentHandler.characters(char[], int, int)
public void processingInstruction(String target, String data) throws SAXException
ContentHandler
The Parser will invoke this method once for each processing instruction found: note that processing instructions may occur before or after the main document element.
A SAX parser must never report an XML declaration (XML 1.0, section 2.8) or a text declaration (XML 1.0, section 4.3.1) using this method.
Like characters(), processing instruction data may have characters that need more than one char value.
processingInstruction
in interface ContentHandler
target
- The processing instruction target.
data
- The processing instruction data, or null if none was supplied. The data does not include any whitespace separating it from the target.
SAXException
- Any SAX exception, possibly wrapping another exception.
public void setDocumentLocator(Locator locator)
ContentHandler
SAX parsers are strongly encouraged (though not absolutely required) to supply a locator: if a parser does so, it must supply the locator to the application by invoking this method before invoking any of the other methods in the ContentHandler interface.
The locator allows the application to determine the end position of any document-related event, even if the parser is not reporting an error. Typically, the application will use this information for reporting its own errors (such as character content that does not match an application's business rules). The information returned by the locator is probably not sufficient for use with a search engine.
Note that the locator will return correct information only during the invocation of the events in this interface. The application should not attempt to use it at any other time.
setDocumentLocator
in interface ContentHandler
locator
- An object that can return the location of any SAX document event.
See also: Locator
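As a minimal sketch of the locator contract, the handler below stores the Locator it is given and consults it only while a callback is running. LocatorDemo and elementLines() are hypothetical names, and the example uses the JDK's default SAX parser directly rather than this filter.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

class LocatorDemo extends DefaultHandler {
    private Locator locator;                          // stays null if the parser supplies none
    final List<String> positions = new ArrayList<>();

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;                       // delivered before any other event
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        // Query the locator only from inside a callback, as the contract requires.
        int line = (locator == null) ? -1 : locator.getLineNumber();
        positions.add(qName + "@" + line);
    }

    /** Returns "name@line" for every element start in the given document. */
    static List<String> elementLines(String xml) {
        try {
            LocatorDemo handler = new LocatorDemo();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.positions;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(elementLines("<a>\n<b/>\n</a>")); // [a@1, b@2]
    }
}
```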
public void skippedEntity(String name) throws SAXException
ContentHandler
The Parser will invoke this method each time an entity is skipped. Non-validating processors may skip entities if they have not seen the declarations (because, for example, the entity was declared in an external DTD subset). All processors may skip external entities, depending on the values of the http://xml.org/sax/features/external-general-entities and the http://xml.org/sax/features/external-parameter-entities properties.
skippedEntity
in interface ContentHandler
name
- The name of the skipped entity. If it is a parameter entity, the name will begin with '%', and if it is the external DTD subset, it will be the string "[dtd]".
SAXException
- Any SAX exception, possibly wrapping another exception.
public void startDocument() throws SAXException
ContentHandler
The SAX parser will invoke this method only once, before any other event callbacks (except for setDocumentLocator).
startDocument
in interface ContentHandler
SAXException
- Any SAX exception, possibly wrapping another exception.
See also: ContentHandler.endDocument()
public void startElement(String namespaceURI, String localName, String qName, Attributes atts) throws SAXException
ContentHandler
The Parser will invoke this method at the beginning of every element in the XML document; there will be a corresponding endElement event for every startElement event (even when the element is empty). All of the element's content will be reported, in order, before the corresponding endElement event.
This event allows up to three name components for each element:
- the Namespace URI;
- the local name; and
- the qualified (prefixed) name.
Any or all of these may be provided, depending on the values of the http://xml.org/sax/features/namespaces and the http://xml.org/sax/features/namespace-prefixes properties:
- the Namespace URI and local name are required when the namespaces property is true (the default), and are optional when the namespaces property is false (if one is specified, both must be);
- the qualified name is required when the namespace-prefixes property is true, and is optional when the namespace-prefixes property is false (the default).
Note that the attribute list provided will contain only attributes with explicit values (specified or defaulted): #IMPLIED attributes will be omitted. The attribute list will contain attributes used for Namespace declarations (xmlns* attributes) only if the http://xml.org/sax/features/namespace-prefixes property is true (it is false by default, and support for a true value is optional).
Like characters(), attribute values may have characters that need more than one char value.
startElement
in interface ContentHandler
namespaceURI
- The Namespace URI, or the empty string if the element has no Namespace URI or if Namespace processing is not being performed.
localName
- The local name (without prefix), or the empty string if Namespace processing is not being performed.
qName
- The qualified name (with prefix), or the empty string if qualified names are not available.
atts
- The attributes attached to the element. If there are no attributes, it shall be an empty Attributes object.
SAXException
- Any SAX exception, possibly wrapping another exception.
See also: ContentHandler.endElement(java.lang.String, java.lang.String, java.lang.String), Attributes
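The name components and the treatment of xmlns attributes can be observed with a small handler. This is a sketch assuming a namespace-aware JAXP parser, which leaves the namespace-prefixes feature at its default of false; ElementNameDemo and describe() are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ElementNameDemo extends DefaultHandler {
    final List<String> seen = new ArrayList<>();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        // Build an expanded name from the Namespace URI and local name.
        String expanded = uri.isEmpty() ? local : "{" + uri + "}" + local;
        seen.add(expanded + " attrs=" + atts.getLength());
    }

    /** Parses with namespace processing on and lists each element's expanded name. */
    static List<String> describe(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);          // turns the namespaces feature on
            ElementNameDemo handler = new ElementNameDemo();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.seen;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The xmlns:p declaration is not counted as an attribute; id is.
        System.out.println(describe("<p:a xmlns:p=\"urn:x\" id=\"1\"><b/></p:a>"));
    }
}
```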
public void startPrefixMapping(String prefix, String uri) throws SAXException
ContentHandler
The information from this event is not necessary for normal Namespace processing: the SAX XML reader will automatically replace prefixes for element and attribute names when the http://xml.org/sax/features/namespaces feature is true (the default).
There are cases, however, when applications need to use prefixes in character data or in attribute values, where they cannot safely be expanded automatically; the start/endPrefixMapping event supplies the information to the application to expand prefixes in those contexts itself, if necessary.
Note that start/endPrefixMapping events are not guaranteed to be properly nested relative to each other: all startPrefixMapping events will occur immediately before the corresponding startElement event, and all endPrefixMapping events will occur immediately after the corresponding endElement event, but their order is not otherwise guaranteed.
There should never be start/endPrefixMapping events for the "xml" prefix, since it is predeclared and immutable.
startPrefixMapping
in interface ContentHandler
prefix
- The Namespace prefix being declared. An empty string is used for the default element namespace, which has no prefix.
uri
- The Namespace URI the prefix is mapped to.
SAXException
- The client may throw an exception during processing.
See also: ContentHandler.endPrefixMapping(java.lang.String), ContentHandler.startElement(java.lang.String, java.lang.String, java.lang.String, org.xml.sax.Attributes)
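The case of prefixes inside attribute values can be sketched as follows. The handler tracks prefix scopes from start/endPrefixMapping events and uses them to expand a prefixed name found in an attribute value. PrefixDemo, resolveRefs() and the ref attribute are hypothetical, chosen only for illustration; the "unbound:" marker is likewise an invention of this sketch.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PrefixDemo extends DefaultHandler {
    // One stack of URIs per prefix, so inner declarations shadow outer ones.
    private final Map<String, Deque<String>> scopes = new HashMap<>();
    final List<String> resolved = new ArrayList<>();

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        scopes.computeIfAbsent(prefix, k -> new ArrayDeque<>()).push(uri);
    }

    @Override
    public void endPrefixMapping(String prefix) {
        Deque<String> stack = scopes.get(prefix);
        if (stack != null && !stack.isEmpty()) stack.pop();
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        String ref = atts.getValue("", "ref");        // hypothetical attribute holding a prefixed name
        if (ref == null) return;
        int colon = ref.indexOf(':');
        if (colon < 0) { resolved.add(ref); return; } // no prefix, nothing to expand
        Deque<String> stack = scopes.get(ref.substring(0, colon));
        if (stack == null || stack.isEmpty()) {
            resolved.add("unbound:" + ref);           // prefix not in scope here
        } else {
            resolved.add("{" + stack.peek() + "}" + ref.substring(colon + 1));
        }
    }

    /** Expands every ref attribute value using the prefix mappings in scope. */
    static List<String> resolveRefs(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);          // needed for prefix-mapping events
            PrefixDemo handler = new PrefixDemo();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.resolved;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because startPrefixMapping is delivered before the corresponding startElement, declarations made on an element are already in scope when its own attributes are examined.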
public void comment(char[] ch, int start, int length) throws SAXException
LexicalHandler
This callback will be used for comments inside or outside the document element, including comments in the external DTD subset (if read). Comments in the DTD must be properly nested inside start/endDTD and start/endEntity events (if used).
comment
in interface LexicalHandler
ch
- An array holding the characters in the comment.
start
- The starting position in the array.
length
- The number of characters to use from the array.
SAXException
- The application may raise an exception.
public void endCDATA() throws SAXException
LexicalHandler
endCDATA
in interface LexicalHandler
SAXException
- The application may raise an exception.
See also: LexicalHandler.startCDATA()
public void endDTD() throws SAXException
LexicalHandler
This method is intended to report the end of the DOCTYPE declaration; if the document has no DOCTYPE declaration, this method will not be invoked.
endDTD
in interface LexicalHandler
SAXException
- The application may raise an exception.
See also: LexicalHandler.startDTD(java.lang.String, java.lang.String, java.lang.String)
public void endEntity(String name) throws SAXException
LexicalHandler
endEntity
in interface LexicalHandler
name
- The name of the entity that is ending.
SAXException
- The application may raise an exception.
See also: LexicalHandler.endEntity(java.lang.String)
public void startCDATA() throws SAXException
LexicalHandler
The contents of the CDATA section will be reported through the regular characters event; this event is intended only to report the boundary.
startCDATA
in interface LexicalHandler
SAXException
- The application may raise an exception.
See also: LexicalHandler.endCDATA()
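The comment and CDATA callbacks can be exercised with a plain XMLReader. In this sketch the handler is registered through the standard http://xml.org/sax/properties/lexical-handler property, and CDATA text is accumulated from characters() between the two boundary events. LexicalDemo and lexicalEvents() are hypothetical names; DefaultHandler2 is the stock SAX adapter class that supplies empty LexicalHandler methods.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class LexicalDemo extends DefaultHandler2 {
    final List<String> log = new ArrayList<>();
    private StringBuilder cdata;   // non-null only while inside a CDATA section

    @Override
    public void comment(char[] ch, int start, int length) {
        log.add("comment:" + new String(ch, start, length));
    }

    @Override
    public void startCDATA() {
        cdata = new StringBuilder();
    }

    @Override
    public void endCDATA() {
        log.add("cdata:" + cdata);
        cdata = null;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The section's text arrives here, possibly in several chunks.
        if (cdata != null) cdata.append(ch, start, length);
    }

    /** Parses the document and returns the comments and CDATA sections seen. */
    static List<String> lexicalEvents(String xml) {
        try {
            LexicalDemo handler = new LexicalDemo();
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.log;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```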
public void startDTD(String name, String publicId, String systemId) throws SAXException
LexicalHandler
This method is intended to report the beginning of the DOCTYPE declaration; if the document has no DOCTYPE declaration, this method will not be invoked.
All declarations reported through DTDHandler or DeclHandler events must appear between the startDTD and endDTD events. Declarations are assumed to belong to the internal DTD subset unless they appear between startEntity and endEntity events. Comments and processing instructions from the DTD should also be reported between the startDTD and endDTD events, in their original order of (logical) occurrence; they are not required to appear in their correct locations relative to DTDHandler or DeclHandler events, however.
Note that the start/endDTD events will appear within the start/endDocument events from ContentHandler and before the first startElement event.
startDTD
in interface LexicalHandler
name
- The document type name.
publicId
- The declared public identifier for the external DTD subset, or null if none was declared.
systemId
- The declared system identifier for the external DTD subset, or null if none was declared. (Note that this is not resolved against the document base URI.)
SAXException
- The application may raise an exception.
See also: LexicalHandler.endDTD(), LexicalHandler.startEntity(java.lang.String)
public void startEntity(String name) throws SAXException
LexicalHandler
The reporting of parameter entities (including the external DTD subset) is optional, and SAX2 drivers that report LexicalHandler events may not implement it; you can use the http://xml.org/sax/features/lexical-handler/parameter-entities feature to query or control the reporting of parameter entities.
General entities are reported with their regular names, parameter entities have '%' prepended to their names, and the external DTD subset has the pseudo-entity name "[dtd]".
When a SAX2 driver is providing these events, all other events must be properly nested within start/end entity events. There is no additional requirement that events from DeclHandler or DTDHandler be properly ordered.
Note that skipped entities will be reported through the skippedEntity event, which is part of the ContentHandler interface.
Because of the streaming event model that SAX uses, some entity boundaries cannot be reported under any circumstances:
- general entities within attribute values
- parameter entities within declarations
These will be silently expanded, with no indication of where the original entity boundaries were.
Note also that the boundaries of character references (which are not really entities anyway) are not reported.
All start/endEntity events must be properly nested.
startEntity
in interface LexicalHandler
name
- The name of the entity. If it is a parameter entity, the name will begin with '%', and if it is the external DTD subset, it will be "[dtd]".
SAXException
- The application may raise an exception.
See also: LexicalHandler.endEntity(java.lang.String), DeclHandler.internalEntityDecl(java.lang.String, java.lang.String), DeclHandler.externalEntityDecl(java.lang.String, java.lang.String, java.lang.String)
public void notationDecl(String a, String b, String c) throws SAXException
DTDHandler
It is up to the application to record the notation for later reference, if necessary; notations may appear as attribute values and in unparsed entity declarations, and are sometimes used with processing instruction target names.
At least one of publicId and systemId must be non-null. If a system identifier is present, and it is a URL, the SAX parser must resolve it fully before passing it to the application through this event.
There is no guarantee that the notation declaration will be reported before any unparsed entities that use it.
notationDecl
in interface DTDHandler
a
- The notation name.
b
- The notation's public identifier, or null if none was given.
c
- The notation's system identifier, or null if none was given.
SAXException
- Any SAX exception, possibly wrapping another exception.
See also: DTDHandler.unparsedEntityDecl(java.lang.String, java.lang.String, java.lang.String, java.lang.String), Attributes
public void unparsedEntityDecl(String a, String b, String c, String d) throws SAXException
DTDHandler
Note that the notation name corresponds to a notation reported by the notationDecl event.
It is up to the application to record the entity for later reference, if necessary; unparsed entities may appear as attribute values.
If the system identifier is a URL, the parser must resolve it fully before passing it to the application.
unparsedEntityDecl
in interface DTDHandler
a
- The unparsed entity's name.
b
- The entity's public identifier, or null if none was given.
c
- The entity's system identifier.
d
- The name of the associated notation.
SAXException
- Any SAX exception, possibly wrapping another exception.
See also: DTDHandler.notationDecl(java.lang.String, java.lang.String, java.lang.String), Attributes
public void error(SAXParseException exception) throws SAXException
ErrorHandler
This corresponds to the definition of "error" in section 1.2 of the W3C XML 1.0 Recommendation. For example, a validating parser would use this callback to report the violation of a validity constraint. The default behaviour is to take no action.
The SAX parser must continue to provide normal parsing events after invoking this method: it should still be possible for the application to process the document through to the end. If the application cannot do so, then the parser should report a fatal error even if the XML 1.0 recommendation does not require it to do so.
Filters may use this method to report other, non-XML errors as well.
error
in interface ErrorHandler
exception
- The error information encapsulated in a SAX parse exception.
SAXException
- Any SAX exception, possibly wrapping another exception.
See also: SAXParseException
public void fatalError(SAXParseException exception) throws SAXException
ErrorHandler
This corresponds to the definition of "fatal error" in section 1.2 of the W3C XML 1.0 Recommendation. For example, a parser would use this callback to report the violation of a well-formedness constraint.
The application must assume that the document is unusable after the parser has invoked this method, and should continue (if at all) only for the sake of collecting additional error messages: in fact, SAX parsers are free to stop reporting any other events once this method has been invoked.
fatalError
in interface ErrorHandler
exception
- The error information encapsulated in a SAX parse exception.
SAXException
- Any SAX exception, possibly wrapping another exception.
See also: SAXParseException
public void warning(SAXParseException exception) throws SAXException
ErrorHandler
SAX parsers will use this method to report conditions that are not errors or fatal errors as defined by the XML 1.0 recommendation. The default behaviour is to take no action.
The SAX parser must continue to provide normal parsing events after invoking this method: it should still be possible for the application to process the document through to the end.
Filters may use this method to report other, non-XML warnings as well.
warning
in interface ErrorHandler
exception
- The warning information encapsulated in a SAX parse exception.
SAXException
- Any SAX exception, possibly wrapping another exception.
See also: SAXParseException
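The three severity levels can be told apart with a small ErrorHandler. In this sketch warnings and recoverable errors are recorded and parsing continues, while a fatal error is recorded and rethrown so that parsing stops. ErrorLogDemo and parse() are hypothetical names, and the handler is attached to a plain XMLReader rather than through this filter's setErrHandler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

class ErrorLogDemo implements ErrorHandler {
    final List<String> entries = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        entries.add("warning");            // parsing continues normally
    }

    @Override
    public void error(SAXParseException e) {
        entries.add("error");              // recoverable: parsing continues
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        entries.add("fatal");
        throw e;                           // the document is unusable, so stop here
    }

    /** Parses the document and returns the severities reported, in order. */
    static List<String> parse(String xml) {
        ErrorLogDemo log = new ErrorLogDemo();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setErrorHandler(log);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // Expected after a fatal error; the log already holds the details.
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return log.entries;
    }
}
```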
public int getSourceCoroutineID()
public int getControllerCoroutineID()
public CoroutineManager getCoroutineManager()
public void startParse(InputSource source) throws SAXException
startParse
in interface IncrementalSAXSource
SAXException
- if a parse thread is already in progress or parsing cannot be started.
public void run()
run
in interface Runnable
public Object deliverMoreNodes(boolean parsemore)
deliverMoreNodes
in interface IncrementalSAXSource
parsemore
- If true, tells the incremental filter to generate another chunk of output. If false, tells the filter that we're satisfied and it can terminate parsing of this document.