Class Tokenizer

  • All Implemented Interfaces:
    org.xml.sax.Locator
    Direct Known Subclasses:
    ErrorReportingTokenizer

    public class Tokenizer
    extends java.lang.Object
    implements org.xml.sax.Locator
    An implementation of the tokenization algorithm specified at http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html

    This class implements the Locator interface. This is not an incidental implementation detail: users of this class are encouraged to make use of the Locator nature.

    By default, the tokenizer may report data that XML 1.0 bans. The tokenizer can be configured to treat these conditions as fatal or to coerce the infoset to something that XML 1.0 allows.
    Author:
    hsivonen
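
    The class description encourages callers to use the tokenizer through its Locator interface. The sketch below shows that pattern with only JDK SAX types. LocatorDemo and describe are hypothetical names, and LocatorImpl merely stands in for a Tokenizer so the example runs without the parser library. Any object implementing org.xml.sax.Locator, a Tokenizer included, could be passed the same way.

```java
import org.xml.sax.Locator;
import org.xml.sax.helpers.LocatorImpl;

public class LocatorDemo {
    /** Formats a position as "systemId:line:column" using only the Locator interface. */
    public static String describe(Locator locator) {
        return locator.getSystemId() + ":" + locator.getLineNumber()
                + ":" + locator.getColumnNumber();
    }

    public static void main(String[] args) {
        // Assumption: LocatorImpl is a stand-in; a Tokenizer would be passed
        // to describe() in exactly the same way, since it implements Locator.
        LocatorImpl standIn = new LocatorImpl();
        standIn.setSystemId("http://example.com/doc.html");
        standIn.setLineNumber(3);
        standIn.setColumnNumber(14);
        System.out.println(describe(standIn));
    }
}
```

    Code written against Locator rather than Tokenizer stays usable with other SAX event sources.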
    • Constructor Detail

      • Tokenizer

        public Tokenizer​(TokenHandler tokenHandler,
                         boolean newAttributesEachTime)
      • Tokenizer

        public Tokenizer​(TokenHandler tokenHandler)
        Creates a tokenizer that delivers tokens to the given handler.
        Parameters:
        tokenHandler - the handler for receiving tokens
    • Method Detail

      • setInterner

        public void setInterner​(Interner interner)
      • initLocation

        public void initLocation​(java.lang.String newPublicId,
                                 java.lang.String newSystemId)
      • isMappingLangToXmlLang

        public boolean isMappingLangToXmlLang()
        Returns the mappingLangToXmlLang.
        Returns:
        the mappingLangToXmlLang
      • setMappingLangToXmlLang

        public void setMappingLangToXmlLang​(boolean mappingLangToXmlLang)
        Sets the mappingLangToXmlLang.
        Parameters:
        mappingLangToXmlLang - the mappingLangToXmlLang to set
      • setErrorHandler

        public void setErrorHandler​(org.xml.sax.ErrorHandler eh)
        Sets the error handler.
        See Also:
        XMLReader.setErrorHandler(org.xml.sax.ErrorHandler)
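
        As an illustration of what might be passed to setErrorHandler, here is a minimal sketch of a handler that records each report instead of printing or rethrowing it. CollectingErrorHandler and its message format are hypothetical. Only the org.xml.sax.ErrorHandler interface comes from SAX.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

public class CollectingErrorHandler implements ErrorHandler {
    private final List<String> messages = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        record("warning", e);
    }

    @Override
    public void error(SAXParseException e) {
        record("error", e);
    }

    @Override
    public void fatalError(SAXParseException e) {
        record("fatal", e);
    }

    // Stores "level line:column message" for later inspection.
    private void record(String level, SAXParseException e) {
        messages.add(level + " " + e.getLineNumber() + ":"
                + e.getColumnNumber() + " " + e.getMessage());
    }

    public List<String> getMessages() {
        return messages;
    }
}
```

        Given a Tokenizer instance named tokenizer, the handler would be installed with tokenizer.setErrorHandler(new CollectingErrorHandler()).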
      • getErrorHandler

        public org.xml.sax.ErrorHandler getErrorHandler()
      • setCommentPolicy

        public void setCommentPolicy​(XmlViolationPolicy commentPolicy)
        Sets the commentPolicy.
        Parameters:
        commentPolicy - the commentPolicy to set
      • setContentNonXmlCharPolicy

        public void setContentNonXmlCharPolicy​(XmlViolationPolicy contentNonXmlCharPolicy)
        Sets the contentNonXmlCharPolicy.
        Parameters:
        contentNonXmlCharPolicy - the contentNonXmlCharPolicy to set
      • setContentSpacePolicy

        public void setContentSpacePolicy​(XmlViolationPolicy contentSpacePolicy)
        Sets the contentSpacePolicy.
        Parameters:
        contentSpacePolicy - the contentSpacePolicy to set
      • setXmlnsPolicy

        public void setXmlnsPolicy​(XmlViolationPolicy xmlnsPolicy)
        Sets the xmlnsPolicy.
        Parameters:
        xmlnsPolicy - the xmlnsPolicy to set
      • setHtml4ModeCompatibleWithXhtml1Schemata

        public void setHtml4ModeCompatibleWithXhtml1Schemata​(boolean html4ModeCompatibleWithXhtml1Schemata)
        Sets the html4ModeCompatibleWithXhtml1Schemata.
        Parameters:
        html4ModeCompatibleWithXhtml1Schemata - the html4ModeCompatibleWithXhtml1Schemata to set
      • setStateAndEndTagExpectation

        public void setStateAndEndTagExpectation​(int specialTokenizerState,
                                                 java.lang.String endTagExpectation)
        Sets the tokenizer state and the associated element name. This should only ever be used to put the tokenizer into one of the states that have a special end tag expectation.
        Parameters:
        specialTokenizerState - the tokenizer state to set
        endTagExpectation - the expected end tag for transitioning back to normal
      • setStateAndEndTagExpectation

        public void setStateAndEndTagExpectation​(int specialTokenizerState,
                                                 ElementName endTagExpectation)
        Sets the tokenizer state and the associated element name. This should only ever be used to put the tokenizer into one of the states that have a special end tag expectation.
        Parameters:
        specialTokenizerState - the tokenizer state to set
        endTagExpectation - the expected end tag for transitioning back to normal
      • setLineNumber

        public void setLineNumber​(int line)
        For C++ use only.
      • getLineNumber

        public int getLineNumber()
        Specified by:
        getLineNumber in interface org.xml.sax.Locator
        See Also:
        Locator.getLineNumber()
      • getColumnNumber

        public int getColumnNumber()
        Specified by:
        getColumnNumber in interface org.xml.sax.Locator
        See Also:
        Locator.getColumnNumber()
      • getPublicId

        public java.lang.String getPublicId()
        Specified by:
        getPublicId in interface org.xml.sax.Locator
        See Also:
        Locator.getPublicId()
      • getSystemId

        public java.lang.String getSystemId()
        Specified by:
        getSystemId in interface org.xml.sax.Locator
        See Also:
        Locator.getSystemId()
      • notifyAboutMetaBoundary

        public void notifyAboutMetaBoundary()
      • strBufToString

        protected java.lang.String strBufToString()
        Returns the smaller buffer as a String. Currently only used for error reporting.

        C++ memory note: The return value must be released.

        Returns:
        the smaller buffer as a string
      • flushChars

        protected void flushChars​(char[] buf,
                                  int pos)
                           throws org.xml.sax.SAXException
        Flushes coalesced character tokens.
        Parameters:
        buf - TODO
        pos - TODO
        Throws:
        org.xml.sax.SAXException
      • fatal

        public void fatal​(java.lang.String message)
                   throws org.xml.sax.SAXException
        Reports a condition that would make the infoset incompatible with XML 1.0 as fatal.
        Parameters:
        message - the message
        Throws:
        org.xml.sax.SAXException
        org.xml.sax.SAXParseException
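
        The Throws list above suggests a common SAX pattern: wrap the message and the current location in a SAXParseException, notify the error handler, then throw. The sketch below shows that pattern in isolation. It is an assumption about how such a method could work, not the actual implementation, and FatalSketch with its explicit Locator and ErrorHandler parameters is hypothetical.

```java
import org.xml.sax.ErrorHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class FatalSketch {
    /**
     * Reports a fatal condition at the position given by the locator.
     * This method never returns normally: it always throws.
     */
    public static void fatal(String message, Locator where, ErrorHandler handler)
            throws SAXException {
        // The exception copies line, column and IDs from the locator.
        SAXParseException spe = new SAXParseException(message, where);
        if (handler != null) {
            // The handler may itself throw a SAXException.
            handler.fatalError(spe);
        }
        // Thrown even if the handler returned, so processing stops here.
        throw spe;
    }
}
```

        Throwing after the handler returns means a caller cannot continue by accident when the handler only logs the problem.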
      • err

        public void err​(java.lang.String message)
                 throws org.xml.sax.SAXException
        Reports a Parse Error.
        Parameters:
        message - the message
        Throws:
        org.xml.sax.SAXException
      • errTreeBuilder

        public void errTreeBuilder​(java.lang.String message)
                            throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • warn

        public void warn​(java.lang.String message)
                  throws org.xml.sax.SAXException
        Reports a warning.
        Parameters:
        message - the message
        Throws:
        org.xml.sax.SAXException
      • startErrorReporting

        protected void startErrorReporting()
                                    throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • start

        public void start()
                   throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • tokenizeBuffer

        public boolean tokenizeBuffer​(UTF16Buffer buffer)
                               throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • transition

        protected int transition​(int from,
                                 int to,
                                 boolean reconsume,
                                 int pos)
                          throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • silentCarriageReturn

        protected void silentCarriageReturn()
      • silentLineFeed

        protected void silentLineFeed()
      • eof

        public void eof()
                 throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • checkChar

        protected char checkChar​(char[] buf,
                                 int pos)
                          throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • isAlreadyComplainedAboutNonAscii

        public boolean isAlreadyComplainedAboutNonAscii()
        Returns the alreadyComplainedAboutNonAscii.
        Returns:
        the alreadyComplainedAboutNonAscii
      • internalEncodingDeclaration

        public boolean internalEncodingDeclaration​(java.lang.String internalCharset)
                                            throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • end

        public void end()
                 throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • requestSuspension

        public void requestSuspension()
      • becomeConfident

        public void becomeConfident()
      • isNextCharOnNewLine

        public boolean isNextCharOnNewLine()
        Returns the nextCharOnNewLine.
        Returns:
        the nextCharOnNewLine
      • isPrevCR

        public boolean isPrevCR()
      • getLine

        public int getLine()
        Returns the line.
        Returns:
        the line
      • getCol

        public int getCol()
        Returns the col.
        Returns:
        the col
      • isInDataState

        public boolean isInDataState()
      • resetToDataState

        public void resetToDataState()
      • loadState

        public void loadState​(Tokenizer other)
                       throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • initializeWithoutStarting

        public void initializeWithoutStarting()
                                       throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errGarbageAfterLtSlash

        protected void errGarbageAfterLtSlash()
                                       throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errLtSlashGt

        protected void errLtSlashGt()
                             throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errWarnLtSlashInRcdata

        protected void errWarnLtSlashInRcdata()
                                       throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errHtml4LtSlashInRcdata

        protected void errHtml4LtSlashInRcdata​(char folded)
                                        throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errCharRefLacksSemicolon

        protected void errCharRefLacksSemicolon()
                                         throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errNoDigitsInNCR

        protected void errNoDigitsInNCR()
                                 throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errGtInSystemId

        protected void errGtInSystemId()
                                throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errGtInPublicId

        protected void errGtInPublicId()
                                throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errNamelessDoctype

        protected void errNamelessDoctype()
                                   throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errConsecutiveHyphens

        protected void errConsecutiveHyphens()
                                      throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errPrematureEndOfComment

        protected void errPrematureEndOfComment()
                                         throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errBogusComment

        protected void errBogusComment()
                                throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errUnquotedAttributeValOrNull

        protected void errUnquotedAttributeValOrNull​(char c)
                                              throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errSlashNotFollowedByGt

        protected void errSlashNotFollowedByGt()
                                        throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errHtml4XmlVoidSyntax

        protected void errHtml4XmlVoidSyntax()
                                      throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errNoSpaceBetweenAttributes

        protected void errNoSpaceBetweenAttributes()
                                            throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errHtml4NonNameInUnquotedAttribute

        protected void errHtml4NonNameInUnquotedAttribute​(char c)
                                                   throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errLtOrEqualsOrGraveInUnquotedAttributeOrNull

        protected void errLtOrEqualsOrGraveInUnquotedAttributeOrNull​(char c)
                                                              throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errAttributeValueMissing

        protected void errAttributeValueMissing()
                                         throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errBadCharBeforeAttributeNameOrNull

        protected void errBadCharBeforeAttributeNameOrNull​(char c)
                                                    throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errEqualsSignBeforeAttributeName

        protected void errEqualsSignBeforeAttributeName()
                                                 throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errBadCharAfterLt

        protected void errBadCharAfterLt​(char c)
                                  throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errLtGt

        protected void errLtGt()
                        throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errProcessingInstruction

        protected void errProcessingInstruction()
                                         throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errUnescapedAmpersandInterpretedAsCharacterReference

        protected void errUnescapedAmpersandInterpretedAsCharacterReference()
                                                                     throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errNotSemicolonTerminated

        protected void errNotSemicolonTerminated()
                                          throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errNoNamedCharacterMatch

        protected void errNoNamedCharacterMatch()
                                         throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errQuoteBeforeAttributeName

        protected void errQuoteBeforeAttributeName​(char c)
                                            throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errQuoteOrLtInAttributeNameOrNull

        protected void errQuoteOrLtInAttributeNameOrNull​(char c)
                                                  throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errExpectedPublicId

        protected void errExpectedPublicId()
                                    throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errBogusDoctype

        protected void errBogusDoctype()
                                throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • maybeWarnPrivateUseAstral

        protected void maybeWarnPrivateUseAstral()
                                          throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • maybeWarnPrivateUse

        protected void maybeWarnPrivateUse​(char ch)
                                    throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • maybeErrAttributesOnEndTag

        protected void maybeErrAttributesOnEndTag​(HtmlAttributes attrs)
                                           throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • maybeErrSlashInEndTag

        protected void maybeErrSlashInEndTag​(boolean selfClosing)
                                      throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errNcrNonCharacter

        protected char errNcrNonCharacter​(char ch)
                                   throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errAstralNonCharacter

        protected void errAstralNonCharacter​(int ch)
                                      throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errNcrSurrogate

        protected void errNcrSurrogate()
                                throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errNcrControlChar

        protected char errNcrControlChar​(char ch)
                                  throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errNcrCr

        protected void errNcrCr()
                         throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errNcrInC1Range

        protected void errNcrInC1Range()
                                throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errEofInPublicId

        protected void errEofInPublicId()
                                 throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errEofInComment

        protected void errEofInComment()
                                throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errEofInDoctype

        protected void errEofInDoctype()
                                throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errEofInAttributeValue

        protected void errEofInAttributeValue()
                                       throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errEofInAttributeName

        protected void errEofInAttributeName()
                                      throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errEofWithoutGt

        protected void errEofWithoutGt()
                                throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errEofInTagName

        protected void errEofInTagName()
                                throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errEofInEndTag

        protected void errEofInEndTag()
                               throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errEofAfterLt

        protected void errEofAfterLt()
                              throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errNcrOutOfRange

        protected void errNcrOutOfRange()
                                 throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errNcrUnassigned

        protected void errNcrUnassigned()
                                 throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errDuplicateAttribute

        protected void errDuplicateAttribute()
                                      throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errEofInSystemId

        protected void errEofInSystemId()
                                 throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errExpectedSystemId

        protected void errExpectedSystemId()
                                    throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errMissingSpaceBeforeDoctypeName

        protected void errMissingSpaceBeforeDoctypeName()
                                                 throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errHyphenHyphenBang

        protected void errHyphenHyphenBang()
                                    throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errNcrControlChar

        protected void errNcrControlChar()
                                  throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errNcrZero

        protected void errNcrZero()
                           throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errNoSpaceBetweenDoctypeSystemKeywordAndQuote

        protected void errNoSpaceBetweenDoctypeSystemKeywordAndQuote()
                                                              throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errNoSpaceBetweenPublicAndSystemIds

        protected void errNoSpaceBetweenPublicAndSystemIds()
                                                    throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • errNoSpaceBetweenDoctypePublicKeywordAndQuote

        protected void errNoSpaceBetweenDoctypePublicKeywordAndQuote()
                                                              throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • noteAttributeWithoutValue

        protected void noteAttributeWithoutValue()
                                          throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • noteUnquotedAttributeValue

        protected void noteUnquotedAttributeValue()
                                           throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • setEncodingDeclarationHandler

        public void setEncodingDeclarationHandler​(EncodingDeclarationHandler encodingDeclarationHandler)
        Sets the encodingDeclarationHandler.
        Parameters:
        encodingDeclarationHandler - the encodingDeclarationHandler to set
      • setTransitionBaseOffset

        public void setTransitionBaseOffset​(int offset)
        Sets an offset to be added to the position reported to TransitionHandler.
        Parameters:
        offset - the offset