Class CodepointCountFilter
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.TokenFilter
org.apache.lucene.analysis.FilteringTokenFilter
org.apache.lucene.analysis.miscellaneous.CodepointCountFilter
All Implemented Interfaces:
Closeable, AutoCloseable, Unwrappable<TokenStream>
Removes words that are too long or too short from the stream.
Note: Length is calculated as the number of Unicode codepoints.
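Counting codepoints instead of Java `char`s matters for characters outside the Basic Multilingual Plane. Java stores each of these as a surrogate pair, which is two `char`s. The standalone sketch below uses only `java.lang` and no Lucene classes. It compares the two measures for an illustrative token that contains U+1D11E MUSICAL SYMBOL G CLEF. The class and method names are hypothetical.

```java
public class CodepointVsCharLength {

    /** Number of UTF-16 code units (Java chars) in the token. */
    static int charLength(String token) {
        return token.length();
    }

    /** Number of Unicode codepoints in the token; a surrogate pair counts once. */
    static int codepointLength(String token) {
        char[] buffer = token.toCharArray();
        return Character.codePointCount(buffer, 0, buffer.length);
    }

    public static void main(String[] args) {
        // "a" followed by U+1D11E, which UTF-16 stores as the surrogate pair \uD834\uDD1E
        String token = "a\uD834\uDD1E";
        System.out.println("chars = " + charLength(token));           // chars = 3
        System.out.println("codepoints = " + codepointLength(token)); // codepoints = 2
    }
}
```

A limit of two codepoints therefore accepts this token. A limit of two `char`s would reject it.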
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.State
Field Summary
Fields
Modifier and Type | Field
private final int | max
private final int | min
private final CharTermAttribute | termAtt
Fields inherited from class org.apache.lucene.analysis.TokenFilter
input
Fields inherited from class org.apache.lucene.analysis.TokenStream
DEFAULT_TOKEN_ATTRIBUTE_FACTORY
Constructor Summary
Constructors
Constructor | Description
CodepointCountFilter(TokenStream in, int min, int max) | Create a new CodepointCountFilter.
Method Summary
Modifier and Type | Method | Description
boolean | accept() | Override this method and return true if the current input token should be returned by FilteringTokenFilter.incrementToken().
Methods inherited from class org.apache.lucene.analysis.FilteringTokenFilter
end, incrementToken, reset
Methods inherited from class org.apache.lucene.analysis.TokenFilter
close, unwrap
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
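FilteringTokenFilter.incrementToken() calls accept() to decide whether to pass each input token on. The sketch below models that pull loop in plain Java under some simplifying assumptions. The class FilteringLoopSketch is hypothetical. So is its String-based "term", and the Iterator that stands in for the upstream TokenStream. The sketch also leaves out details that a real Lucene filter has to handle, such as attribute state and position bookkeeping for dropped tokens.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical, simplified model of the filtering pattern: pull tokens from the input
 * and hand back only those for which the accept predicate returns true. Not Lucene code.
 */
public class FilteringLoopSketch {
    private final Iterator<String> input;    // stands in for the upstream TokenStream
    private final Predicate<String> accept;  // stands in for accept()
    private String current;                  // stands in for the term attribute

    FilteringLoopSketch(Iterator<String> input, Predicate<String> accept) {
        this.input = input;
        this.accept = accept;
    }

    /** Advances to the next accepted token; returns false once the input is exhausted. */
    boolean incrementToken() {
        while (input.hasNext()) {
            current = input.next();
            if (accept.test(current)) {
                return true;  // emit this token
            }
            // rejected: keep pulling from the input
        }
        return false;
    }

    String term() {
        return current;
    }

    /** Drains the sketch into a list of accepted tokens. */
    static List<String> drain(Iterator<String> input, Predicate<String> accept) {
        FilteringLoopSketch filter = new FilteringLoopSketch(input, accept);
        List<String> out = new ArrayList<>();
        while (filter.incrementToken()) {
            out.add(filter.term());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("a", "quick", "brown", "fox", "jumped");
        // Keep tokens of 3 to 5 codepoints, mirroring a CodepointCountFilter with min = 3, max = 5
        Predicate<String> lengthCheck = t -> {
            int n = t.codePointCount(0, t.length());
            return n >= 3 && n <= 5;
        };
        System.out.println(drain(tokens.iterator(), lengthCheck)); // [quick, brown, fox]
    }
}
```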
Field Details
min
private final int min
max
private final int max
termAtt
private final CharTermAttribute termAtt
Constructor Details
CodepointCountFilter
public CodepointCountFilter(TokenStream in, int min, int max)
Create a new CodepointCountFilter. This filters out tokens whose CharTermAttribute is either too short (Character.codePointCount(char[], int, int) < min) or too long (Character.codePointCount(char[], int, int) > max).
Parameters:
in - the TokenStream to consume
min - the minimum length, in codepoints
max - the maximum length, in codepoints
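Under the rule above, a token is removed only when its codepoint count is strictly below min or strictly above max, so both bounds are inclusive. The standalone sketch below restates that rule in plain Java so the boundary cases are easy to see. It is not Lucene's code, and the class CodepointBoundsSketch and its keeps helper are hypothetical names.

```java
public class CodepointBoundsSketch {

    /** True if the token survives: its codepoint count n satisfies min <= n <= max. */
    static boolean keeps(String token, int min, int max) {
        int n = token.codePointCount(0, token.length());
        boolean tooShort = n < min;
        boolean tooLong = n > max;
        return !tooShort && !tooLong;
    }

    public static void main(String[] args) {
        int min = 2, max = 3;
        for (String t : new String[] {"a", "ab", "abc", "abcd"}) {
            // prints: a -> removed, ab -> kept, abc -> kept, abcd -> removed
            System.out.println(t + " -> " + (keeps(t, min, max) ? "kept" : "removed"));
        }
        // Two G clefs (U+1D11E): 4 chars but only 2 codepoints, so the token is kept
        String clefs = "\uD834\uDD1E\uD834\uDD1E";
        System.out.println("clefs -> " + (keeps(clefs, min, max) ? "kept" : "removed"));
    }
}
```

The last case shows why the unit matters. A check on `char` length would remove this token, because 4 > max.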
Method Details
accept
public boolean accept()
Description copied from class: FilteringTokenFilter
Override this method and return true if the current input token should be returned by FilteringTokenFilter.incrementToken().
Specified by:
accept in class FilteringTokenFilter
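The sketch below shows one way to write an accept()-style check for this filter. It works directly on a term buffer plus a length, which is the form in which a CharTermAttribute exposes the term through buffer() and length(). The length is passed separately because the buffer may be larger than the term. The early-exit shortcut is an illustrative optimization of my own choosing, and this page does not say whether Lucene's accept() uses it. The shortcut relies on one fact: a UTF-16 run of L chars holds between ceil(L/2) and L codepoints. The class AcceptSketch is hypothetical.

```java
public class AcceptSketch {

    /** Accept iff the codepoint count of buffer[0..length) lies in [min, max]. */
    static boolean accept(char[] buffer, int length, int min, int max) {
        // L chars hold at most L and at least ceil(L/2) codepoints;
        // floor(L/2) = length >> 1 is still a valid lower bound.
        int upper = length;
        int lower = length >> 1;
        if (lower >= min && upper <= max) {
            return true;   // every possible codepoint count lies inside [min, max]
        }
        if (lower > max || upper < min) {
            return false;  // every possible codepoint count lies outside [min, max]
        }
        // Undecided by the bounds: count exactly (an unpaired surrogate counts as one codepoint).
        int n = Character.codePointCount(buffer, 0, length);
        return n >= min && n <= max;
    }

    public static void main(String[] args) {
        char[] buf = "a\uD834\uDD1E".toCharArray();       // 3 chars, 2 codepoints
        System.out.println(accept(buf, buf.length, 2, 2)); // true (needs the exact count)
        System.out.println(accept(buf, buf.length, 1, 3)); // true (decided by the bounds)
        System.out.println(accept(buf, 1, 2, 5));          // false ("a" alone is too short)
    }
}
```

When both bounds agree, the shortcut skips the linear codepoint scan. The exact count is still used whenever the bounds leave the answer open, so the result always matches the documented rule.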