Perl Diver 2.33

Module Documentation

Plucene::Analysis::CharTokenizer

Name Plucene::Analysis::CharTokenizer
Version
Located at /usr/share/perl5
File /usr/share/perl5/Plucene/Analysis/CharTokenizer.pm
Is Core No

NAME

Plucene::Analysis::CharTokenizer - base class for character tokenisers


SYNOPSIS

        # isa Plucene::Analysis::Tokenizer
        my $next = $chartokenizer->next;

DESCRIPTION

This is an abstract base class for simple, character-oriented tokenizers.


METHODS

token_re

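Subclasses must define this method. It should return a regular expression that matches one token; next applies it to the buffered input to find each token.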

normalize

This normalises the text of a token before the token is created. next passes each complete match to normalize in a single call.
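
As a rough illustration of how the two hooks fit together, the sketch below defines them on a hypothetical package and applies them to a string the same way next does. The package name My::LowerLetterHooks, the choice of [[:alpha:]]+ as the token pattern, and the lowercasing are all assumptions made for the example. The package does not inherit from Plucene::Analysis::CharTokenizer, so that the snippet runs without Plucene installed; a real subclass would.

```perl
use strict;
use warnings;

# Hypothetical hooks, shown without loading Plucene so the snippet
# runs on its own. A real subclass would inherit from
# Plucene::Analysis::CharTokenizer instead of standing alone.
package My::LowerLetterHooks;

# A token is a run of alphabetic characters (an assumed choice).
sub token_re  { return qr/[[:alpha:]]+/ }

# Lowercase each complete match before it becomes token text.
sub normalize { my ($self, $text) = @_; return lc $text }

package main;

my $buffer = "Hello, World";
my $re     = My::LowerLetterHooks->token_re;
my @words;

# Same substitution that next uses: strip everything up to and
# including the first match, then normalise the matched part.
while ($buffer =~ s/(.*?)($re)//) {
    push @words, My::LowerLetterHooks->normalize($2);
}

print join(",", @words), "\n";   # prints "hello,world"
```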

next

        my $next = $chartokenizer->next;

This returns the next token read from the tokenizer's reader as a Plucene::Analysis::Token, or undef once the input is exhausted.


=cut

sub next {
        my $self = shift;
        my $re   = $self->token_re();
        my $fh   = $self->{reader};
        retry:
        # Refill the buffer with the next line once it is empty.
        if (!defined $self->{buffer} or !length $self->{buffer}) {
                return if eof($fh);
                # Remember where this line starts in the input.
                $self->{start} = tell($fh);
                $self->{buffer} .= <$fh>;
        }
        return unless length $self->{buffer};

        if ($self->{buffer} =~ s/(.*?)($re)//) {
                $self->{start} += length $1;
                my $word = $self->normalize($2);
                my $rv   = Plucene::Analysis::Token->new(
                        text  => $word,
                        start => $self->{start},
                        end   => ($self->{start} + length($word)));
                $self->{start} += length($word);
                return $rv;
        }
        # No match, rest of buffer is useless.
        $self->{buffer} = "";
        # But we should try for some more text
        goto retry;
}

1;
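
To show the offsets this loop produces, here is a minimal self-contained sketch of the same logic run over an in-memory filehandle. It is not the Plucene code: Sketch::WordTokenizer, its constructor and the \w+ pattern are hypothetical, tokens are plain hash refs rather than Plucene::Analysis::Token objects, and the goto is replaced by a while loop.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a CharTokenizer subclass. It returns plain
# hash refs where the real code builds Plucene::Analysis::Token objects.
package Sketch::WordTokenizer;

sub new {
    my ($class, %args) = @_;
    return bless { reader => $args{reader} }, $class;
}
sub token_re  { return qr/\w+/ }
sub normalize { my ($self, $text) = @_; return lc $text }

sub next {
    my $self = shift;
    my $re   = $self->token_re;
    my $fh   = $self->{reader};
    while (1) {
        # Refill the buffer one line at a time, noting its file offset.
        if (!defined $self->{buffer} or !length $self->{buffer}) {
            return if eof($fh);
            $self->{start}  = tell($fh);
            $self->{buffer} = <$fh>;
        }
        if ($self->{buffer} =~ s/(.*?)($re)//) {
            $self->{start} += length $1;    # skip the non-token prefix
            my $word  = $self->normalize($2);
            my $token = {
                text  => $word,
                start => $self->{start},
                end   => $self->{start} + length($word),
            };
            $self->{start} += length($word);
            return $token;
        }
        $self->{buffer} = "";    # no token left on this line
    }
}

package main;

my $text = "Hello, World\nfoo bar\n";
open my $fh, '<', \$text or die "cannot open in-memory handle: $!";

my $tokenizer = Sketch::WordTokenizer->new(reader => $fh);
my @tokens;
while (my $t = $tokenizer->next) {
    push @tokens, $t;
    print "$t->{text} [$t->{start}, $t->{end})\n";
}
```

Under these assumptions the loop prints hello [0, 5), world [7, 12), foo [13, 16) and bar [17, 20). As in the source above, end is computed from the length of the normalised text, so a normalize that changes the length of a match would shift the reported offsets.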
