Perl Diver 2.33



Plucene::Analysis::CharTokenizer

Located at: /usr/share/perl5
File: /usr/share/perl5/Plucene/Analysis/CharTokenizer.pm
Search CPAN for this module: Plucene::Analysis::CharTokenizer
Documentation: Plucene::Analysis::CharTokenizer

Module Details

Plucene::Analysis::CharTokenizer

NAME
SYNOPSIS
DESCRIPTION
METHODS
        token_re
        normalize
        next

NAME

Plucene::Analysis::CharTokenizer - base class for character tokenisers

SYNOPSIS

        # isa Plucene::Analysis::Tokenizer

        my $next = $chartokenizer->next;


DESCRIPTION

This is an abstract base class for simple, character-oriented tokenizers.

METHODS

token_re

This returns the regular expression that matches a single token. It should be defined in subclasses.

normalize

This will normalise the matched text before it is stored as the text of the token.
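The contract between the two methods above can be sketched as follows. This is a minimal illustration, not Plucene's own code: Sketch::CharTokenizer is a hypothetical stand-in for the base class so that the example runs without Plucene installed, Sketch::LowerCaseLetters is a hypothetical subclass, and the pass-through default for normalize is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the base class (not the real Plucene module).
package Sketch::CharTokenizer;

sub new { my ($class, %args) = @_; return bless {%args}, $class }

# Subclasses are expected to override this.
sub token_re { die "token_re must be defined in a subclass\n" }

# Assumed default: hand the matched text back unchanged.
sub normalize { my ($self, $text) = @_; return $text }

# Hypothetical subclass: a token is a run of letters, folded to lower case.
package Sketch::LowerCaseLetters;
our @ISA = ('Sketch::CharTokenizer');

sub token_re  { return qr/[[:alpha:]]+/ }
sub normalize { my ($self, $text) = @_; return lc $text }

package main;

my $tok   = Sketch::LowerCaseLetters->new;
my $re    = $tok->token_re;
my @raw   = "Hello, World" =~ /($re)/g;          # text matched by token_re
my @words = map { $tok->normalize($_) } @raw;    # text after normalize
print join(",", @words), "\n";
```

Here token_re decides what counts as a token and normalize decides how the matched text is stored, so a subclass can change either one independently.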

next

        my $next = $chartokenizer->next;

This will return the next token in the input as a Plucene::Analysis::Token, or undef at the end of the input.


=cut

sub next {
        my $self = shift;
        my $re   = $self->token_re();
        my $fh   = $self->{reader};

        # Refill the buffer with the next line once it is empty,
        # remembering the file offset at which that line starts.
        retry:
        if (!defined $self->{buffer} or !length $self->{buffer}) {
                return if eof($fh);
                $self->{start} = tell($fh);
                $self->{buffer} .= <$fh>;
        }
        return unless length $self->{buffer};

        # Remove any leading non-token text ($1) and the token itself ($2)
        # from the front of the buffer.
        if ($self->{buffer} =~ s/(.*?)($re)//) {
                $self->{start} += length $1;
                my $word = $self->normalize($2);
                my $rv   = Plucene::Analysis::Token->new(
                        text  => $word,
                        start => $self->{start},
                        end   => ($self->{start} + length($word)));
                $self->{start} += length($word);
                return $rv;
        }

        # No match, rest of buffer is useless.
        $self->{buffer} = "";

        # But we should try for some more text
        goto retry;
}
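To see how the buffering and offset bookkeeping above behave, here is a standalone sketch that can be run without Plucene. It is an approximation under stated assumptions: Sketch::Tokenizer is a hypothetical class, tokens are plain hash refs rather than Plucene::Analysis::Token objects, the goto is written as a loop, and the input is an in-memory filehandle.

```perl
use strict;
use warnings;

# Hypothetical stand-in that mirrors the buffering logic of next().
package Sketch::Tokenizer;

sub new {
    my ($class, $fh) = @_;
    return bless { reader => $fh, buffer => '', start => 0 }, $class;
}

sub token_re  { return qr/[[:alpha:]]+/ }
sub normalize { my ($self, $text) = @_; return lc $text }

sub next {
    my $self = shift;
    my $re   = $self->token_re;
    my $fh   = $self->{reader};
    while (1) {
        # Refill from the reader when the buffer is exhausted.
        if (!length $self->{buffer}) {
            return if eof($fh);
            $self->{start}  = tell($fh);
            $self->{buffer} = <$fh>;
        }
        # Consume skipped text ($1) and the token ($2) from the buffer.
        if ($self->{buffer} =~ s/(.*?)($re)//) {
            $self->{start} += length $1;
            my $word  = $self->normalize($2);
            my $token = {
                text  => $word,
                start => $self->{start},
                end   => $self->{start} + length $word,
            };
            $self->{start} += length $word;
            return $token;
        }
        # No token left on this line; discard it and read more.
        $self->{buffer} = '';
    }
}

package main;

my $text = "Hello, World\nfoo 42 Bar\n";
open my $in, '<', \$text or die "open: $!";
my $tokenizer = Sketch::Tokenizer->new($in);

my @seen;
while (my $token = $tokenizer->next) {
    push @seen, sprintf '%s@%d-%d', @{$token}{qw(text start end)};
}
print "$_\n" for @seen;
```

With this input the loop yields hello (0-5), world (7-12), foo (13-16) and bar (20-23). The offsets carry across lines because start is reset from tell() each time a new line is read.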
