Plucene::Index::Writer - write an index.
my $writer = Plucene::Index::Writer->new($path, $analyser, $create);
$writer->add_document($doc);
$writer->add_indexes(@dirs);
$writer->optimize; # called before close
my $doc_count = $writer->doc_count;
my $mergefactor = $writer->mergefactor;
$writer->set_mergefactor($value);
This is the writer class.
If no more documents will be added to an index for a while and optimal search
performance is desired, call the optimize method before the index is closed.
my $writer = Plucene::Index::Writer->new($path, $analyser, $create);
This will create a new Plucene::Index::Writer object.
The third argument to the constructor determines whether a new index is
created, or whether an existing index is opened for the addition of new
documents.
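One way to choose the third argument is to check whether an index already exists at the given path. The sketch below is an illustration, not part of Plucene: the helper name create_flag_for is hypothetical, and it assumes that an existing index directory contains a file named "segments".

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: returns 1 (create a new index) when $path holds no
# index yet, and 0 (open the existing index) otherwise.
# Assumption: an existing index directory contains a file named "segments".
sub create_flag_for {
    my ($path) = @_;
    my $segments_file = File::Spec->catfile($path, 'segments');
    return -e $segments_file ? 0 : 1;
}

# Intended use (requires Plucene; $analyser is any Plucene analyser object):
#   my $writer = Plucene::Index::Writer->new(
#       $path, $analyser, create_flag_for($path));
```

Computing the flag this way means a script that runs repeatedly does not overwrite an index it built on an earlier run.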
my $mergefactor = $writer->mergefactor;
$writer->set_mergefactor($value);
Get / set the mergefactor. It defaults to 5.
my $doc_count = $writer->doc_count;
Returns the number of documents in the index.
$writer->add_document($doc);
Adds a document to the index. After the document has been added, a merge takes
place if there are more than $Plucene::Index::Writer::mergefactor segments
in the index. This defaults to 10, but can be set to whatever value is optimal
for your application.
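The merge trigger described above can be pictured with a toy model. This is a deliberately simplified sketch, not Plucene's implementation: each segment is represented only by its document count, a merge collapses every segment into one, and the function name add_document_to_model is hypothetical. The merge factor is passed in explicitly.

```perl
use strict;
use warnings;

# Toy model, not Plucene's implementation: each segment is represented only
# by its document count.
sub add_document_to_model {
    my ($segments, $mergefactor) = @_;
    push @$segments, 1;    # a new document starts as a one-document segment

    # Merge once the number of segments exceeds the merge factor.
    if (@$segments > $mergefactor) {
        my $total = 0;
        $total += $_ for @$segments;
        @$segments = ($total);    # all segments collapse into one
    }
    return $segments;
}

my @segments;
add_document_to_model(\@segments, 3) for 1 .. 4;
print "segments: @segments\n";    # prints "segments: 4"
```

In this model a larger merge factor lets more small segments accumulate before each merge, so adding documents triggers merges less often, while the index holds more segments between merges.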
=cut
sub add_document {
    my ($self, $doc) = @_;
    my $dw = Plucene::Index::DocumentWriter->new($self->{tmp_directory},
        $self->{analyzer}, MAX_FIELD_LENGTH);
    my $segname = $self->_new_segname;
    $dw->add_document($segname, $doc);

    #lock $self;
    $self->{segmentinfos}->add_element(
        Plucene::Index::SegmentInfo->new({
            name      => $segname,
            doc_count => 1,
            dir       => $self->{tmp_directory} }));
    $self->_maybe_merge_segments;
}
sub _new_segname {
    "_" . $_[0]->{segmentinfos}->{counter}++    # Urgh
}
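sub _flush {
    my $self = shift;
    my @segs = $self->{segmentinfos}->segments;

    # Walk back from the newest segment over those still held in the
    # temporary directory, counting the documents they contain.
    my $min_segment = $#segs;
    my $doc_count   = 0;
    while ($min_segment >= 0
        and $segs[$min_segment]->dir eq $self->{tmp_directory}) {
        $doc_count += $segs[$min_segment]->doc_count;
        $min_segment--;
    }

    # Leave the preceding segment out of the merge if there is none, if
    # including it would exceed the merge factor, or if the newest segment
    # is not in the temporary directory.
    if (   $min_segment < 0
        or ($doc_count + $segs[$min_segment]->doc_count > $self->mergefactor)
        or !($segs[-1]->dir eq $self->{tmp_directory})) {
        $min_segment++;
    }

    return if $min_segment >= @segs;    # nothing to merge
    $self->_merge_segments($min_segment);
}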
$writer->optimize;
Merges all segments into a single segment, optimizing the index for
search. This should be the last method called on a writer, as it
invalidates the writer object.
$writer->add_indexes(@dirs);
Merges all segments from an array of indexes into this index.
This may be used to parallelize batch indexing. A large document
collection can be broken into sub-collections. Each sub-collection can be
indexed in parallel, on a different thread, process or machine. The
complete index can then be created by merging sub-collection indexes
with this method.
After this completes, the index is optimized.
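Splitting a collection into sub-collections, as described above, might look like the following sketch. The helper partition_round_robin and the file names are hypothetical and only illustrate the partitioning step. Indexing each sub-collection and merging the results still go through the writer methods documented here.

```perl
use strict;
use warnings;

# Hypothetical helper: deal the items round-robin into $n sub-collections.
sub partition_round_robin {
    my ($items, $n) = @_;
    die "need at least one partition\n" if $n < 1;
    my @parts = map { [] } 1 .. $n;
    my $i = 0;
    push @{ $parts[ $i++ % $n ] }, $_ for @$items;
    return @parts;
}

my @parts = partition_round_robin(
    [ 'a.txt', 'b.txt', 'c.txt', 'd.txt', 'e.txt' ], 2);
print "sub-collection $_: @{ $parts[$_] }\n" for 0 .. $#parts;

# Each sub-collection would then be indexed by its own writer into its own
# directory, and the resulting directories passed to
#   $writer->add_indexes(@dirs);
```

Round-robin assignment keeps the sub-collections close to equal in document count, which helps parallel workers finish at about the same time when the documents are of similar size.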