new-words

changeset 15:c6efd17741aa

A period following Mr./Mrs. is not treated as the end of a sentence
author Igor Chubin <igor@chub.in>
date Sun Apr 04 14:12:35 2010 +0300 (2010-04-04)
parents 9b18c7efe31c
children c65ffd60cc18
files grep-sentences.pl
diff
--- a/grep-sentences.pl	Sun Apr 04 12:54:46 2010 +0300
+++ b/grep-sentences.pl	Sun Apr 04 14:12:35 2010 +0300
@@ -9,8 +9,10 @@
     $text=<PAGE>;
     $text =~ s@http://[a-zA-Z&_.:/0-9%?=,\#+()\[\]~-]*@@g;
     $text =~ s@\n@@g;
+    $text =~ s@(Mr|Mrs)\.@\1POINT@g;
     @sentences=split /\./, $text;
     for (@sentences) {
+         s@(Mr|Mrs)POINT@\1.@g;
          s/^\s*//;
          s/\s*$//;
          s/\[[0-9]+\]//g;
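
The change uses a placeholder: the period after Mr/Mrs is swapped for the token POINT before the text is split on periods, then put back in each resulting sentence. Below is a minimal standalone sketch of that idea, not the script itself. The `split_sentences` helper and the sample text are hypothetical, and the sketch assumes the input never contains the literal text MrPOINT or MrsPOINT. It writes `$1` in the replacement where the commit writes `\1`; Perl accepts both there, but `\1` draws a warning when warnings are enabled.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration; it is not part of grep-sentences.pl.
sub split_sentences {
    my ($text) = @_;

    # Hide the period after Mr/Mrs behind a placeholder so split skips it.
    $text =~ s/\b(Mr|Mrs)\./${1}POINT/g;

    my @sentences = split /\./, $text;
    for (@sentences) {
        s/\b(Mr|Mrs)POINT/$1./g;   # restore the hidden period
        s/^\s+//;                  # trim leading whitespace
        s/\s+$//;                  # trim trailing whitespace
    }

    # Drop fragments that are empty after trimming.
    return grep { length } @sentences;
}

print "$_\n" for split_sentences("Mr. Smith met Mrs. Jones. They talked.");
```

The same approach would need one more alternative in both patterns for each further abbreviation (Dr, Prof, and so on), so a longer list is probably easier to maintain as an array joined into the regex.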