new-words
changeset 15:c6efd17741aa
A period following Mr./Mrs. is not treated as the end of a sentence
author | Igor Chubin <igor@chub.in> |
---|---|
date | Sun Apr 04 14:12:35 2010 +0300 (2010-04-04) |
parents | 9b18c7efe31c |
children | c65ffd60cc18 |
files | grep-sentences.pl |
diff
--- a/grep-sentences.pl Sun Apr 04 12:54:46 2010 +0300
+++ b/grep-sentences.pl Sun Apr 04 14:12:35 2010 +0300
@@ -9,8 +9,10 @@
 $text=<PAGE>;
 $text =~ s@http://[a-zA-Z&_.:/0-9%?=,\#+()\[\]~-]*@@g;
 $text =~ s@\n@@g;
+$text =~ s@(Mr|Mrs)\.@\1POINT@g;
 @sentences=split /\./, $text;
 for (@sentences) {
+    s@(Mr|Mrs)POINT@\1.@g;
     s/^\s*//;
     s/\s*$//;
     s/\[[0-9]+\]//g;
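The change masks the period in "Mr." and "Mrs." with a placeholder before splitting on periods, then restores it in each resulting sentence. The following is a minimal standalone sketch of that idea, not the script itself: the `split_sentences` helper and the sample text are hypothetical, and it writes `$1` where the changeset writes `\1` (Perl accepts both in a replacement, but warns about `\1` under `use warnings`).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of grep-sentences.pl): split text into
# sentences on '.', protecting the periods in "Mr." and "Mrs." first.
sub split_sentences {
    my ($text) = @_;

    # Mask the abbreviation period with a placeholder so split ignores it.
    $text =~ s/\b(Mr|Mrs)\./${1}POINT/g;

    my @sentences = split /\./, $text;
    for (@sentences) {
        s/\b(Mr|Mrs)POINT/$1./g;   # restore the masked period
        s/^\s+//;                  # trim leading whitespace
        s/\s+$//;                  # trim trailing whitespace
    }

    # Drop empty fragments left over from the split.
    return grep { length } @sentences;
}

# Sample input is an assumption chosen for illustration.
my @sentences = split_sentences("Mr. Smith met Mrs. Jones. They talked.");
print "$_\n" for @sentences;
```

Run as written, this prints `Mr. Smith met Mrs. Jones` and `They talked` on separate lines. One limitation of the placeholder approach is that input already containing the literal text `MrPOINT` or `MrsPOINT` would be rewritten as well.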