
Re: [OT] Confirmation Spam Blocking was: List 'linux-dvb' closed to public posts


Hi!

> > Bayes is the wrong approach for those blocks of random words from
> > the dictionary.
> 
> Bayes is not wrong per se, but doing Bayes on pure word statistics is
> wrong. It always was. People knew how it could be broken. The current rash
> of spams is just the obvious way to do it.

You want to do it on trigrams (groups of three words). Anything longer
than trigrams is not likely to be effective.
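
A minimal sketch of what scoring on word trigrams could look like, in
Python. The function names, the toy training messages and the add-one
smoothing are assumptions made for illustration, not taken from any real
filter. It only shows the "how familiar are these trigrams" half; a real
Bayes filter would keep separate counts for spam and for ham and compare
the two.

```python
import math
from collections import Counter

def trigrams(text):
    """Return the list of consecutive word triples in text."""
    words = text.lower().split()
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]

def train(corpus):
    """Count trigrams over a list of training messages."""
    counts = Counter()
    for message in corpus:
        counts.update(trigrams(message))
    return counts

def avg_log_score(text, counts):
    """Average add-one-smoothed log relative frequency of the trigrams in text."""
    grams = trigrams(text)
    if not grams:
        return 0.0
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra bucket for unseen trigrams (assumption)
    return sum(math.log((counts[g] + 1) / (total + vocab)) for g in grams) / len(grams)

# Toy "known good" mail, made up for this example.
ham = [
    "please send me the patch for the driver",
    "i will send the patch for the kernel tomorrow",
]
counts = train(ham)

# Same vocabulary in both cases; only the word order differs.
sensible = avg_log_score("send me the patch for the kernel", counts)
random_words = avg_log_score("kernel the for patch me send tomorrow", counts)
```

With these toy messages the shuffled sentence scores lower than the
sensible one, even though every word in it is a "good" word on pure word
statistics.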

> > What we need is a bounty on these scum.  $1000 fine per
> > reported recipient with half going to the reporter would be
> > nice.
> 
> What you should aim for, and which should be much harder to break, is to 
> realize that random words that make no sense give a really unlikely 
> score when you build up a markov chain of them.
> 
> So to avoid the random words problem, do Bayes on the _chain_ of words
> instead.
> 
> Now, you can try to overcome this by spamming with something that makes
> "sense" from the markov chain standpoint, but by then that spam is going
> to be hilarious. Once I start getting spams that are generated by markov
> generators and read like "real" email, I might stop filtering them, just
> because they are bound to be a lot of fun to read.

Even if you get 100 of them per day?

I'm doing language modeling in school, and generating text that "looks
meaningful" to everything but a human is all too easy.

[Take your favourite voice-recognition software, and speak in another
language to it. It will generate plausible-looking sentences at the
output. This could be easily automated.]
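
How little it takes can be sketched with a first-order Markov chain over
words, in Python. The sample text, the function names and the fixed seed
are assumptions made for illustration only. Every adjacent word pair in
the output was seen in the training text, so a filter that only looks at
pairs would find nothing odd about it.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that follow it in text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length, seed=0):
    """Walk the chain from start, emitting at most length words."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    out = [start]
    while len(out) < length:
        followers = chain.get(out[-1])
        if not followers:
            break  # dead end: the last word was never followed by anything
        out.append(rng.choice(followers))
    return " ".join(out)

# Made-up training text for the example.
sample = ("the patch fixes the driver and the driver loads "
          "the module and the module works")
chain = build_chain(sample)
text = generate(chain, "the", 8)
```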
								Pavel
-- 
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo _at_ vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

