[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [OT] Confirmation Spam Blocking was: List 'linux-dvb' closed topublic posts


On Thu, 22 Jan 2004, David S. Miller wrote:

> On Thu, 22 Jan 2004 14:58:54 -0800 (PST)
> Linus Torvalds <torvalds _at_ osdl.org> wrote:
> 
> > Have you played with Markov chains? What happens is that you don't just
> > build up a list of words and their likelihood of being spam or ham, you
> > build up a list of word _combinations_ and the likelihood of one
> > particular word following another one.
> > 
> > That's how a lot of the "random phrase" generators on the web work.
> > 
> > They can be absolutely hilarious, exactly because the sentences they
> > generate actually _almost_ make sense. Sometimes you get an almost 
> > readable story, but one that reads like somebody having a bad trip and his 
> > reality just shifted 90 degrees. (Usually the best stories come if the 
> > training material is coherent, which email sadly usually isn't).
> 
> This reminds me of the literary work done by William S. Burrough and his
> infamous "cut ups".  Similar result for the reader.

jwz's DadaDodo program <http://www.jwz.org/dadadodo/> was inspired by 
Burroughs' cut ups, and it does this sort of Markov chain-based 
generation....
                                                                                
I'm not so sure that Markov chain analysis for spam filtering will work well
in practice for email, though, particularly when dealing with international
email. I don't write / read German or French well, but I can write in them
enough that a German / French speaker can figure out what I'm trying to say,
maybe. My German in particular is bad enough that it probably wouldn't fit
Markov chain models of how someone fluent in high German would write,
though. Similarly, look at some of the fractured English emails that appear
on this list. I can understand them, but a Markov model relaxed enough to
allow them will also include a lot of random spam....

later,
chris
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo _at_ vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


この情報があなたの探していたものかどうか選択してください。
yes/まさにこれだ!   no/違うなぁ   part/一部見つかった   try/これで試してみる

あなたが探していた情報はどのようなことか、ご自由に記入下さい。特に「まさにこれだ!」と言う場合は記入をお願いします。
例:「複数のマシンからCATV経由でipmasqueradeを利用してWebを参照したい場合の設定について」
References: