On Thu, 22 Jan 2004, David S. Miller wrote: > On Thu, 22 Jan 2004 14:58:54 -0800 (PST) > Linus Torvalds <torvalds _at_ osdl.org> wrote: > > > Have you played with Markov chains? What happens is that you don't just > > build up a list of words and their likelihood of being spam or ham, you > > build up a list of word _combinations_ and the likelihood of one > > particular word following another one. > > > > That's how a lot of the "random phrase" generators on the web work. > > > > They can be absolutely hilarious, exactly because the sentences they > > generate actually _almost_ make sense. Sometimes you get an almost > > readable story, but one that reads like somebody having a bad trip and his > > reality just shifted 90 degrees. (Usually the best stories come if the > > training material is coherent, which email sadly usually isn't). > > This reminds me of the literary work done by William S. Burrough and his > infamous "cut ups". Similar result for the reader. jwz's DadaDodo program <http://www.jwz.org/dadadodo/> was inspired by Burroughs' cut ups, and it does this sort of Markov chain-based generation.... I'm not so sure that Markov chain analysis for spam filtering will work well in practice for email, though, particularly when dealing with international email. I don't write / read German or French well, but I can write in them enough that a German / French speaker can figure out what I'm trying to say, maybe. My German in particular is bad enough that it probably wouldn't fit Markov chain models of how someone fluent in high German would write, though. Similarly, look at some of the fractured English emails that appear on this list. I can understand them, but a Markov model relaxed enough to allow them will also include a lot of random spam.... later, chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo _at_ vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
References:
- Re: List 'linux-dvb' closed to public postsLinus Torvalds
- Re: List 'linux-dvb' closed to public postsZan Lynx
- Re: List 'linux-dvb' closed to public postsDave Jones
- [OT] Confirmation Spam Blocking was: List 'linux-dvb' closed to public postsMike Fedyk
- Re: [OT] Confirmation Spam Blocking was: List 'linux-dvb' closed to public postsAndreas Jellinghaus
- Re: [OT] Confirmation Spam Blocking was: List 'linux-dvb' closedto public postsZan Lynx
- Re: [OT] Confirmation Spam Blocking was: List 'linux-dvb' closed to public postsJes Sorensen
- Re: [OT] Confirmation Spam Blocking was: List 'linux-dvb' closedto public postsDavid Ford
- Re: [OT] Confirmation Spam Blocking was: List 'linux-dvb' closed topublic postsDavid Lang
- Re: [OT] Confirmation Spam Blocking was: List 'linux-dvb' closedto public postsDavid Ford
- Re: [OT] Confirmation Spam Blocking was: List 'linux-dvb' closed to public postsjw schultz
- Re: [OT] Confirmation Spam Blocking was: List 'linux-dvb' closed topublic postsLinus Torvalds
- Re: [OT] Confirmation Spam Blocking was: List 'linux-dvb' closed topublic postsDavid S. Miller
- Prev by Date: Re: make in 2.6.x
- Next by Date: Re: make in 2.6.x
- Previous by thread: Re: [OT] Confirmation Spam Blocking was: List 'linux-dvb' closed topublic posts
- Next by thread: Re: [OT] Confirmation Spam Blocking was: List 'linux-dvb' closed topublic posts
- Indexes:[Main][Thread]