Wordfilter in Lua

I’ve been getting bored lately, so I decided to write a word filter in Lua. It’s really basic, but useful. The concept goes something like this: you take a list of naughty words and filter them out of the text. :D

This filter is a bit more advanced than that, however. It’s punctuation-aware and has a few wildcard filter rules. For example, say we want to filter the word boo out of the text; there are several distinct filter rules we can use.

Here are the texts to be filtered.
“boo… text”
“oboo… text”
“booo… text”
“obooo… text”

Rule 1:
Filter: “boo” => Returns true only if boo appears in the text as a whole word.
Result:

  1. “boo… text” – True
  2. “oboo… text” – False
  3. “booo… text” – False
  4. “obooo… text” – False

Rule 2:
Filter: “*boo” => Returns true only if the substring boo is found at the end of a word in the text.
Result:

  1. “boo… text” – True
  2. “oboo… text” – True
  3. “booo… text” – False
  4. “obooo… text” – False

Rule 3:
Filter: “boo*” => Returns true only if the substring boo is found at the beginning of a word in the text.
Result:

  1. “boo… text” – True
  2. “oboo… text” – False
  3. “booo… text” – True
  4. “obooo… text” – False

Rule 4:
Filter: “*boo*” => Returns true only if the substring boo is found anywhere in the text.
Result:

  1. “boo… text” – True
  2. “oboo… text” – True
  3. “booo… text” – True
  4. “obooo… text” – True
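To make the four rules concrete, here is a rough, self-contained sketch that reproduces the tables above using only the standard Lua string library. The name matches_rule is made up for this example and is not part of filter.lua. It also assumes plain ASCII punctuation ("..." rather than “…”), since %p only matches ASCII punctuation in the default locale.

```lua
-- Hypothetical helper for illustration only; filter.lua does not define it.
local function matches_rule(text, rule)
  -- A leading "*" allows extra characters before the core,
  -- a trailing "*" allows extra characters after it.
  local open_start = rule:sub(1, 1) == "*"
  local open_end = #rule > 1 and rule:sub(-1) == "*"
  local core = rule:gsub("^%*", ""):gsub("%*$", "")
  if core == "" then return false end

  -- Rule 4: wildcards on both sides, so look anywhere in the text.
  -- The last argument (true) asks for a plain find, not a pattern match.
  if open_start and open_end then
    return text:find(core, 1, true) ~= nil
  end

  for raw in text:gmatch("%S+") do
    -- Strip punctuation from both ends of the word.
    local word = raw:gsub("^%p+", ""):gsub("%p+$", "")
    if open_start then
      -- Rule 2: core must sit at the end of a word.
      if word:sub(-#core) == core then return true end
    elseif open_end then
      -- Rule 3: core must sit at the beginning of a word.
      if word:sub(1, #core) == core then return true end
    else
      -- Rule 1: the whole word must equal the core.
      if word == core then return true end
    end
  end
  return false
end

for _, rule in ipairs({"boo", "*boo", "boo*", "*boo*"}) do
  for _, text in ipairs({"boo... text", "oboo... text", "booo... text", "obooo... text"}) do
    print(rule, text, matches_rule(text, rule))
  end
end
```

This is not how filter.lua does it internally (that code is further down); it is just the same rule table written in a different way.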

Usage:

  1. Add the filter rules to the badwords.txt file; remember to end each rule with a comma.
  2. Put dofile("filter.lua") in your file; remember to put filter.lua, string.lua, table.lua, and badwords.txt in the same directory.
  3. Call filter(Text) to run the filter. It returns true if it detected naughty words in the text, and false otherwise.
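As a rough idea of what the comma-terminated rule list looks like once it is read in, here is a small sketch that splits such a string into trimmed rules. The sample contents (including putting one rule per line) and the name parse_rules are assumptions for illustration; the real parsing is done by the helpers in string.lua and table.lua, which may behave differently.

```lua
-- Hypothetical parser for illustration; not the code used by filter.lua.
local function parse_rules(contents)
  local rules = {}
  -- Every run of characters between commas is one candidate rule.
  for entry in contents:gmatch("[^,]+") do
    -- Trim surrounding whitespace, including newlines.
    local rule = entry:match("^%s*(.-)%s*$")
    if rule ~= "" then
      rules[#rules + 1] = rule
    end
  end
  return rules
end

-- Assumed sample of what badwords.txt could contain.
local sample = "boo,\n*darn,\nheck*,\n"
local rules = parse_rules(sample)
for i, rule in ipairs(rules) do
  print(i, rule)
end
```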

Download: http://6.dot.ch/filter.zip

if not string.trim then dofile("string.lua") end
if not table.find then dofile("table.lua") end

puns = {"'",
		"\"",
		",",
		".",
		"/",
		";",
		":",
		"?",
		"!",
		"*"}
wildcard = "*"
wordfile = "badwords.txt"

wopen = assert(io.open(wordfile)) -- stop with a clear error if badwords.txt is missing

wordlist = table.trim(string.split((wopen:read("*a")), ","))

wopen:close()

function string.trimpun(t)
	local letters = t:letters()
	local first = letters[1]
	local last = letters[#letters]
	while (first:inside(puns)) do
		t = t:sub(2)
		first = t:sub(1, 1)
	end
	while (last:inside(puns)) do
		t = t:sub(1, #t-1)
		last = t:sub(#t, #t)
	end

	return t
end

function string.getwordfrom(t, i)
	--Rule: space
	t = t:split()
	local n = 0
	local s = 1
	while n < i do
		n = n + #t[s] + 1

		s = s+1
	end
	return (t[s-1]:trim():trimpun())
end

function table.overlap(t1, t2)
	local common = {}
	for k, v in pairs(t1) do
		if table.find(t2, v) then table.insert(common, v) end
	end
	if #common == 0 then return false end
	return common
end

function filter(text, words)
	if not words then words = wordlist end
	local match = wildcard
	local _text = text:split() -- local, so the filter does not leak a global
	for i,w in ipairs(_text) do
		_text[i] = w:trimpun()
	end
	if table.overlap(_text, words) then return true end
	for i,word in ipairs(words) do
		if word:find(match) then
			local index = word:index(match)
			word = word:translate(wildcard)
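			if text:find(word, 1, true) then -- plain find, so the rule is not read as a Lua pattern
				--Wildcard syntax.
				--Wildcards on both sides: match anywhere in the text
				if #index >= 2 then return true end
				--Wildcard at the end: match only at the beginning of a word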
				if index[1] > 1 then
					local i = text:index(word)
					for k, v in ipairs(i) do
						local w = text:getwordfrom(v)
						if w:sub(1, #word) == word then return true end
					end
				end
				--Wildcard at the beginning: match only at the end of a word
				if index[1] == 1 then
					local i = text:index(word)
					for k, v in ipairs(i) do
						local w = text:getwordfrom(v)
						if w:sub(#w-#word+1, #w) == word then return true end
					end
				end
			end
		end
	end

	return false
end
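If you don't want to depend on the helpers from string.lua for the punctuation step, the same trimming can probably be done with Lua patterns alone. The sketch below is an alternative to string.trimpun, not a copy of it: it builds a character class from the same punctuation list and strips matching characters from both ends of a word. The standalone function name trimpun is an assumption for this example.

```lua
-- Same punctuation list as in filter.lua.
local puns = {"'", "\"", ",", ".", "/", ";", ":", "?", "!", "*"}

-- Build a character class like [%'%"%,%.%/...]; "%" escapes each character
-- so that none of them is treated as a pattern operator.
local class = "[" .. (table.concat(puns, ""):gsub("%p", "%%%0")) .. "]"

-- Hypothetical pattern-based alternative to string.trimpun.
local function trimpun(word)
  word = word:gsub("^" .. class .. "+", "") -- strip leading punctuation
  word = word:gsub(class .. "+$", "")       -- strip trailing punctuation
  return word
end

print(trimpun("boo..."))
print(trimpun("'boo!'"))
print(trimpun("it's"))
```

Punctuation inside a word (like the apostrophe in "it's") is left alone, since only the ends are trimmed.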