### Before we start

An email address has the form [local part]@[domain]. For example, in logan@madkudu.com the local part is “logan” and the domain is “madkudu.com”.
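As a rough illustration of that split, the hypothetical helper `split_email` below separates an address into its two parts. Lowercasing the input and splitting on the last `@` are assumptions made for this sketch; the article does not say how addresses are normalized.

```python
from typing import Tuple


def split_email(email: str) -> Tuple[str, str]:
    """Split an address into (local part, domain) at the last '@'.

    Assumption: the address is trimmed and lowercased first.
    """
    local, _, domain = email.strip().lower().rpartition("@")
    return local, domain
```

With this helper, `split_email("logantest333@madkudu.com")` gives the local part `logantest333` and the domain `madkudu.com`.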

## An email or domain is flagged as spam (is_spam=1) if:

**contains test**

If the local part OR domain is exactly “test”

If the local part OR domain begins with “test”

If the local part contains “+test” anywhere

If the local part ends with “test” followed by any number of digits (e.g., logantest333@madkudu.com)

If the domain contains “.test” anywhere
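The “contains test” conditions could be checked roughly as follows. The function name `contains_test` is hypothetical, and the sketch assumes that “any number of digits” includes zero digits, so a local part that simply ends in “test” also matches.

```python
import re


def contains_test(local: str, domain: str) -> bool:
    """Return True if any of the 'contains test' conditions holds."""
    # "Exactly test" is a special case of "begins with test".
    if local.startswith("test") or domain.startswith("test"):
        return True
    # "+test" anywhere in the local part.
    if "+test" in local:
        return True
    # "test" followed by digits at the very end of the local part.
    # Assumption: zero digits also counts.
    if re.search(r"test\d*$", local):
        return True
    # ".test" anywhere in the domain.
    return ".test" in domain
```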

**some characters repeated many times:**

If any character is repeated at least 4 times consecutively

OR

If any pair of letters is repeated at least 4 times (tetetete@gmail.com)
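Both repetition checks map naturally onto regular-expression backreferences. The sketch below is one possible reading: it assumes the pair must repeat back to back (as in the tetetete example) and that the whole address is checked; the names are hypothetical.

```python
import re

# Any single character appearing 4 or more times in a row, e.g. "aaaa".
SAME_CHAR_RUN = re.compile(r"(.)\1{3,}")
# Any two-letter pair appearing 4 or more times back to back, e.g. "tetetete".
PAIR_RUN = re.compile(r"([a-z]{2})\1{3,}")


def has_repeated_characters(text: str) -> bool:
    """Return True if either repetition pattern occurs in the text."""
    text = text.lower()
    return bool(SAME_CHAR_RUN.search(text) or PAIR_RUN.search(text))
```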

**numbers exceed letters:**

If more than half of the characters in the local part are digits (1234aa@gmail.com)

If there is at least one more digit than there are letters in the local part

If there are at least 6 digits in the local part
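A minimal sketch of the three digit-count conditions, assuming they are combined with OR (the article does not state this explicitly). The function name is hypothetical.

```python
def numbers_exceed_letters(local: str) -> bool:
    """Return True if any of the three digit-count conditions holds."""
    digits = sum(ch.isdigit() for ch in local)
    letters = sum(ch.isalpha() for ch in local)

    more_than_half_digits = digits > len(local) / 2
    more_digits_than_letters = digits >= letters + 1
    six_or_more_digits = digits >= 6

    # Assumption: the conditions are alternatives, so any one is enough.
    return more_than_half_digits or more_digits_than_letters or six_or_more_digits
```

For the example above, `1234aa` has 4 digits out of 6 characters, so the first condition is met.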

**local has no letters:**

If the local part does not contain any letters

**has spammy patterns:**

If the email is longer than 20 characters and its 3 most common characters make up more than 70% of all its characters
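One way to compute this share is with `collections.Counter`. The sketch assumes every character of the full address (including `@` and periods) is counted; the function name is hypothetical.

```python
from collections import Counter


def has_spammy_pattern(email: str) -> bool:
    """Return True if a long address is dominated by its 3 most common characters."""
    if len(email) <= 20:
        return False
    # most_common(3) returns up to three (character, count) pairs.
    top_three_count = sum(count for _, count in Counter(email).most_common(3))
    return top_three_count / len(email) > 0.70
```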

**domain end is domain:**

If the strings before and after the period in the domain are the same (logan@hello.hello)
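A possible check for this rule is sketched below. The article only describes a domain with a single period, so comparing the last two labels when there are several periods is an assumption, as is the function name.

```python
def domain_end_is_domain(domain: str) -> bool:
    """Return True if the labels on both sides of the last period match."""
    labels = domain.lower().split(".")
    # Assumption: with more than one period, compare the last two labels.
    return len(labels) >= 2 and labels[-1] != "" and labels[-2] == labels[-1]
```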

**domain contains phrase from list:**

If the domain contains any phrase from the list of blacklisted phrases. Link to list.

**local contains phrase from list:**

If the local part contains any phrase from the list of blacklisted phrases. Link to list.

**contains asd or sdf twice:**

If the email contains the sequence of letters “asd” or “sdf” twice
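The wording leaves some room for interpretation. The sketch below assumes that one of the two sequences must itself appear at least twice, so “asdf” alone (one “asd” overlapping one “sdf”) does not match. The function name is hypothetical.

```python
def contains_asd_or_sdf_twice(email: str) -> bool:
    """Return True if "asd" or "sdf" occurs at least twice in the address."""
    email = email.lower()
    # Assumption: each sequence is counted separately.
    return any(email.count(seq) >= 2 for seq in ("asd", "sdf"))
```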

**looks like spam:**

If the email contains the phrase “noemail” anywhere

**local part length is 1:**

If the local part contains exactly one character (a@gmail.com)

**domain length is 1:**

If the domain contains exactly one character before the extension (logan@a.com)

**local_no_vowels:**

If the local part is at least 4 characters long and contains no digits and no vowels.

**local low vowel ratio:**

If the local part is exactly 5 characters long and there are no vowels in it

If the local part is longer than 5 characters and the fraction of vowels is less than 0.08. The fraction is vowels / letters, not vowels / total number of characters, so digits are not counted in the division.
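The two vowel-based rules above could be sketched as follows. Treating only a, e, i, o, u as vowels (not “y”) and returning False when a long local part has no letters at all are assumptions; the function names are hypothetical.

```python
VOWELS = set("aeiou")  # Assumption: "y" is not treated as a vowel.


def local_no_vowels(local: str) -> bool:
    """At least 4 characters, no digits, no vowels."""
    local = local.lower()
    return (
        len(local) >= 4
        and not any(ch.isdigit() for ch in local)
        and not any(ch in VOWELS for ch in local)
    )


def local_low_vowel_ratio(local: str) -> bool:
    """No vowels at length 5, or vowels / letters below 0.08 when longer."""
    local = local.lower()
    letters = [ch for ch in local if ch.isalpha()]
    vowels = [ch for ch in letters if ch in VOWELS]
    if len(local) == 5:
        return not vowels
    if len(local) > 5 and letters:
        # Digits and symbols are left out of the denominator.
        return len(vowels) / len(letters) < 0.08
    return False
```

Note how high the bar is for longer local parts: a single vowel among 14 letters (about 0.07) still matches, while one vowel among 6 letters (about 0.17) does not.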

**domain contains short gibberish:**

If the domain contains any of the following:

asdef

asdf

If the domain looks exactly like any of the following:

asd.com

sdf.com

fsd.com

dsa.com
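Taken together, the two lists above amount to a substring check plus an exact-match check. A minimal sketch, with hypothetical names and assuming a case-insensitive comparison:

```python
GIBBERISH_SUBSTRINGS = ("asdef", "asdf")
GIBBERISH_DOMAINS = {"asd.com", "sdf.com", "fsd.com", "dsa.com"}


def domain_contains_short_gibberish(domain: str) -> bool:
    """Return True on an exact gibberish domain or a gibberish substring."""
    domain = domain.lower()
    return domain in GIBBERISH_DOMAINS or any(
        fragment in domain for fragment in GIBBERISH_SUBSTRINGS
    )
```

Under this reading, `asd.com` matches but `asd.net` does not, because the second list is compared against the entire domain.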

**contains absurdity:**

If the local part contains any of the following:

princessleia

If the local part is exactly:

sda

ads

dsa

nothing

abc

sdf
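The same substring-versus-exact distinction applies here, this time on the local part. A short sketch with hypothetical names:

```python
ABSURD_SUBSTRINGS = ("princessleia",)
ABSURD_EXACT = {"sda", "ads", "dsa", "nothing", "abc", "sdf"}


def contains_absurdity(local: str) -> bool:
    """Return True if the local part matches either absurdity list."""
    local = local.lower()
    return local in ABSURD_EXACT or any(
        phrase in local for phrase in ABSURD_SUBSTRINGS
    )
```

So `abc@gmail.com` matches, while `abcd@gmail.com` does not, since the second list requires the local part to be exactly one of the listed strings.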

**not F1000 or personal and has numbers in local:**

If there are two consecutive digits in the local part.
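The digit part of this rule is a one-line regular expression, sketched below with a hypothetical function name. The rule name also requires that the domain is neither F1000 nor personal; the article does not provide those lists, so that part is left out of the sketch.

```python
import re


def has_consecutive_digits(local: str) -> bool:
    """Return True if two digits appear side by side in the local part."""
    # Only the digit check is covered; the F1000/personal-domain
    # condition from the rule name is not implemented here.
    return re.search(r"\d\d", local) is not None
```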
