Our system is applied to comments from YouTube, Reddit and portals that use DISQUS, yielding one score per comment.

The higher the score, the more suspicious the comment!
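Because each comment gets a single score where higher means more suspicious, one natural use is a moderation queue. The sketch below assumes scores lie in [0, 1] and uses a hypothetical threshold of 0.5; neither the range nor a cut-off is stated above.

```python
from typing import List, Tuple

# Hypothetical cut-off: the text does not state a threshold or a score range.
SUSPICION_THRESHOLD = 0.5


def moderation_queue(scored: List[Tuple[str, float]],
                     threshold: float = SUSPICION_THRESHOLD) -> List[Tuple[str, float]]:
    """Keep comments scoring at or above the threshold, most suspicious first."""
    flagged = [(text, score) for text, score in scored if score >= threshold]
    # A higher score means a more suspicious comment, so sort in descending order.
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


comments = [("nice video", 0.05), ("you absolute clown", 0.72), ("go away, loser", 0.91)]
print(moderation_queue(comments))  # [('go away, loser', 0.91), ('you absolute clown', 0.72)]
```

Sorting rather than only filtering lets human moderators review the most suspicious comments first.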


Most existing systems rely solely on lists of bad words.

Why the #$@&%*! does that matter? Because 76% of profane messages contain no bad words at all.
We can filter more than 80% of profane messages!



Several machine learning models each have a say on how profane a comment is. Combining their votes lets models that disagree with each other settle on a common outcome.
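One common way for several models to "have a say" is soft voting: each model outputs a probability and the ensemble averages them, optionally with weights. This is a minimal sketch of that technique, not necessarily the method used here; the three toy models (`keyword_model`, `shouting_model`, `short_reply_model`) and the weights are hypothetical.

```python
from statistics import mean
from typing import Callable, List, Optional

# A "model" is any function mapping a comment to a profanity probability in [0, 1].
Model = Callable[[str], float]


def ensemble_score(comment: str, models: List[Model],
                   weights: Optional[List[float]] = None) -> float:
    """Soft voting: combine per-model probabilities into one score."""
    scores = [model(comment) for model in models]
    if weights is None:
        return mean(scores)  # every model gets an equal say
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Hypothetical toy models that can disagree with each other.
def keyword_model(comment: str) -> float:
    return 0.9 if "idiot" in comment.lower() else 0.1


def shouting_model(comment: str) -> float:
    return 0.8 if comment.isupper() else 0.2


def short_reply_model(comment: str) -> float:
    return 0.6 if len(comment) < 20 else 0.3


models = [keyword_model, shouting_model, short_reply_model]
print(ensemble_score("You idiot", models))
print(ensemble_score("You idiot", models, weights=[2.0, 1.0, 1.0]))
```

Here the keyword model is confident (0.9) while the shouting model is not (0.2); averaging smooths out that disagreement into a single score, and weights let more trusted models count for more.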

More than 5M comments, annotated by human experts and moderators, have been used to generate word vectors. The resulting vector space is then used to relate words to each other and also to relate words to senses.
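Relating words in a vector space usually means comparing them with cosine similarity. The sketch below uses hypothetical 3-dimensional toy vectors in place of learned ones, and assumes a "sense" can be represented as the centroid of a few seed words; the text does not say how senses are actually represented.

```python
import math
from typing import Dict, List

Vector = List[float]

# Hypothetical toy embeddings; real ones would be learned from the annotated comments.
EMBEDDINGS: Dict[str, Vector] = {
    "idiot": [0.9, 0.1, 0.0],
    "moron": [0.85, 0.15, 0.05],
    "clown": [0.6, 0.3, 0.2],
    "thanks": [0.0, 0.2, 0.95],
}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity: 1.0 for identical directions, near 0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query: str, k: int = 2) -> List[str]:
    """Relate words to each other: the k words most similar to the query."""
    others = [w for w in EMBEDDINGS if w != query]
    return sorted(others, key=lambda w: cosine(EMBEDDINGS[query], EMBEDDINGS[w]),
                  reverse=True)[:k]


def sense_vector(seed_words: List[str]) -> Vector:
    """Assumed sense representation: the average of its seed words' vectors."""
    vectors = [EMBEDDINGS[w] for w in seed_words]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


print(nearest("idiot"))  # ['moron', 'clown']
insult_sense = sense_vector(["idiot", "moron"])
for word in ("clown", "thanks"):
    # Relate words to a sense: "clown" scores far higher than "thanks".
    print(word, round(cosine(EMBEDDINGS[word], insult_sense), 3))
```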

Various NLP techniques are used to transform text into input that is meaningful (to machines, at least). Noise is also removed from the vocabulary, and text-cleansing techniques are applied. Some of our models use a bag-of-words transformation and other simple techniques.
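A minimal sketch of the three steps named above (text cleansing, removing noise from the vocabulary, bag-of-words), under stated assumptions: the cleaning rules are illustrative, and "noise" is assumed to mean tokens seen fewer than `min_count` times. The function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List


def clean(text: str) -> List[str]:
    """Lower-case, drop URLs, shorten character floods, keep letters, tokenize."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)  # "sooooo" -> "soo"
    text = re.sub(r"[^a-z\s]", " ", text)  # keep letters and whitespace only
    return text.split()


def build_vocabulary(corpus: List[str], min_count: int = 2) -> Dict[str, int]:
    """Map each frequent token to a column index; rare tokens are treated as noise."""
    counts = Counter(token for doc in corpus for token in clean(doc))
    kept = sorted(word for word, count in counts.items() if count >= min_count)
    return {word: index for index, word in enumerate(kept)}


def bag_of_words(text: str, vocab: Dict[str, int]) -> List[int]:
    """Count how often each vocabulary word occurs; word order is discarded."""
    vector = [0] * len(vocab)
    for token in clean(text):
        index = vocab.get(token)
        if index is not None:
            vector[index] += 1
    return vector


corpus = ["You are sooooo dumb!!!", "so dumb, check http://spam.example", "you are great"]
vocab = build_vocabulary(corpus, min_count=2)
print(vocab)  # {'are': 0, 'dumb': 1, 'you': 2}
print(bag_of_words("You are dumb, dumb", vocab))  # [1, 2, 1]
```

Note that this cleaning step throws away symbols, including grawlix such as "#$@&%*!"; a profanity filter might reasonably keep such symbols as tokens, since they can themselves be a signal.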

We use recurrent and convolutional neural networks, trained and tuned with deep learning techniques. Regularization, dropout, attention, normalization, padding and masking are some of the nuts and bolts that make our neural networks work.
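Padding, masking and attention work together: comments of different lengths are padded to a common length so they can be batched, and a mask makes sure attention ignores the padded positions. This pure-Python sketch illustrates the idea. `PAD_ID`, the per-step hidden states and the attention scores are hypothetical stand-ins for what an RNN or CNN layer would produce; it is not the networks' actual code.

```python
import math
from typing import List, Tuple

PAD_ID = 0  # hypothetical token id reserved for padding


def pad_batch(sequences: List[List[int]], max_len: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Truncate or right-pad token-id sequences to max_len; also return a 0/1 mask."""
    ids, mask = [], []
    for seq in sequences:
        seq = seq[:max_len]
        pad = max_len - len(seq)
        ids.append(seq + [PAD_ID] * pad)
        mask.append([1] * len(seq) + [0] * pad)
    return ids, mask


def masked_attention_pool(states: List[List[float]], scores: List[float],
                          mask: List[int]) -> List[float]:
    """Softmax attention over time steps that gives padded steps zero weight.

    states: hidden vector per step, shape [T][D]
    scores: unnormalized attention score per step, shape [T]
    mask:   1 for real tokens, 0 for padding; assumes at least one real token
    """
    top = max(s for s, m in zip(scores, mask) if m)  # subtract max for stability
    weights = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(states[0])
    return [sum(w * h[d] for w, h in zip(weights, states)) for d in range(dim)]


ids, mask = pad_batch([[5, 8, 2], [7]], max_len=4)
print(ids)   # [[5, 8, 2, 0], [7, 0, 0, 0]]
print(mask)  # [[1, 1, 1, 0], [1, 0, 0, 0]]
# Padded steps get high scores here, yet the mask still gives them zero weight.
states = [[1.0, 2.0], [9.0, 9.0], [9.0, 9.0], [9.0, 9.0]]
print(masked_attention_pool(states, [0.3, 5.0, 5.0, 5.0], mask[1]))  # [1.0, 2.0]
```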

