Combating Link Spam
As more and more people rely on search engines as the starting point for
finding information, it has become critical for a page to rank among the top
few results of popular search engines. Most search engines use, among other
things, variants of the classic PageRank algorithm, which relies on the link
structure of the web to rank pages. To make their pages rank higher than they
deserve, some web designers resort to all sorts of tricks that mislead search
engines by manipulating the links (link-spam) and the content (term-spam) on
their pages and across the web, giving rise to what has come to be called
web-spam. The continuing clash between search engine algorithm designers and
web-spammers has turned the web into the battleground known as the
Adversarial Web.
Our main focus in this report is link-spam. We look at the different methods
of combating link-spam. We also look at optimal link-spam structures and test
them using Java code. We implement popular ranking algorithms and test their
efficacy on a web-graph made available by Webaroo.
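To make the idea of ranking by link structure concrete, here is a minimal Java
sketch of classic PageRank computed by power iteration, run on a toy graph
before and after a few spammer-owned pages are pointed at a target page. It is
not the report's code: the class name, the toy graph, the damping factor of
0.85, and the simple "every farm page links to the target" structure are all
assumptions made for illustration, and the farm shown is not claimed to be the
optimal structure the report studies.

```java
import java.util.Arrays;

// Illustrative sketch only; all names and constants are hypothetical.
class PageRankSketch {

    /**
     * Classic PageRank by power iteration.
     * outLinks[i] holds the indices of the pages that page i links to.
     * Rank held by pages without out-links is spread evenly over all pages.
     */
    static double[] pageRank(int[][] outLinks, double damping, int iterations) {
        int n = outLinks.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);

        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double danglingMass = 0.0;
            for (int i = 0; i < n; i++) {
                if (outLinks[i].length == 0) {
                    danglingMass += rank[i];
                    continue;
                }
                // Each page splits its rank equally among its out-links.
                double share = rank[i] / outLinks[i].length;
                for (int j : outLinks[i]) {
                    next[j] += share;
                }
            }
            double base = (1.0 - damping) / n + damping * danglingMass / n;
            for (int j = 0; j < n; j++) {
                next[j] = base + damping * next[j];
            }
            rank = next;
        }
        return rank;
    }

    /** Toy web: page 0 is the target; pages 1..3 form a small cycle. */
    static int[][] baseGraph() {
        return new int[][] {
            {1},     // 0 -> 1
            {2, 0},  // 1 -> 2, 1 -> 0
            {3},     // 2 -> 3
            {1}      // 3 -> 1
        };
    }

    /** Appends farmSize spammer-owned pages, each linking only to target. */
    static int[][] withFarm(int[][] graph, int target, int farmSize) {
        int n = graph.length;
        int[][] out = Arrays.copyOf(graph, n + farmSize);
        for (int k = n; k < n + farmSize; k++) {
            out[k] = new int[] {target};
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] base = baseGraph();
        double before = pageRank(base, 0.85, 100)[0];
        double after = pageRank(withFarm(base, 0, 5), 0.85, 100)[0];
        System.out.println("target rank without farm: " + before);
        System.out.println("target rank with farm:    " + after);
    }
}
```

In this toy setting the target's score rises once the farm pages are added,
even though the farm pages themselves receive no in-links. This is the kind of
manipulation that link-spam countermeasures try to detect or neutralize.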
Link Spamming Techniques:
To delve into link spamming, let’s categorize pages according to the way
spammers can manipulate them to influence results:
a. Inaccessible pages: Spammers cannot modify these pages. However, they can
point to them.
b. Accessible pages: These pages don’t belong to the spammer, but the spammer
can modify their content in a limited manner. Typical examples are wikis and
comments on blogs.
c. Own pages: These pages belong to the spammer, who wants to boost the
ranking of one or more of them, the target pages t. Own pages are subject to
a budget cap (e.g. web hosting costs).
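The three categories above can be sketched as a small Java model. This is only
an illustration under stated assumptions: the names SpamPageModel, PageKind,
canAddLink, and withinBudget are hypothetical, and expressing the budget as a
maximum number of own pages is a simplification chosen for the example.

```java
// Illustrative sketch only; all names here are hypothetical.
class SpamPageModel {

    /** The three page categories from the spammer's point of view. */
    enum PageKind { INACCESSIBLE, ACCESSIBLE, OWN }

    /**
     * Can the spammer create a link from a page of kind 'from' to a page of
     * kind 'to'? Links may point at any page, but the source page must be
     * one the spammer can edit, so inaccessible pages cannot be sources.
     */
    static boolean canAddLink(PageKind from, PageKind to) {
        return from != PageKind.INACCESSIBLE;
    }

    /** Assumption: the budget is modeled as a cap on the number of own pages. */
    static boolean withinBudget(int ownPageCount, int maxOwnPages) {
        return ownPageCount <= maxOwnPages;
    }

    public static void main(String[] args) {
        // A blog comment (accessible) may link to the spammer's own target.
        System.out.println(canAddLink(PageKind.ACCESSIBLE, PageKind.OWN));
        // An inaccessible page cannot be made to link anywhere.
        System.out.println(canAddLink(PageKind.INACCESSIBLE, PageKind.OWN));
        // Own pages may point at inaccessible pages.
        System.out.println(canAddLink(PageKind.OWN, PageKind.INACCESSIBLE));
    }
}
```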
The target algorithms: HITS, PageRank, TrustRank, etc.
9 Sep 2013