Graph Mining: HITS. Learn by Finding Answers to the Following Questions.
What is the HITS algorithm in graph mining, especially for the Web?
What does HITS stand for?
What is another, similar algorithm?
What are the differences between the PageRank algorithm and HITS?
What is the PageRank algorithm? What is its purpose?
Where and how is PageRank used?
Where and how is HITS used?
Describe HITS.
Describe the PageRank algorithm.
What are the two core concepts in the HITS algorithm?
Web-pages that serve as large directories of other pages with useful information: what are they called in the HITS algorithm?
Web-pages that serve to provide information on specific topics: what are they called in the HITS algorithm?
What are Hubs in the HITS algorithm? What purpose do they serve?
What are Authorities in the HITS algorithm? What purpose do they serve?
Compare HITS and PageRank.
A web-page containing links to Top 1000 Universities in the world – is this a HUB or Authority in HITS?
What is the Govt. of Canada website that provides information on all Govt. Services? Is this a Hub or an Authority?
How do you measure a Good Hub?
How do you measure a Good Authority?
Can a web-page be both Hub and Authority? Can you assign both measures to a page irrespective of how good or bad?
What are the two scores that HITS assigns to a web-page?
What is the authority score for a web-page?
What is the hub score for a web-page?
What does the hub score measure?
What does the authority score measure?
What are the three matrices used in the HITS algorithm, i.e. when you want to implement it?
What is a Transition Matrix in HITS?
What is a HUB vector? What does it contain initially?
What is an Authority vector? What does it contain initially?
For the graph below, provide the Initial Transition, Hub, Authority Vector/Matrices.
Directed Graph {Source, Destination}
Node: Yahoo, Amazon, Microsoft
Edges: {Yahoo, Yahoo} {Yahoo, Amazon} {Yahoo, Microsoft} {Amazon, Yahoo} {Amazon, Microsoft} {Microsoft, Amazon}
For the same graph above, explain your transition matrix.
If Transition Matrix is A, Hub Vector = h0, Authority Vector = a0.
How is the Hub Score for a page updated? How is the Authority Score for a page updated? For how long do these updates continue?
With the steps mentioned above, will the updates converge to a state where the Hub and Authority values no longer change? Why or why not? If convergence does not happen, what should you do?
What is HITS normalization, i.e. the step applied after each iteration? Why might it be important?
Give steps/equations used for HITS normalization.
What are the two ways you can make the HITS algorithm stop, i.e. the stopping points?
True or False: the destinies of PageRank and HITS were different. What does this mean?
Can you stop the HITS algorithm after a certain number of iterations?
In real life, are HITS and PageRank applied to the whole graph, for example the whole Internet? Or are they, most of the time, applied to contextual graphs?
What are contextual graphs, anyway?
Give example use cases for HITS and PageRank.
Can you think of the value aspect of these algorithms? i.e. how they affect people, communities, societies?
What are the programming languages where you will find libraries that implement the HITS and PageRank algorithm? Give the name of the libraries.
Implement the algorithms from scratch in Python or in R without using the libraries. What did you use to debug your implementation and how?
What challenges did you face in the implementation, and how did you resolve them? How did you represent the graphs (i.e. the graph you used for testing)?
Some Answers:
For the graph below, provide the Initial Transition, Hub, Authority Vector/Matrices.
Ans: Transition Matrix
A =[
1  1  1
1  0  1
0  1  0
]
Initial Hub Vector: h0 (column vector, one entry per page)
[Yahoo, Amazon, Microsoft] = [1, 1, 1]
Initial Authority Vector: a0 (column vector, one entry per page)
[Yahoo, Amazon, Microsoft] = [1, 1, 1]
For the same graph above, explain your transition matrix.
Rows (source page): 1st row Yahoo, 2nd row Amazon, 3rd row Microsoft
Columns (destination page): Yahoo, Amazon, Microsoft
Transition Matrix
[Yahoo->Yahoo, Yahoo->Amazon, Yahoo->Microsoft]
[Amazon->Yahoo, Amazon->Amazon, Amazon->Microsoft]
[Microsoft->Yahoo, Microsoft->Amazon, Microsoft->Microsoft]
Each entry is 1 if page i (row) links to page j (column), otherwise 0:
[1,  1, 1 ]
[1,  0, 1]
[0,  1, 0]
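The matrix above can also be built mechanically from the edge list. The following is a minimal Python sketch, not a definitive implementation; the helper name build_matrix is hypothetical, and the node order (Yahoo, Amazon, Microsoft) is assumed to fix the row and column order. Note that what this guide calls the transition matrix is often called the adjacency matrix elsewhere.

```python
# Node order fixes the row/column order of the matrix (assumption).
nodes = ["Yahoo", "Amazon", "Microsoft"]

# Directed edges as (source, destination) pairs, copied from the graph above.
edges = [
    ("Yahoo", "Yahoo"), ("Yahoo", "Amazon"), ("Yahoo", "Microsoft"),
    ("Amazon", "Yahoo"), ("Amazon", "Microsoft"),
    ("Microsoft", "Amazon"),
]


def build_matrix(nodes, edges):
    """Return A where A[i][j] = 1 if page i links to page j, otherwise 0."""
    index = {name: i for i, name in enumerate(nodes)}
    A = [[0] * len(nodes) for _ in nodes]
    for src, dst in edges:
        A[index[src]][index[dst]] = 1
    return A


A = build_matrix(nodes, edges)
for name, row in zip(nodes, A):
    print(f"{name:<10} {row}")
```

A list of lists keeps the sketch free of dependencies; for large graphs a sparse representation would be the more practical choice.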
If Transition Matrix is A, Hub Vector = h0, Authority Vector = a0.
How is the Hub Score for a page updated? How is the Authority Score for a page updated?
A * a0 = h1 : Hub Score Update: a page's hub score is the sum of the authority scores of the pages it links to (outgoing links)
transpose(A) * h1 = a1 : Authority Score Update: a page's authority score is the sum of the hub scores of the pages that link to it (incoming links)
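The two update rules can be tried out on the example graph with a short pure-Python sketch. This is an illustration under stated assumptions, not a reference implementation: the function names (mat_vec, transpose, normalize, hits) are hypothetical, the normalization (dividing each vector by its largest entry) is just one common choice, and the defaults max_iter=100 and tol=1e-9 are arbitrary. The loop stops either after a fixed number of iterations or when the scores stop changing, whichever comes first.

```python
def mat_vec(M, v):
    """Multiply matrix M (a list of rows) by column vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    """Swap rows and columns of M."""
    return [list(col) for col in zip(*M)]


def normalize(v):
    """Assumed normalization: scale so that the largest entry becomes 1."""
    top = max(v)
    return [x / top for x in v] if top > 0 else list(v)


def hits(A, max_iter=100, tol=1e-9):
    """Iterate h = A * a and a = transpose(A) * h, normalizing after each step."""
    n = len(A)
    At = transpose(A)
    h = [1.0] * n  # initial hub vector h0
    a = [1.0] * n  # initial authority vector a0
    for _ in range(max_iter):                   # stopping rule 1: iteration cap
        new_h = normalize(mat_vec(A, a))        # hub update from authority scores
        new_a = normalize(mat_vec(At, new_h))   # authority update from hub scores
        change = max(abs(p - q) for p, q in zip(new_h + new_a, h + a))
        h, a = new_h, new_a
        if change < tol:                        # stopping rule 2: scores settled
            break
    return h, a


# Rows and columns in the order Yahoo, Amazon, Microsoft.
A = [[1, 1, 1],
     [1, 0, 1],
     [0, 1, 0]]

h1 = mat_vec(A, [1, 1, 1])          # first hub update, unnormalized: [3, 2, 1]
a1 = mat_vec(transpose(A), h1)      # first authority update, unnormalized: [5, 4, 5]

hub, auth = hits(A)
print("hub scores:      ", [round(x, 3) for x in hub])
print("authority scores:", [round(x, 3) for x in auth])
```

Without the normalize step the raw scores grow with every iteration, so only their relative sizes are meaningful; scaling after each update keeps the numbers bounded and makes the convergence check possible.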