#include <graphbuilder.h>
Public Methods
GraphBuilder (int smem, int jmem, int nmem, int lmem, bool sl)
~GraphBuilder ()
void NodeInitialize (uint32 idno)
    Creates a new WebNode on the heap and assigns anothernode as a handle.
void NodeSetURL (const char *docurl, const char *aliasurl)
    Copies the current document's URL into another_url, and places it into the trie.
const char * NodeGetURL ()
const char * NodeGetAlias ()
const char * NodeGetURL_ ()
const char * NodeGetAlias_ ()
const uint32 NodeGetID ()
const uint16 NodeGetDate ()
void NodeSetDate (unsigned short aDate)
    Sets anothernode's date.
void NodeInsertLinks ()
    Inserts the anchor links contained in linkset into anothernode's fromlinks array.
void NodeLaunch ()
    Places the WebNode handled by anothernode into the graph and clears anothernode.
URLComponents * NodeGetURLParts ()
WebLinkGraph * UndockWebGraph ()
    Walks through all the nodes and changes the links so that they no longer require the trie.
const char * FormatURL (const char *anurl, int anurl_len, ContentType *t)
void TrieInsertLinkURL (const char *url)
WebNodePtr FindWebNode (const char *url)
    Finds an existing WebNode from a URL. Returns NULL if not found.
void SetupLeafTable ()
void AddLeaf (ptrdiff_t key, LeafNodePtr leaf)
const ptrdiff_t FindLeafNodeKey (const char *url)
    Finds the leaftable key of an existing LeafNode from a URL. Returns -1 if not found.
void UpdateLeafLinks ()
void StatisticsMem (ostream &o)
void StatisticsGraph (ostream &o)
uint32 LowestID ()
uint32 HighestID ()
uint16 LowestDate ()
uint16 HighestDate ()
Public Attributes
struct {
    bool show_links
    int leaftable_memory
} flags
Private Attributes
WebNode * curdoc
URLComponents curdoc_baseurl
char doc_url [STRINGBUF_LEN2+1]
char doc_alias [STRINGBUF_LEN2+1]
const char * docurl__
const char * aliasurl__
WebLinkGraph * graph
bool graph_is_docked
Trie * trie
SimpleWebNodePtrHashTable * nodetable
SimpleLeafNodePtrHashTable * leaftable
URLFilter * urlfilter
RawLinkSet * linkset
struct {
    uint32 heap_used_webnodes
    uint32 cumulative_tolinks
    uint32 cumulative_fromlinks
    uint32 cumulative_leaflinks
    uint32 cumulative_dangling
    uint32 nodetable_insertions
    uint32 nodetable_alias_insertions
    uint32 lowid
    uint32 highid
    uint16 lowdate
    uint16 highdate
} stats
Each WebNode represents a separate document. GraphBuilder connects the fromlinks and tolinks between WebNodes and discards dangling links.
To do this, GraphBuilder must construct a trie containing all known document URLs. Once the WebLinkGraph is built, the trie is "undocked" from the list. This allows the (substantial) memory taken by the URL strings to be regained, at the cost of no longer being able to identify a WebNode by its document URL.
Definition at line 50 of file graphbuilder.h.

GraphBuilder::GraphBuilder (int smem, int jmem, int nmem, int lmem, bool sl)
This constructor allocates approximately (smem+jmem) Mb for the trie and nmem Mb for the nodetable.
Definition at line 30 of file graphbuilder.cc. References URLComponents::Clear(), curdoc, curdoc_baseurl, doc_alias, doc_url, flags, graph, graph_is_docked, kint32max, kuint16max, linkset, Mb, nodetable, NULL, RawLinkSet, SimpleWebNodePtrHashTable, stats, trie, and urlfilter.
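
The memory arguments of the constructor are expressed in Mb. As a rough illustration of the split described above, the following sketch converts the arguments to byte counts. It assumes that one Mb is 2^20 bytes (the value of the Mb constant is not shown on this page), and the names kAssumedMb, MemoryBudget and EstimateBudget are hypothetical and not part of GraphBuilder.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: one "Mb" is 2^20 bytes; the real Mb constant is not shown here.
constexpr std::size_t kAssumedMb = std::size_t{1} << 20;

// Hypothetical summary of the approximate allocations described above.
struct MemoryBudget {
    std::size_t trie_bytes;       // roughly (smem + jmem) Mb
    std::size_t nodetable_bytes;  // nmem Mb
};

// Converts the constructor's megabyte arguments into approximate byte counts.
inline MemoryBudget EstimateBudget(int smem, int jmem, int nmem) {
    assert(smem >= 0 && jmem >= 0 && nmem >= 0);
    return MemoryBudget{
        (static_cast<std::size_t>(smem) + static_cast<std::size_t>(jmem)) * kAssumedMb,
        static_cast<std::size_t>(nmem) * kAssumedMb};
}
```

Under these assumptions, EstimateBudget(64, 32, 16) describes a trie of about 96 Mb and a nodetable of 16 Mb. The lmem and sl arguments are not covered by this sketch.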

GraphBuilder::~GraphBuilder ()
Definition at line 72 of file graphbuilder.cc.

void GraphBuilder::AddLeaf (ptrdiff_t key, LeafNodePtr leaf)
Definition at line 358 of file graphbuilder.cc. References LeafNode::Date(), SimpleHashTable< LeafNodePtr >::Find(), SimpleHashTable< LeafNodePtr >::Insert(), leaftable, and stats. Referenced by Talker::LoadLeaves().

const ptrdiff_t GraphBuilder::FindLeafNodeKey (const char *url)
Finds the leaftable key of an existing LeafNode from a URL. Returns -1 if not found.
Definition at line 347 of file graphbuilder.cc. References URLFilter::CompressURL(), URLFilter::DeindexURL(), Trie::FindURL(), trie, and urlfilter. Referenced by Talker::LoadLeaves().
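
The lookup convention above (a signed key, with -1 meaning "not found") can be illustrated with a small stand-in for the trie. This is only a sketch: UrlIndex and its methods are hypothetical, an unordered map replaces the real Trie, and the choice of a string-pool offset as the key is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the trie: maps each URL to a signed key.
class UrlIndex {
public:
    // Inserts a URL if it is new and returns its key (an offset into the pool).
    std::ptrdiff_t Insert(const std::string& url) {
        auto it = keys_.find(url);
        if (it != keys_.end()) return it->second;
        const std::ptrdiff_t key = static_cast<std::ptrdiff_t>(pool_.size());
        pool_ += url;
        pool_ += '\0';  // terminate each stored URL
        keys_.emplace(url, key);
        return key;
    }

    // Returns the key of a known URL, or -1 if the URL was never inserted.
    std::ptrdiff_t FindKey(const std::string& url) const {
        auto it = keys_.find(url);
        return it == keys_.end() ? -1 : it->second;
    }

    // Recovers the URL text for a valid key; the pointer is invalidated by Insert().
    const char* UrlAt(std::ptrdiff_t key) const {
        assert(key >= 0 && static_cast<std::size_t>(key) < pool_.size());
        return pool_.data() + key;
    }

private:
    std::string pool_;                                      // all URLs, back to back
    std::unordered_map<std::string, std::ptrdiff_t> keys_;  // URL -> offset
};
```

Callers are expected to test the result against -1 before using it as a key, in the same way that callers of FindWebNode() test for NULL.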

WebNodePtr GraphBuilder::FindWebNode (const char *url)
Finds an existing WebNode from a URL. Returns NULL if not found.
Definition at line 335 of file graphbuilder.cc. References URLFilter::CompressURL(), URLFilter::DeindexURL(), SimpleHashTable< WebNodePtr >::Find(), Trie::FindURL(), nodetable, NULL, trie, and urlfilter. Referenced by Talker::BuildTags(), and Talker::LoadLeaves().

const char * GraphBuilder::FormatURL (const char *anurl, int anurl_len, ContentType *t)
Definition at line 72 of file graphbuilder.h. References ContentType, curdoc_baseurl, URLFilter::FormatURL(), and urlfilter. Referenced by GraphParseHandler::AddAnchor().

uint16 GraphBuilder::HighestDate ()
Definition at line 94 of file graphbuilder.h. Referenced by Talker::ProcessCommand().

uint32 GraphBuilder::HighestID ()
Definition at line 89 of file graphbuilder.h.

uint16 GraphBuilder::LowestDate ()
Definition at line 92 of file graphbuilder.h. Referenced by Talker::ProcessCommand().

uint32 GraphBuilder::LowestID ()
Definition at line 87 of file graphbuilder.h. Referenced by Talker::ProcessCommand().
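
The four accessors above report the range of node ids and dates recorded in stats. A minimal sketch of how such a range could be maintained follows. It assumes that the low values start at the maximum of their type and the high values at zero, which is only suggested by the constructor's references to kint32max and kuint16max. RangeStats, Observe, Empty and IdSpan are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical running range of ids and dates, mirroring the lowid/highid
// and lowdate/highdate fields of stats.
struct RangeStats {
    std::uint32_t lowid = std::numeric_limits<std::uint32_t>::max();
    std::uint32_t highid = 0;
    std::uint16_t lowdate = std::numeric_limits<std::uint16_t>::max();
    std::uint16_t highdate = 0;

    // Widens the ranges so that they include one more node.
    void Observe(std::uint32_t id, std::uint16_t date) {
        if (id < lowid) lowid = id;
        if (id > highid) highid = id;
        if (date < lowdate) lowdate = date;
        if (date > highdate) highdate = date;
    }

    // True while no node has been observed (sentinels still in place).
    bool Empty() const { return lowid > highid; }

    // Distance between the highest and lowest id seen so far.
    std::uint32_t IdSpan() const {
        assert(!Empty());
        return highid - lowid;
    }
};
```

With sentinel initialization, the first observed node sets both ends of each range without a special case.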

const char * GraphBuilder::NodeGetAlias ()
Definition at line 246 of file graphbuilder.cc. References doc_alias. Referenced by Ripper::RipRepository().

const char * GraphBuilder::NodeGetAlias_ ()
Definition at line 238 of file graphbuilder.cc. References aliasurl__. Referenced by Ripper::RipRepository().

const uint16 GraphBuilder::NodeGetDate ()
Definition at line 255 of file graphbuilder.cc. References curdoc, WebNode::Date(), and uint16. Referenced by Ripper::RipRepository().

const uint32 GraphBuilder::NodeGetID ()
Definition at line 250 of file graphbuilder.cc. References curdoc, WebNode::ID(), and uint32. Referenced by Ripper::RipRepository().

const char * GraphBuilder::NodeGetURL ()
Definition at line 242 of file graphbuilder.cc. References doc_url. Referenced by GraphParseHandler::AddAnchor(), and Ripper::RipRepository().

const char * GraphBuilder::NodeGetURL_ ()
Definition at line 234 of file graphbuilder.cc. References docurl__. Referenced by Ripper::RipRepository().

URLComponents * GraphBuilder::NodeGetURLParts ()
Definition at line 67 of file graphbuilder.h. References curdoc_baseurl.

void GraphBuilder::NodeInitialize (uint32 idno)
Creates a new WebNode on the heap and assigns anothernode as a handle.
Definition at line 109 of file graphbuilder.cc. References URLComponents::Clear(), curdoc, curdoc_baseurl, doc_alias, doc_url, stats, and uint32. Referenced by Ripper::RipRepository().

void GraphBuilder::NodeInsertLinks ()
Inserts the anchor links contained in linkset into anothernode's fromlinks array. Note that at this stage, all the links are pointer differences into the trie.
Definition at line 158 of file graphbuilder.cc. References curdoc, WebNode::InsertRawLinks(), and linkset. Referenced by Ripper::RipRepository().

void GraphBuilder::NodeLaunch ()
Places the WebNode handled by anothernode into the graph and clears anothernode. The WebNode is pushed at the front of the graph. Since node ids are assigned sequentially in increasing order, the graph contains nodes in decreasing id order. This ordering should not be tampered with, as it is used by Talker().
Definition at line 131 of file graphbuilder.cc. References URLComponents::Clear(), curdoc, curdoc_baseurl, WebNode::Date(), doc_alias, doc_url, graph, linkset, NULL, and stats. Referenced by Ripper::RipRepository().
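
The ordering guarantee stated for NodeLaunch() follows directly from pushing at the front. The sketch below illustrates this with a std::forward_list standing in for WebLinkGraph, whose actual container type is not shown on this page. SketchNode, LaunchSequential and IsStrictlyDecreasing are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <forward_list>
#include <limits>

// Hypothetical minimal node; the real WebNode carries much more state.
struct SketchNode {
    std::uint32_t id;
};

// Launches `count` nodes with sequentially increasing ids, pushing each one
// at the front of the list, as described for NodeLaunch().
inline std::forward_list<SketchNode> LaunchSequential(std::uint32_t first_id,
                                                      std::uint32_t count) {
    assert(count == 0 ||
           first_id <= std::numeric_limits<std::uint32_t>::max() - (count - 1));
    std::forward_list<SketchNode> graph;
    for (std::uint32_t i = 0; i < count; ++i) {
        graph.push_front(SketchNode{first_id + i});
    }
    return graph;
}

// Checks the invariant: ids strictly decrease from front to back.
inline bool IsStrictlyDecreasing(const std::forward_list<SketchNode>& graph) {
    auto it = graph.begin();
    if (it == graph.end()) return true;
    std::uint32_t prev = it->id;
    for (++it; it != graph.end(); ++it) {
        if (it->id >= prev) return false;
        prev = it->id;
    }
    return true;
}
```

Launching ids 1 through 4 leaves id 4 at the front. Any reordering of the list, such as a reversal, breaks the invariant that later consumers rely on.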

void GraphBuilder::NodeSetDate (unsigned short aDate)
Sets anothernode's date.
Definition at line 150 of file graphbuilder.cc. References curdoc, and WebNode::SetDate(). Referenced by GraphParseHandler::AddHeader().

void GraphBuilder::NodeSetURL (const char *docurl, const char *aliasurl)
Copies the current document's URL into another_url, and places it into the trie. The copy is necessary so that later calls to FormatURL() can use another_url to complete anchor link URLs, whenever those are incomplete.
Definition at line 167 of file graphbuilder.cc. References aliasurl__, URLFilter::CompressURL(), ContentType, curdoc, curdoc_baseurl, URLFilter::DeindexURL(), doc_alias, doc_url, docurl__, SimpleHashTable< WebNodePtr >::Find(), flags, URLFilter::FormatURL(), SimpleHashTable< WebNodePtr >::Insert(), Trie::InsertURL(), URLComponents::netloc, nodetable, NULL, URLComponents::params, URLComponents::path, URLFilter::ParseURL(), URLComponents::query, URLComponents::scheme, stats, trie, and urlfilter. Referenced by GraphParseHandler::NewDocument().
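
To illustrate why the document URL must be kept around, the sketch below completes an incomplete anchor link against a base URL. It is a deliberately simplified, hypothetical helper (CompleteURL is not part of this class). It does not reproduce URLFilter::FormatURL(), ignores "..", query strings and fragments, and is not a full RFC 3986 resolver.

```cpp
#include <cassert>
#include <string>

// Simplified, hypothetical completion of an anchor link against the URL of
// the document that contains it.
inline std::string CompleteURL(const std::string& base, const std::string& link) {
    // Links that carry their own scheme are already complete.
    if (link.find("://") != std::string::npos) return link;

    const std::string::size_type scheme_end = base.find("://");
    assert(scheme_end != std::string::npos);  // the base must be absolute
    const std::string::size_type host_start = scheme_end + 3;
    const std::string::size_type path_start = base.find('/', host_start);
    const std::string origin = base.substr(0, path_start);  // scheme + netloc

    // Host-relative link: keep scheme and netloc, replace the path.
    if (!link.empty() && link[0] == '/') return origin + link;

    // Path-relative link: resolve against the directory of the base document.
    if (path_start == std::string::npos) return origin + "/" + link;
    const std::string::size_type last_slash = base.rfind('/');
    return base.substr(0, last_slash + 1) + link;
}
```

For example, the link "page.html" found in "http://example.org/docs/index.html" is completed to "http://example.org/docs/page.html", while "/top.html" becomes "http://example.org/top.html".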

void GraphBuilder::SetupLeafTable ()
Definition at line 351 of file graphbuilder.cc. References SimpleHashTable< LeafNodePtr >::Clear(), flags, leaftable, Mb, and SimpleLeafNodePtrHashTable. Referenced by Talker::LoadLeaves().

void GraphBuilder::StatisticsGraph (ostream &o)
Definition at line 95 of file graphbuilder.cc. Referenced by Talker::PrintStatisticsGraph().

void GraphBuilder::StatisticsMem (ostream &o)
Definition at line 80 of file graphbuilder.cc. References MemoryPooled< WebNodeStruct >::FreeBlocks(), MemPool< S >::FreeBlocks1(), MemPool< S >::FreeBlocks2(), nodetable, SimpleHashTable< WebNodePtr >::Size(), Trie::Statistics(), stats, and trie. Referenced by Talker::PrintStatistics(), and Ripper::PrintStatistics().

void GraphBuilder::TrieInsertLinkURL (const char *url)
Definition at line 371 of file graphbuilder.cc. References Trie::bigs, URLFilter::CompressURL(), URLFilter::DeindexURL(), Trie::InsertURL(), linkset, trie, and urlfilter. Referenced by GraphParseHandler::AddAnchor().

WebLinkGraph * GraphBuilder::UndockWebGraph ()
Walks through all the nodes and changes the links so that they no longer require the trie.
Definition at line 261 of file graphbuilder.cc. References graph, graph_is_docked, nodetable, and stats. Referenced by Ripper::PublishWebGraph().
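
The undocking step can be pictured as replacing key-based links with direct node pointers and dropping links whose target is unknown. The sketch below is an assumption-laden illustration, not the actual implementation: DockedNode and Undock are hypothetical, a local unordered map plays the role of the nodetable, and links are modelled as plain integer keys.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical node: while docked, links are keys that need a lookup
// structure; after undocking, they are direct pointers.
struct DockedNode {
    std::ptrdiff_t self_key;                // key of this document's URL
    std::vector<std::ptrdiff_t> link_keys;  // links as keys (docked form)
    std::vector<DockedNode*> links;         // links as pointers (undocked form)
};

// Rewrites every link as a pointer and returns the number of dangling links
// (links to documents that are not in the graph) that were discarded.
// The vector must not be resized afterwards, or the pointers become invalid.
inline std::size_t Undock(std::vector<DockedNode>& nodes) {
    std::unordered_map<std::ptrdiff_t, DockedNode*> table;
    for (DockedNode& n : nodes) {
        const bool inserted = table.emplace(n.self_key, &n).second;
        assert(inserted);  // each document key is expected to be unique
        (void)inserted;
    }

    std::size_t dangling = 0;
    for (DockedNode& n : nodes) {
        for (std::ptrdiff_t key : n.link_keys) {
            auto it = table.find(key);
            if (it == table.end()) {
                ++dangling;  // target document unknown: discard the link
                continue;
            }
            n.links.push_back(it->second);
        }
        n.link_keys.clear();  // the key-based form is no longer needed
        n.link_keys.shrink_to_fit();
    }
    return dangling;
}
```

Once every node holds pointers only, the structure that maps URLs to keys can be released, which is the memory saving described in the class overview.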

void GraphBuilder::UpdateLeafLinks ()
Definition at line 310 of file graphbuilder.cc. References graph, leaftable, and stats. Referenced by Talker::LoadLeaves().

const char * GraphBuilder::aliasurl__
Definition at line 111 of file graphbuilder.h. Referenced by NodeGetAlias_(), and NodeSetURL().

uint32 GraphBuilder::cumulative_dangling
Definition at line 130 of file graphbuilder.h.

uint32 GraphBuilder::cumulative_fromlinks
Definition at line 128 of file graphbuilder.h.

uint32 GraphBuilder::cumulative_leaflinks
Definition at line 129 of file graphbuilder.h.

uint32 GraphBuilder::cumulative_tolinks
Definition at line 127 of file graphbuilder.h.

WebNode * GraphBuilder::curdoc
Definition at line 104 of file graphbuilder.h. Referenced by GraphBuilder(), NodeGetDate(), NodeGetID(), NodeInitialize(), NodeInsertLinks(), NodeLaunch(), NodeSetDate(), and NodeSetURL().

URLComponents GraphBuilder::curdoc_baseurl
Definition at line 106 of file graphbuilder.h. Referenced by FormatURL(), GraphBuilder(), NodeGetURLParts(), NodeInitialize(), NodeLaunch(), and NodeSetURL().

char GraphBuilder::doc_alias [STRINGBUF_LEN2+1]
Definition at line 108 of file graphbuilder.h. Referenced by GraphBuilder(), NodeGetAlias(), NodeInitialize(), NodeLaunch(), and NodeSetURL().

char GraphBuilder::doc_url [STRINGBUF_LEN2+1]
Definition at line 107 of file graphbuilder.h. Referenced by GraphBuilder(), NodeGetURL(), NodeInitialize(), NodeLaunch(), and NodeSetURL().

const char * GraphBuilder::docurl__
Definition at line 110 of file graphbuilder.h. Referenced by NodeGetURL_(), and NodeSetURL().

struct { ... } GraphBuilder::flags
Referenced by GraphParseHandler::AddAnchor(), GraphBuilder(), NodeSetURL(), and SetupLeafTable().

WebLinkGraph * GraphBuilder::graph
Definition at line 115 of file graphbuilder.h. Referenced by GraphBuilder(), NodeLaunch(), StatisticsGraph(), UndockWebGraph(), and UpdateLeafLinks().

bool GraphBuilder::graph_is_docked
Definition at line 116 of file graphbuilder.h. Referenced by GraphBuilder(), and UndockWebGraph().

uint32 GraphBuilder::heap_used_webnodes
Definition at line 126 of file graphbuilder.h.

uint16 GraphBuilder::highdate
Definition at line 136 of file graphbuilder.h.

uint32 GraphBuilder::highid
Definition at line 134 of file graphbuilder.h.

SimpleLeafNodePtrHashTable * GraphBuilder::leaftable
Definition at line 120 of file graphbuilder.h. Referenced by AddLeaf(), SetupLeafTable(), UpdateLeafLinks(), and ~GraphBuilder().

int GraphBuilder::leaftable_memory
Definition at line 99 of file graphbuilder.h.

RawLinkSet * GraphBuilder::linkset
Definition at line 123 of file graphbuilder.h. Referenced by GraphBuilder(), NodeInsertLinks(), NodeLaunch(), TrieInsertLinkURL(), and ~GraphBuilder().

uint16 GraphBuilder::lowdate
Definition at line 135 of file graphbuilder.h.

uint32 GraphBuilder::lowid
Definition at line 133 of file graphbuilder.h.

SimpleWebNodePtrHashTable * GraphBuilder::nodetable
Definition at line 119 of file graphbuilder.h. Referenced by FindWebNode(), GraphBuilder(), NodeSetURL(), StatisticsMem(), UndockWebGraph(), and ~GraphBuilder().

uint32 GraphBuilder::nodetable_alias_insertions
Definition at line 132 of file graphbuilder.h.

uint32 GraphBuilder::nodetable_insertions
Definition at line 131 of file graphbuilder.h.

bool GraphBuilder::show_links
Definition at line 98 of file graphbuilder.h.

struct { ... } GraphBuilder::stats
Referenced by AddLeaf(), GraphBuilder(), HighestDate(), HighestID(), LowestDate(), LowestID(), NodeInitialize(), NodeLaunch(), NodeSetURL(), StatisticsGraph(), StatisticsMem(), UndockWebGraph(), and UpdateLeafLinks().

Trie * GraphBuilder::trie
Definition at line 118 of file graphbuilder.h. Referenced by FindLeafNodeKey(), FindWebNode(), GraphBuilder(), NodeSetURL(), StatisticsMem(), TrieInsertLinkURL(), and ~GraphBuilder().

URLFilter * GraphBuilder::urlfilter
Definition at line 121 of file graphbuilder.h. Referenced by FindLeafNodeKey(), FindWebNode(), FormatURL(), GraphBuilder(), NodeSetURL(), and TrieInsertLinkURL().