docs/towards_dist_search/Current_implementations.md
- Full distribution of every task (no centralized coordination at all): Peers perform their jobs independently and exchange only the data they need. This prevents link congestion at a central server. All nodes run functionally identical code and are distinguished only by a unique identifier.
- Platform independence of nodes.
[Apoidea: A Decentralized Peer-to-Peer Architecture for Crawling the World Wide Web](https://www.cc.gatech.edu/~lingliu/papers/2003/apoidea-sigir03.pdf "https://www.cc.gatech.edu/~lingliu/papers/2003/apoidea-sigir03.pdf"), Aameek Singh, Mudhakar Srivatsa, Ling Liu, and Todd Miller, 2003
- _Content addressing_: All content is uniquely identified by its multihash checksum.
- Performance: Nodes run on spare CPU capacity and must not put too large a load on a client machine.
- No single point of failure, and graceful handling of permanent and transient failures (identify crawl traps and tolerate external failures).
- The **_Extractor_** extracts linked hashes/links from the object and passes them on to the hash validator. It can perhaps be multithreaded and process different pages simultaneously.
- Version history of each file is tracked by IPFS.
- Locally computable content address assignment based on consistent hashing. IPNS resolution is consistent (check for the presence of a DNSLink first, because it is faster).
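
A minimal sketch of locally computable assignment with consistent hashing, assuming each node knows the identifiers of its peers. The `HashRing` class, its method names, and the replica count are hypothetical and not taken from any existing crawler.

```python
import bisect
import hashlib


def _point(key: str) -> int:
    """Map a string to a ring position (first 8 bytes of its SHA-256 digest)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring; every node can build it locally."""

    def __init__(self, node_ids, replicas=64):
        self._replicas = replicas  # virtual points per node, smooths the load
        self._ring = []            # sorted list of (point, node_id)
        for node_id in node_ids:
            self.add(node_id)

    def add(self, node_id):
        for i in range(self._replicas):
            bisect.insort(self._ring, (_point(f"{node_id}#{i}"), node_id))

    def remove(self, node_id):
        # Only the removed node's points disappear; all others stay in place.
        self._ring = [entry for entry in self._ring if entry[1] != node_id]

    def owner(self, content_hash):
        if not self._ring:
            raise LookupError("ring is empty")
        # First ring point at or after the key's position, wrapping around.
        idx = bisect.bisect_right(self._ring, (_point(content_hash), ""))
        return self._ring[idx % len(self._ring)][1]
```

Because a departing node only removes its own points, hashes owned by the remaining nodes keep their owner; only the departed node's share is reallocated.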
- Nodes do not crash on the failure of a single peer. ⇒ dynamic reallocation of addresses can be done across other peers.
- Each crawl job queue has one **_Content Fetcher_** thread associated with it that streams data into the extractor.
- If we keep the connection to a node open to minimise connection establishment overhead, we will need exception handling for dropped connections.
- The **_Hash Validator_** checks whether a hash is the responsibility of the node. If so, it sends the hash to the preprocessor. If not, it is sent out on the network.
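
The Hash Validator's routing decision can be sketched as follows. This is an illustration under assumptions: `owner_of` stands in for whatever distribution function the node uses, and the two queues are hypothetical placeholders for the preprocessor input and the outgoing network buffer.

```python
from collections import deque


class HashValidator:
    """Hypothetical validator: keep hashes this node owns, forward the rest."""

    def __init__(self, node_id, owner_of, preprocessor_queue, outbox):
        self.node_id = node_id
        self.owner_of = owner_of                      # callable: hash -> node id
        self.preprocessor_queue = preprocessor_queue  # hashes to crawl locally
        self.outbox = outbox                          # (owner, hash) pairs to send

    def handle(self, content_hash):
        owner = self.owner_of(content_hash)
        if owner == self.node_id:
            # This node is responsible: hand the hash to the preprocessor.
            self.preprocessor_queue.append(content_hash)
            return "local"
        # Otherwise send the hash out on the network to the responsible peer.
        self.outbox.append((owner, content_hash))
        return "forwarded"


# Usage with a toy ownership rule (last character decides the owner).
local, outgoing = deque(), deque()
validator = HashValidator(
    "node-a",
    lambda h: "node-a" if h.endswith("a") else "node-b",
    local,
    outgoing,
)
validator.handle("hash-a")
validator.handle("hash-b")
```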
- When a node comes back online after having failed, data that needs to be stored on that node is propagated to it and can be retrieved from it.
- A _block_ is a variable-size block of data.
- A _list_ is an ordered collection of blocks or other lists.
- If a node failure is permanent, the node needs to be recoverable using data stored by other nodes.
[UbiCrawler: A Scalable Fully Distributed Web Crawler](http://vigna.di.unimi.it/ftp/papers/UbiCrawler.pdf "http://vigna.di.unimi.it/ftp/papers/UbiCrawler.pdf"), Paolo Boldi, Bruno Codenotti, Massimo Santini, Sebastiano Vigna
- _Tamper resistance_: All content is verified with its checksum.
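
As a rough illustration of checksum verification, the sketch below builds a sha2-256 multihash (function code `0x12`, digest length `0x20`, followed by the digest) and compares it with an expected value. The function names are hypothetical. Note the assumption: the hash is taken over the raw bytes passed in, whereas IPFS hashes encoded blocks, so this will generally not reproduce the CID of a file added to IPFS.

```python
import hashlib
import hmac

SHA2_256_CODE = 0x12  # multihash function code for sha2-256
SHA2_256_LEN = 0x20   # digest length in bytes (32)


def sha2_256_multihash(data: bytes) -> bytes:
    """Return <code><length><digest> for the given bytes."""
    return bytes([SHA2_256_CODE, SHA2_256_LEN]) + hashlib.sha256(data).digest()


def verify(data: bytes, expected_multihash: bytes) -> bool:
    """True if the data matches the multihash it was requested under."""
    return hmac.compare_digest(sha2_256_multihash(data), expected_multihash)


block = b"hello dweb"
address = sha2_256_multihash(block)
print(verify(block, address))         # matching content
print(verify(block + b"!", address))  # tampered content
```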
- Data sent to a node while it is down still needs to propagate properly throughout the system.
- Data is still retrievable even with a node failing.
- Portability: The nodes can be configured to run on any kind of dweb network by swapping out the underlying network layer.
### [Source](https://niverel.tymyrddin.space/en/play/stones/current/peer-crawling)
- If a node is storing a block that is the parent (root/ancestor) of other blocks, it is much more likely to also be storing the children. So when a requester attempts to pull down a large DAG, it first queries the DHT for providers of the root. Once the requester finds some providers and connects to them directly to retrieve the blocks, BitSwap optimistically sends them the “wantlist”, which usually obviates any further DHT queries for that DAG.
- [Filecoin \[IOU\] (FIL)](https://www.coingecko.com/en/coins/filecoin "https://www.coingecko.com/en/coins/filecoin"), CoinGecko
- Scalability: Because peers are not chosen by location, latency can become an issue ⇒ communication must be minimised.
- Freshness: After an object is initially acquired (processed), it may have to be periodically recrawled and checked for updates. In the simplest case, this could be done by starting another broad breadth-first crawl, or by requesting all items in a node's collection again. Techniques for optimizing the “freshness” of such collections are usually based on observations about an item's update history (incremental crawling). Sadly, we cannot use this “as such”, because an updated object in IPFS has a new hash. A variety of other heuristics can be used to recrawl items marked as “more important”. Good-enough recrawling strategies are essential for maintaining an up-to-date search index with limited crawling bandwidth.
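
One possible importance-based heuristic, sketched under assumptions: each item carries a numeric importance, and its recrawl interval is a base interval divided by that importance. The `RecrawlScheduler` class and the interval formula are hypothetical and only illustrate the idea of revisiting “more important” items more often.

```python
import heapq


class RecrawlScheduler:
    """Hypothetical scheduler: more important items come due more often."""

    def __init__(self, base_interval=3600.0):
        if base_interval <= 0:
            raise ValueError("base_interval must be positive")
        self._base = base_interval
        self._heap = []        # min-heap of (due_time, name)
        self._importance = {}  # name -> importance weight

    def add(self, name, importance=1.0, now=0.0):
        if importance <= 0:
            raise ValueError("importance must be positive")
        self._importance[name] = importance
        heapq.heappush(self._heap, (now + self._base / importance, name))

    def pop_due(self, now):
        """Return names due at `now` and reschedule each of them."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            _, name = heapq.heappop(self._heap)
            due.append(name)
            next_due = now + self._base / self._importance[name]
            heapq.heappush(self._heap, (next_due, name))
        return due
```

With a base interval of 100 seconds, an item of importance 4 comes due every 25 seconds, while an item of importance 1 comes due every 100 seconds.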
- The **_Neighbourhood_** data store contains the identifiers of agents in a node's neighbourhood.
- Uses the local IPFS gateway to fetch a (named) IPFS resource.
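
A minimal sketch of fetching through a local gateway, assuming the gateway listens on `http://127.0.0.1:8080` (a common default for a local IPFS daemon; adjust to your configuration). The helper names are hypothetical, and `QmExample` is a placeholder, not a real content identifier.

```python
import urllib.request

# Assumption: address of the local gateway; change it to match your node.
DEFAULT_GATEWAY = "http://127.0.0.1:8080"


def gateway_url(resource, gateway=DEFAULT_GATEWAY, named=False):
    """Build a gateway URL: /ipns/<name> for named resources, else /ipfs/<hash>."""
    namespace = "ipns" if named else "ipfs"
    return f"{gateway.rstrip('/')}/{namespace}/{resource.lstrip('/')}"


def fetch(resource, named=False, timeout=30.0):
    """Fetch the resource body as bytes; raises urllib.error.URLError on failure."""
    url = gateway_url(resource, named=named)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


print(gateway_url("QmExample/readme"))
print(gateway_url("example.org", named=True))
```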
- Duplications are removed across the network.
- A _tree_ is a collection of blocks, lists, or other trees.
- [Filecoin](https://filecoin.io/ "https://filecoin.io/")
- An **_Overlay Network Layer_** responsible for the formation and maintenance of a distributed search engine network, and for communication between peers. These networks can be unstructured or structured. If a network does not scale, a supernode architecture can be used to improve performance; hence a client must support flat as well as supernode architectures.
- The **_Content Range Validator_** checks whether content lies in the range of the node.
- A **_Peer and Content Distribution Function_** determines which clients to connect with. Each client has a copy of this function. The hash list and content range associated with a client can change as nodes join or leave the network. The function distributes both the hashes to crawl and the content among peers, uses the underlying dweb network to provide load balancing and scalability, and takes the proximity of nodes into account. Initially, we will use a static distribution function; the hash list and content range assignment functions can be hash functions.
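
A static distribution function of the kind mentioned above could look like the sketch below: the hash space is split into equal contiguous ranges, one per node, and a content range check reduces to a membership test. All names and the 32-bit hash space are assumptions made for illustration. Unlike consistent hashing, this static split reassigns most ranges when the set of nodes changes, which is why it only suits an initial version.

```python
import hashlib

HASH_SPACE = 2 ** 32  # assumption: positions are 32-bit integers


def position(key: str) -> int:
    """Map a key to a position in the hash space (first 4 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:4], "big")


def static_ranges(node_ids):
    """Split the hash space into equal contiguous ranges, one per node."""
    nodes = sorted(node_ids)
    if not nodes:
        raise ValueError("at least one node is required")
    step = HASH_SPACE // len(nodes)
    ranges = {}
    for i, node in enumerate(nodes):
        start = i * step
        # The last node absorbs the remainder so the whole space is covered.
        end = HASH_SPACE if i == len(nodes) - 1 else start + step
        ranges[node] = range(start, end)
    return ranges


def in_range(ranges, node_id, key):
    """Content range check: does this key fall in the node's range?"""
    return position(key) in ranges[node_id]


ranges = static_ranges(["node-a", "node-b", "node-c"])
```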
- Checks file permissions.