Sehrch.com: A Structured Search Engine Powered By Hypertable
03.15.2012 | Hypertable Case Study
Sehrch.com is a structured search engine. It provides powerful querying capabilities that enable users to quickly complete complex information retrieval tasks. It gathers conceptual awareness from the Linked Open Data cloud and can be used either (1) as a regular search engine or (2) as a structured search engine. In both cases, conceptual awareness is used to build entity-centric result sets. Try this simple query: pop singers less than 20 years old.
Sehrch.com gathers data from the Semantic Web in the form of RDF by crawling the Linked Open Data cloud and making requests with Accept headers that ask for RDF N-Triples. Data dumps are also obtained from various sources. To store this data, we required a data store capable of holding tens of billions of triples on the least possible hardware while still delivering high performance. So we conducted our own study to find the most appropriate store for this type and quantity of data.
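For context, fetching N-Triples from a Linked Data URI is plain HTTP content negotiation. The minimal Python sketch below illustrates the basic request; the example URI, media types, and lack of error handling are assumptions for illustration, not a description of Sehrch.com's actual crawler.

```python
# Minimal sketch of Linked Data crawling via HTTP content negotiation.
# The resource URI and media types are illustrative assumptions, not a
# description of Sehrch.com's crawler.
import urllib.request

def fetch_ntriples(resource_uri):
    """Ask a Linked Open Data server for an RDF N-Triples serialization."""
    request = urllib.request.Request(
        resource_uri,
        # "text/plain" was the original N-Triples media type;
        # "application/n-triples" is the registered one.
        headers={"Accept": "application/n-triples, text/plain;q=0.9"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")

if __name__ == "__main__":
    triples = fetch_ntriples("http://dbpedia.org/resource/Hypertable")
    for line in triples.splitlines()[:5]:
        print(line)
```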
As Semantic Web people, our initial choice would have been a native RDF data store, better known as a triplestore. But from our initial usage we quickly concluded that SPARQL-compliant triplestores and large quantities of data do not mix well. As a challenge, we attempted to load 1.3 billion triples (the entire DBpedia and Freebase datasets) into a dual-core machine with only 3GB of memory. The furthest any of the open source triplestores (4store, TDB, Virtuoso) got loading the datasets on that hardware was around 80 million triples. We were told that the only solution was more hardware. We weren't the only ones facing significant hardware requirements when attempting to load this volume of data. For example, in the post Setting up a local DBpedia mirror with Virtuoso, a machine with 8 cores and 32GB of RAM was used to load just the English and German DBpedia data (approximately 300 million triples) into Virtuoso; we were attempting to load four times as much data on a machine with only 10% of the memory!
We then discovered Hypertable. We were able to load DBpedia and Freebase (1.3 billion triples) into Hypertable in less than 24 hours on that same dual-core machine (still with only 3GB of memory). To see how far Hypertable could go, we loaded the datasets three times over, in total storing close to 4 billion triples on that single node! We were shocked to discover that even with that volume of data, Hypertable could still deliver a sustained query throughput of 1,000 queries per second. Since then, we have never looked back, and we believe using Hypertable in this way could be an eye-opener for the Semantic Web community as a whole. To learn more about Sehrch.com and how Hypertable is used as the underlying storage technology, see the following case study (a brief sketch of how triples can map onto a Bigtable-style data model follows the link):
Sehrch.com: A Structured Search Engine Powered By Hypertable
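For readers curious how RDF triples fit a Bigtable-style data model such as Hypertable's, the following hypothetical Python sketch shows one common mapping: subject as row key, predicate as column, object as cell value, emitted as tab-separated row/column/value cells for a bulk loader. The "spo" column family name and the file layout are illustrative assumptions and are not taken from Sehrch.com's implementation.

```python
# Hypothetical sketch: mapping N-Triples onto a Bigtable-style
# row/column/value model (subject -> row key, predicate -> column,
# object -> cell value), written out as tab-separated cells for a bulk
# loader. The "spo" column family and file layout are assumptions made
# for illustration; they are not taken from Sehrch.com.
import re

# Loose N-Triples pattern: <subject> <predicate> object .
# A real parser would also handle blank nodes, comments, and escapes.
NT_LINE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+(.+?)\s*\.\s*$')

def ntriples_to_cells(nt_path, tsv_path):
    """Rewrite an N-Triples file as row<TAB>column<TAB>value lines."""
    with open(nt_path, encoding="utf-8") as src, \
         open(tsv_path, "w", encoding="utf-8") as dst:
        for line in src:
            match = NT_LINE.match(line)
            if not match:
                continue  # skip blank lines and anything malformed
            subject, predicate, obj = match.groups()
            # One cell per triple; N-Triples escapes tabs and newlines
            # inside literals, so a plain tab-separated line is safe.
            dst.write(f"{subject}\tspo:{predicate}\t{obj}\n")

if __name__ == "__main__":
    ntriples_to_cells("dbpedia_sample.nt", "triples.tsv")
```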
Posted By: Az from Sehrch.com, email: az
Here's what other people had to say
Interesting blog. It would be great if you could provide more details about it. Thank you.
Very interesting article.
Sadly, sehrch.com seems to have shut down - I tried to go to the website and it was just hanging.
I have to say that this is extremely interesting. Are there any other implementations showcasing variants of Hypertable?
Christian Hochfilzer
http://netconstructor.com
Good article