Language models like GPT-3 could herald a new type of search engine


Now a group of Google researchers has published a proposal for a complete redesign that throws out the ranking approach and replaces it with a single large AI language model, such as BERT or GPT-3, or a future version of them. The idea is that instead of searching for information in a huge list of web pages, users would ask questions and a language model trained on those pages would answer them directly. The approach could change not only how search engines work, but also what they do and how we interact with them.

Search engines have become faster and more accurate even as the Internet has grown sharply in size. AI is now used to rank results, and Google uses BERT to understand search queries better. Yet beneath these adjustments, all major search engines still work the same way they did 20 years ago: crawlers (software that continuously reads web pages and maintains a list of everything it finds) index web pages, results matching a user’s query are gathered from that index, and the results are then ranked.
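The index, retrieve, then rank flow described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the three-page corpus, the function names, and the term-count scoring are hypothetical and far simpler than anything a production search engine uses.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for crawled pages (not from the article).
PAGES = {
    "page1": "language models answer questions in natural language",
    "page2": "search engines rank web pages for a query",
    "page3": "crawlers read web pages and build a search index",
}


def build_index(pages):
    """Index step: map each term to the pages containing it, with term counts."""
    index = defaultdict(Counter)
    for page_id, text in pages.items():
        for term in text.lower().split():
            index[term][page_id] += 1
    return index


def retrieve(index, query):
    """Retrieve step: collect every page containing at least one query term."""
    candidates = set()
    for term in query.lower().split():
        candidates.update(index.get(term, {}))
    return candidates


def rank(index, query, candidates):
    """Rank step: order candidates by summed query-term counts (a toy score)."""
    terms = query.lower().split()

    def score(page_id):
        return sum(index.get(term, {}).get(page_id, 0) for term in terms)

    # Highest score first; ties are broken alphabetically by page id.
    return sorted(candidates, key=lambda page_id: (-score(page_id), page_id))


def search(pages, query):
    """Run the three steps in order and return a ranked list of page ids."""
    index = build_index(pages)
    return rank(index, query, retrieve(index, query))


if __name__ == "__main__":
    print(search(PAGES, "search index"))
```

Note that the result is a ranked list of page identifiers, not an answer to the query, which is the limitation the researchers' proposal is aimed at.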

“This index-retrieve-then-rank blueprint has withstood the test of time and has rarely been challenged or seriously rethought,” wrote Donald Metzler and his colleagues at Google Research.

The problem is that even the best search engines today still respond with a list of documents that contain the requested information, rather than with the information itself. Search engines are also not good at answering queries that require answers drawn from multiple sources. It is as if you asked your doctor for advice and received a list of articles to read instead of a direct answer.

Metzler and his colleagues are interested in a search engine that behaves like a human expert. It should generate answers in natural language, synthesized from more than one document, and back up those answers with references to supporting evidence, as Wikipedia articles do.

Large language models get us part of the way there. GPT-3, which was trained on most of the web and hundreds of books, can draw information from a variety of sources and answer questions in natural language. The problem is that it does not keep track of those sources and cannot provide evidence for its answers. It is impossible to tell whether GPT-3 is repeating trustworthy information or false information, or simply producing nonsense of its own.

Metzler and his colleagues call language models dilettantes: “They are perceived to know a lot but their knowledge is skin deep.” The solution, they claim, is to build and train future versions of BERT and GPT-3 to keep a record of where their words come from. No existing model can do this yet, but it is feasible in principle, and early work in this direction is under way.

Ziqi Zhang of the University of Sheffield in the United Kingdom, who studies information retrieval on the Internet, said that there have been decades of progress in different areas of search, from answering queries to summarizing documents to structuring information. However, none of these techniques has revolutionized search, because each of them solves a specific problem and does not generalize. He said that the exciting premise of the paper is that a large language model can do all of these things at the same time.

However, Zhang pointed out that language models do not perform well on technical or specialist subjects because there are fewer examples of them in the text the models are trained on. “There may be hundreds of times more e-commerce data on the Internet than data on quantum mechanics,” he said. Today’s language models are also skewed toward English, which would leave the non-English parts of the Internet underserved.

Still, Zhang welcomes the idea. “This was not possible in the past, because large language models have only recently emerged,” he said. “If it works, it will change our search experience.”

