Elastic App Search allows developers to bring the power of Elasticsearch to mobile apps in a pretuned search experience.
When parsing body text, the App Search crawler extracts all the content from the specified website and distributes it across fields according to the HTML tags it finds. Text within title tags becomes the title field, anchor tags are parsed as links, and everything else in the body is parsed as one giant field.
But what if a website has a custom structure — for example, the color, size, and price included on product pages — that you want to capture in specific fields and not as part of a single body field?
App Search allows you to add meta tags to your website to create custom fields, but sometimes making changes on the website is too complicated or just not possible.
This post explores the key components of a proxy that sits between the crawler and the website. The proxy performs the extraction, creates the needed meta tags, and injects them into the response so App Search can pick up the tags and use them.
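The extract, build, and inject steps can be sketched as three small functions. This is a minimal illustration, not the post's actual implementation: the field names, the `product_` prefix, and the assumption that each value sits in an element whose class attribute equals the field name are all hypothetical. The `class="elastic"` meta tag form is assumed from the App Search crawler documentation and should be checked against your version.

```javascript
// Hypothetical field names; they are assumed to match CSS classes on the product page.
const FIELDS = ['color', 'size', 'price'];

// Escape characters that would break an HTML attribute value.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Read the text of the first element whose class attribute is exactly the field name.
// A regex keeps this sketch dependency-free; a real HTML parser is more robust.
function extractFields(html) {
  const found = {};
  for (const name of FIELDS) {
    const pattern = new RegExp(`<[a-z0-9]+[^>]*class="${name}"[^>]*>([^<]*)<`, 'i');
    const match = html.match(pattern);
    if (match) found[name] = match[1].trim();
  }
  return found;
}

// Render one meta tag per extracted field (assumed class="elastic" form).
function buildMetaTags(fields) {
  return Object.entries(fields)
    .map(([name, value]) =>
      `<meta class="elastic" name="product_${name}" content="${escapeAttr(value)}">`)
    .join('\n');
}

// Insert the tags just before </head>; return the page unchanged if nothing was extracted.
function injectMetaTags(html) {
  const tags = buildMetaTags(extractFields(html));
  if (!tags) return html;
  // A replacer function avoids special handling of "$" sequences in values such as prices.
  return html.replace(/<\/head>/i, () => `${tags}\n</head>`);
}
```

Calling `injectMetaTags(pageHtml)` on a page that contains, say, `<span class="color">Red</span>` would add a `product_color` meta tag to the head while leaving the rest of the markup untouched.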
The body parsing solution
To solve the body parsing problem, you can create a Node.js server that hosts a product page and a proxy that stands in front of it. The proxy receives the crawler's request, fetches the product page, injects the meta tags, and returns the modified page to the crawler.
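A rough sketch of such a proxy, using only Node's built-in `http` module, might look like the following. The host, ports, and the `addMetaTags` placeholder (which injects a single fixed, made-up tag) are assumptions for illustration. A real transform would extract values from the page before building the tags.

```javascript
const http = require('http');

// Hypothetical addresses: where the product page is served, and where the proxy listens.
const TARGET = { host: 'localhost', port: 3000 };
const PROXY_PORT = 3001;

// Placeholder transform: adds one fixed, illustrative meta tag before </head>.
function addMetaTags(html) {
  const tag = '<meta class="elastic" name="product_source" content="proxy">';
  return html.replace(/<\/head>/i, () => `${tag}\n</head>`);
}

// Build a server that forwards each request to `target` and rewrites HTML responses.
function createProxy(target, transform) {
  return http.createServer((req, res) => {
    const headers = { ...req.headers, host: `${target.host}:${target.port}` };
    // Ask the upstream server for an uncompressed body so it can be edited as text.
    delete headers['accept-encoding'];

    const upstream = http.request(
      { host: target.host, port: target.port, path: req.url, method: req.method, headers },
      (pageRes) => {
        const chunks = [];
        pageRes.on('data', (chunk) => chunks.push(chunk));
        pageRes.on('end', () => {
          const raw = Buffer.concat(chunks);
          const isHtml = (pageRes.headers['content-type'] || '').includes('text/html');
          // Only HTML is transformed; images, CSS, and so on pass through untouched.
          const body = isHtml ? Buffer.from(transform(raw.toString('utf8')), 'utf8') : raw;
          const outHeaders = { ...pageRes.headers, 'content-length': body.length };
          delete outHeaders['transfer-encoding'];
          res.writeHead(pageRes.statusCode, outHeaders);
          res.end(body);
        });
      }
    );

    // Report upstream failures to the crawler as a 502 if nothing has been sent yet.
    upstream.on('error', () => {
      if (!res.headersSent) res.writeHead(502, { 'content-type': 'text/plain' });
      res.end('Upstream request failed');
    });

    req.pipe(upstream);
  });
}
```

To try it, start the product page server on the target port, call `createProxy(TARGET, addMetaTags).listen(PROXY_PORT)`, and point the crawler at the proxy's address instead of the site itself. Buffering the whole response keeps the sketch simple, at the cost of holding each page in memory.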