Data Science in Travel Tech: Search and Booking


Emerging Travel Group’s Data Science team was formed in 2014, long before the heyday of large language models. Since then, the team has completed many projects using computer vision, NLP, and complex classical models.

The current wave of interest in language models is a good reminder that Data Science is more than transformers (generative pre-trained transformers, GPT). We use images, text, and tabular data to build models that work in real time or process statistical data, helping us select the best hotels for your next trip.

Editorial Board of Emerging Blog
175
9 minutes read

Who we are

Emerging Travel Group (ETG) is a global travel company operating the brands RateHawk, ZenHotels, and Roundtrip, in over 220 source markets since 2010. Our customers include trade partners and individuals simply booking their hotels online. We have more than 3,300 employees spread across Europe, the Americas, the Middle East, the Asia-Pacific region, the CIS, and Africa.

A good Data Science product is not noticeable — it just works. Let’s play a game: From the following website screenshots, try to guess how many DS products are on each page. Do not forget that a DS product is not only neural networks but also classical models and various heuristics. The results may surprise you!

Contents

Riddle #1

Riddle #2

Riddle #3

Riddle #4

Riddle #5

Riddle #1

Booking starts with a search. The screenshot shows the main page of https://zenhotels.com. It is a hotel booking website for individual travelers that offers various accommodation options, such as hotels, hostels, apartments, guest houses, and campgrounds. 

How many DS products are there?

The correct answer is 1 — full-text search and ranking of destinations, or simply “Suggestion of destinations.”

Suggestion of destinations

As you refine your query, we narrow the list down to more accurate destinations, sorted by popularity, even if the destination is misspelled (for example, “Bacerlona”).

“Suggestion of destinations” is one of those cases where a DS product does not need a model: a couple of heuristics are enough. When you place the cursor in the search bar, we offer the most popular destinations. Once you enter a query, we look up matching destinations in a B-tree and sort the list, taking into account what the user has searched for before.
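The following minimal Python sketch shows how far a couple of heuristics can go. It is not our production code, and it rests on several assumptions. The destination list and popularity numbers are made up. A sorted Python list searched with `bisect` stands in for the B-tree, since both support ordered prefix scans. Edit distance is only one possible way to tolerate typos like “Bacerlona.”

```python
from bisect import bisect_left
from collections.abc import Collection

# Hypothetical catalogue: destination name -> popularity score.
POPULARITY = {
    "barcelona": 950,
    "barcelos": 40,
    "bari": 300,
    "bergen": 120,
    "berlin": 900,
}

# Sorted keys allow ordered prefix scans, as a B-tree index would.
SORTED_NAMES = sorted(POPULARITY)


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic dynamic-programming table (two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def prefix_matches(prefix: str) -> list[str]:
    """Names starting with `prefix`, found by a range scan from bisect_left."""
    out = []
    for name in SORTED_NAMES[bisect_left(SORTED_NAMES, prefix):]:
        if not name.startswith(prefix):
            break
        out.append(name)
    return out


def suggest(query: str, history: Collection[str] = (), limit: int = 5,
            max_typos: int = 2) -> list[str]:
    q = query.strip().lower()
    if not q:
        # Empty search bar: offer the most popular destinations.
        candidates = list(POPULARITY)
    else:
        candidates = prefix_matches(q)
        if not candidates and len(q) >= 4:
            # No prefix hit: tolerate typos by comparing the query with
            # the same-length prefix of every destination name.
            candidates = [n for n in POPULARITY
                          if levenshtein(q, n[:len(q)]) <= max_typos]
    # Destinations the user searched before come first, then by popularity.
    candidates.sort(key=lambda n: (n not in history, -POPULARITY[n]))
    return candidates[:limit]
```

`suggest("Bacerlona")` finds no prefix match, falls back to the typo branch, and returns `["barcelona"]`. An empty query returns the most popular destinations.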

Riddle #2

We click “Search,” and loading starts. How many DS products managed to work while we were looking for a hotel? Here’s a little hint: we will return to ranking more than once, because search is one of the largest components of ETG.

The correct answer is 4. These include offline search results ranking, Dynamic Cache TTL, Look-to-book optimization, and pricing.

Offline ranking of search results

Offline ranking uses stored data to decide the order in which hotel searches are sent to different suppliers. This determines which suppliers we ask first about room availability. Supplier response speed matters here: the faster a supplier responds and the better its offer, the higher its results appear in search. A hotel’s popularity, guest reviews, and the expected number of free rooms also factor into the ranking.
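The following minimal sketch makes the ordering idea concrete by ranking suppliers on historical response time and offer quality. The fields `p95_latency_ms` and `offer_quality` and the 50/50 weighting are hypothetical. In practice this ordering comes from a trained model rather than hand-set weights, and hotel-level factors would enter as well.

```python
from dataclasses import dataclass


@dataclass
class SupplierStats:
    name: str
    p95_latency_ms: float  # historical response time (hypothetical field)
    offer_quality: float   # hypothetical 0..1 score, e.g. share of best offers


def supplier_call_order(stats: list[SupplierStats],
                        speed_weight: float = 0.5) -> list[str]:
    """Order suppliers so that fast suppliers with good offers come first."""
    latencies = [s.p95_latency_ms for s in stats]
    lo, hi = min(latencies), max(latencies)
    span = (hi - lo) or 1.0  # avoid division by zero if all are equal

    def score(s: SupplierStats) -> float:
        speed = 1.0 - (s.p95_latency_ms - lo) / span  # 1.0 = fastest supplier
        return speed_weight * speed + (1.0 - speed_weight) * s.offer_quality

    return [s.name for s in sorted(stats, key=score, reverse=True)]
```

Min-max scaling of latency puts both terms on a 0..1 scale, so `speed_weight` directly sets the trade-off between speed and offer quality.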

We aim to find the most appealing hotel at a reasonable price. The CatBoost model is our tool of choice, trained with 10 million user sessions from the past 30 days and data from 1,000 regions, where each may have as many as 5,000 hotels. This model considers factors such as user reviews, star rating, amenities, past performance, and a hotel’s pricing compared to the regional average. 
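One of the features listed above, a hotel’s price relative to the regional average, is easy to show in code. The sketch below assumes hotels arrive as dictionaries with hypothetical `region_id` and `price` keys. Rows like these, together with reviews, star rating, amenities, and past performance, would then be fed to a ranking model such as CatBoost’s `CatBoostRanker`.

```python
from collections import defaultdict
from statistics import mean


def add_price_to_region_avg(hotels: list[dict]) -> list[dict]:
    """Return copies of hotel rows with a 'price_to_region_avg' feature.

    The feature is the hotel's price divided by the mean price in its region:
    below 1.0 means cheaper than the regional average.
    """
    prices_by_region: dict = defaultdict(list)
    for h in hotels:
        prices_by_region[h["region_id"]].append(h["price"])
    region_avg = {r: mean(p) for r, p in prices_by_region.items()}
    return [{**h, "price_to_region_avg": h["price"] / region_avg[h["region_id"]]}
            for h in hotels]
```

A ratio of this kind is easier for a model to use than a raw price, because prices vary widely between regions.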

Dynamic Cache TTL

We process 1.5 million search queries per minute. We aim to minimize the number of requests to suppliers while still showing the user up-to-date availability and prices. To reduce both the number of requests our servers send and the time they take, we use caching.

A model tells us how long supplier information should be stored in the cache. It is trained with supervised learning on historical supplier data.
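The following minimal sketch shows a cache in which each entry’s lifetime comes from a model rather than a fixed constant. `predict_ttl` is a hypothetical stand-in for the trained model. The `(supplier, hotel)` key layout and the injectable clock, which is there only to make the behavior testable, are also assumptions.

```python
import time
from typing import Callable, Hashable, Optional


class PredictedTTLCache:
    """Cache in which each entry's lifetime comes from a TTL model.

    `predict_ttl` is a hypothetical stand-in for the trained model: it gets
    the cache key (e.g. a (supplier, hotel) pair) and returns seconds.
    """

    def __init__(self, predict_ttl: Callable[[Hashable], float],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._predict_ttl = predict_ttl
        self._clock = clock
        self._store: dict = {}

    def get(self, key: Hashable) -> Optional[object]:
        """Return the cached value, or None if missing or expired."""
        item = self._store.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._store[key]  # stale: the caller should query the supplier
            return None
        return value

    def put(self, key: Hashable, value: object) -> None:
        """Store value with a model-predicted TTL (negative TTLs clamp to 0)."""
        ttl = max(0.0, self._predict_ttl(key))
        self._store[key] = (value, self._clock() + ttl)
```

When `get` returns `None`, the caller goes back to the supplier and calls `put` with the fresh answer. A good TTL model therefore trades supplier traffic against stale prices one entry at a time.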

Look-to-book optimization

In the travel industry, a significant challenge is filtering out accommodations that are unlikely to be booked and focusing on options most likely to lead to a booking.

Hotel searches should strike a balance between showing appealing listings and accurately reflecting what the user is looking for. How often searches result in bookings is known as conversion.

Adjusting models to improve performance is a regular process, with a 3–4% increase in conversion seen as a success.

Look-to-book optimization is a standard approach to assessing the likelihood of a search resulting in a booking. It considers the booking history and details about the user, supplier, and hotel.
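The article does not say how the likelihood is computed, so here is one common, simple baseline as a sketch. It takes the observed share of searches (“looks”) that ended in a booking and smooths it toward a global prior, so that pairs with little history are neither over- nor under-rated. The prior rate, prior strength, threshold, and candidate keys are all hypothetical.

```python
def smoothed_book_rate(books: int, looks: int, prior_rate: float,
                       prior_strength: float = 100.0) -> float:
    """Share of looks that became bookings, shrunk toward prior_rate.

    With little history the estimate stays near prior_rate, so one booking
    out of three looks does not look like a sure thing.
    """
    return (books + prior_rate * prior_strength) / (looks + prior_strength)


def filter_unlikely(candidates: dict[str, tuple[int, int]], prior_rate: float,
                    min_rate: float) -> list[str]:
    """Drop candidates below min_rate; return the rest, most likely first.

    candidates maps a hypothetical key (e.g. 'supplier:hotel') to (books, looks).
    """
    rates = {key: smoothed_book_rate(books, looks, prior_rate)
             for key, (books, looks) in candidates.items()}
    kept = [key for key, rate in rates.items() if rate >= min_rate]
    return sorted(kept, key=lambda key: rates[key], reverse=True)
```

A production model would replace the counts with features about the user, supplier, and hotel. The smoothing idea still applies to sparse histories.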

We aim to integrate Look-to-book optimization with Dynamic Cache TTL to refresh caches and expedite the search process.

Pricing

Pricing is a rather nuanced term. One of the main components of pricing is repricing.

Repricing is a repeated, refined request to suppliers to determine whether the price has changed and whether the same rooms are available from other suppliers.

Repricing may reveal a room that is better than the one requested at the same price. Or it may find the same room from another supplier at a lower price; in that case, we choose the cheaper option and show it to the user.

The repricing results affect subsequent searches of other users, among other things.
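The following minimal sketch implements the choice described above, under assumed data. `room_tier` is a hypothetical ordinal for room category. Offers are compared by price only; taxes, cancellation terms, and meal plans are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    supplier: str
    room_type: str
    room_tier: int  # hypothetical: higher = better room category
    price: float


def reprice(requested: Offer, fresh_offers: list[Offer]) -> Offer:
    """Pick the offer to show after re-querying suppliers.

    1. For the same room type from any supplier, take the cheapest fresh offer.
    2. A better room (higher tier) at no more than that price wins instead.
    If no supplier returns the room at all, the original offer is kept.
    """
    same_room = [o for o in fresh_offers if o.room_type == requested.room_type]
    best_same = min(same_room, key=lambda o: o.price, default=requested)
    upgrades = [o for o in fresh_offers
                if o.room_tier > requested.room_tier and o.price <= best_same.price]
    if upgrades:
        # Highest tier first, then the lowest price among equal tiers.
        return min(upgrades, key=lambda o: (-o.room_tier, o.price))
    return best_same
```

Note that fresh offers replace the stale one even when the original supplier’s price has gone up, since the user should see the current price.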

Riddle #3

The page has loaded, and hotel options have appeared. How many DS products are on this page?

The correct answer is 3: online ranking, ranking by loyalty programs, and choosing a hotel image.

Online ranking of search results

Online ranking uses a pointwise approach: each hotel gets its own score, and the list is sorted by that score. After the initial large-scale search, we run a second pointwise ranking over a small sample of 100–200 hotels. Because the sample is small, we can compare more parameters per hotel and move the best hotels up.

We want our customers to be able to book exactly what they want. Online ranking places those hotels that are more likely to be booked higher on the lists.
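The two-stage scheme can be sketched as follows. The scoring functions and the `offline`/`p_book` fields are hypothetical. The point is that the richer scorer runs only on the `top_k` shortlist, which is what makes the extra parameters affordable.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def two_stage_rank(items: Sequence[T],
                   cheap_score: Callable[[T], float],
                   rich_score: Callable[[T], float],
                   top_k: int = 200) -> list[T]:
    """Shortlist by a cheap score, then re-rank the shortlist pointwise."""
    # Stage 1: order every hotel by the cheap score (e.g. the offline rank).
    order = sorted(range(len(items)), key=lambda i: cheap_score(items[i]),
                   reverse=True)
    head, tail = order[:top_k], order[top_k:]
    # Stage 2: score each shortlisted hotel independently (pointwise)
    # with the richer, more expensive function and sort by that score.
    head.sort(key=lambda i: rich_score(items[i]), reverse=True)
    # Hotels outside the shortlist keep their stage-1 order below it.
    return [items[i] for i in head + tail]
```

Run on three hotels with `top_k=2`, hotel `c` stays last even though its `p_book` is the highest, because it never made the shortlist. The shortlist size controls this trade-off.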

Loyalty program ranking

Loyalty programs are not only for users; they also exist for accommodation owners, such as the Top Stays program. We are confident in Top Stays hotels because they have high ratings and many reviews and bookings.

We want to show hotels that offer both the biggest discount for the user and the best accommodation overall. We use linear regression to find a balance between loyalty programs. The task is to predict the number of views: how many views a hotel is getting now, and how many it could get if it were placed higher on the list.
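The article does not list the regression’s inputs, so the sketch below assumes the simplest case: views as a linear function of list position, fitted with ordinary least squares. The data is invented. When views drop off sharply near the top of the list, a transform such as log(position) is a common alternative.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def extra_views(positions: list[float], views: list[float],
                current_pos: float, target_pos: float) -> float:
    """Predicted change in views if a hotel moved from current_pos to target_pos."""
    intercept, slope = fit_line(positions, views)

    def predict(pos: float) -> float:
        return intercept + slope * pos

    return predict(target_pos) - predict(current_pos)
```

For example, `extra_views([1, 2, 3, 4, 5], [100, 90, 80, 70, 60], current_pos=5, target_pos=1)` estimates 40 more views for moving a hotel from fifth to first place.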

Selecting a hotel image

Suppliers and hoteliers send many images, and our task is to build a “Smart gallery.” The gallery takes users’ interests into account, such as whether they would rather see the front of the hotel or its interiors.

Our CV (Computer Vision) model understands what is shown in the photo. After recognition, we score and sort the photos.

The choice of the first photo in the gallery may also depend on which page the user is on. For example, we can show an interior photo on the search results page and a facade on the hotel page, or vice versa.
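Here is a sketch of page-dependent ordering, assuming the CV model returns one `label` and a `confidence` per photo. The labels and the per-page weights in `PAGE_PREFERENCES` are hypothetical. The user-interest signal mentioned above is left out for brevity.

```python
# Hypothetical weights: how much each page "wants" a scene as the lead photo.
PAGE_PREFERENCES = {
    "search_results": {"room_interior": 1.0, "facade": 0.6, "pool": 0.5, "bathroom": 0.2},
    "hotel_page": {"facade": 1.0, "room_interior": 0.7, "pool": 0.6, "bathroom": 0.3},
}
DEFAULT_WEIGHT = 0.1  # scenes not listed for a page


def order_gallery(photos: list[dict], page: str) -> list[str]:
    """Return photo URLs sorted by CV confidence times the page's preference."""
    prefs = PAGE_PREFERENCES[page]

    def score(photo: dict) -> float:
        return photo["confidence"] * prefs.get(photo["label"], DEFAULT_WEIGHT)

    return [p["url"] for p in sorted(photos, key=score, reverse=True)]
```

With the same three photos, the search results page leads with the interior and the hotel page leads with the facade.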

Riddle #4

We are done with the search results. Let’s go to the hotel page and look only at the top of the page. How many DS products are there?

The correct answer is 3: matching hotels from different suppliers into a single metahotel, selecting hotel reviews, and recognizing objects in photos.

Matching hotels from different suppliers into a single metahotel

ETG works with over 300 suppliers and has more than 120,000 direct hotel contracts. However, many suppliers send hotel data that is inconsistent, such as different coordinates or different names for the same hotel, which could increase the risk of guests arriving at the wrong location.

We process hotel data in three steps: finding nearest neighbors, removing irrelevant pairings, and classifying.

To merge data from various suppliers into one “Metahotel,” we consider the coordinates, address, name, hotel class, and accommodation type. This merging is known as “Matching.”
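The three steps can be sketched in Python as follows. This is an illustration under assumptions, not our pipeline. The record fields, the 300 m radius, and the filtering rules are hypothetical. Neighbors are found by brute force instead of with a spatial index. A name-similarity rule based on the standard library’s `difflib` stands in for the trained classifier.

```python
import math
from difflib import SequenceMatcher
from itertools import combinations


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in metres."""
    r = 6_371_000.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def name_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_hotels(records: list[dict], radius_m: float = 300.0,
                 name_threshold: float = 0.6) -> list[tuple[str, str]]:
    """Return id pairs of supplier records judged to be the same hotel."""
    # Step 1: nearest neighbours, i.e. all pairs within radius_m
    # (brute force here; a spatial index would replace this at scale).
    pairs = [(a, b) for a, b in combinations(records, 2)
             if haversine_m(a["lat"], a["lon"], b["lat"], b["lon"]) <= radius_m]
    # Step 2: remove irrelevant pairs, e.g. different accommodation types
    # or star classes more than one step apart.
    pairs = [(a, b) for a, b in pairs
             if a["kind"] == b["kind"] and abs(a["stars"] - b["stars"]) <= 1]
    # Step 3: classify the remaining pairs; a name-similarity threshold
    # stands in for a trained classifier.
    return [(a["id"], b["id"]) for a, b in pairs
            if name_similarity(a["name"], b["name"]) >= name_threshold]
```

Matched pairs would then be grouped into one Metahotel each. A record with the same name but 1.5 km away never becomes a candidate, which guards against sending guests to the wrong place.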

A selection of hotel reviews

Reviews help our users decide whether to book. We collect reviews from several sources and convert them into a single format. All 50 million reviews are translated by a small LLM into the 20 languages our product supports.

We plan to make review selection smarter and show each user the most relevant review. Relevance will be determined from the user’s history of browsing and booking hotels.
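Since this is still a plan, the sketch below illustrates only one possible approach, with hypothetical inputs. Each review carries `topics` (for example, produced by a text model), and each hotel in the user’s history carries attributes. The review whose topics best overlap the user’s profile wins.

```python
from collections import Counter


def user_profile(history: list[set[str]]) -> Counter:
    """Count attributes of hotels the user browsed or booked."""
    profile: Counter = Counter()
    for attributes in history:
        profile.update(attributes)
    return profile


def most_relevant_review(reviews: list[dict], history: list[set[str]]) -> dict:
    """Pick the review whose topics best match the user's interests.

    Score = sum of profile counts over the review's topics; ties are broken
    by the review's rating.
    """
    profile = user_profile(history)

    def score(review: dict) -> tuple[int, float]:
        return sum(profile[t] for t in review["topics"]), review["rating"]

    return max(reviews, key=score)
```

A user who keeps looking at family hotels with pools gets the review about family and pool facilities, even if another review has a higher rating.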

Riddle #5

We go down the page to the rooms section. Let’s play one last time: how many DS products are there?

The correct answer is 2: room matching and recognition of objects in the room photo.

Room Matching

For hotels, we combine multiple instances of the same hotel from different suppliers into one “Metahotel.” At the room level, the idea is much the same: the same room in the same hotel can come from different suppliers with different photos and descriptions.

We use a multimodal model with several steps to remove duplicates and make one “Meta-room” for each room type.
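Whatever model decides that two room records are duplicates, those pairwise decisions still have to be merged into groups. A common way to do that, sketched here, is union-find, which makes the grouping transitive. `is_same_room` is a hypothetical stand-in for the multimodal model, and the `beds`/`m2` fields used in the test are invented.

```python
from itertools import combinations
from typing import Callable


def build_meta_rooms(rooms: list[dict],
                     is_same_room: Callable[[dict, dict], bool]) -> list[list[str]]:
    """Group supplier room records into meta-rooms.

    is_same_room stands in for the multimodal model (text + photos).
    Union-find makes grouping transitive: if A~B and B~C, all three merge.
    """
    parent = list(range(len(rooms)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(rooms)), 2):
        if is_same_room(rooms[i], rooms[j]):
            parent[find(i)] = find(j)

    groups: dict[int, list[str]] = {}
    for i, room in enumerate(rooms):
        groups.setdefault(find(i), []).append(room["id"])
    return list(groups.values())
```

Transitivity is a deliberate choice. It merges chains of near-duplicates, but one false positive from the model can join two groups, so the pairwise model needs to be conservative.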

Recognizing objects in room photos

The property supplier sends us a room description and a photo, and sometimes they don’t match. We use Computer Vision (CV) to show only confirmed information about the room.

After recognizing objects, we combine the description from the property supplier with our tags and get a list of room parameters.
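The following minimal sketch shows that combination step. It assumes the CV model returns tag confidences and that we know which amenities can be seen in photos at all. The tag names, the `visual_amenities` vocabulary, and the 0.7 threshold are hypothetical.

```python
def room_parameters(description_amenities: set[str],
                    cv_tags: dict[str, float],
                    visual_amenities: set[str],
                    min_conf: float = 0.7) -> set[str]:
    """Combine a supplier description with CV tags into confirmed parameters.

    - Visual amenities (things a photo can show, e.g. 'bathtub') are kept
      only if the CV model detects them with confidence >= min_conf; this
      also adds confident detections the description did not mention.
    - Non-visual amenities (e.g. 'free_wifi') cannot be checked in a photo,
      so they are taken from the description as-is.
    """
    detected = {tag for tag, conf in cv_tags.items() if conf >= min_conf}
    non_visual = description_amenities - visual_amenities
    return non_visual | (detected & visual_amenities)
```

Under these rules, a “balcony” claimed in the description but not found in the photos is dropped, while a confidently detected desk is added.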

Frequent iterations and hypothesis testing

Guessing user preferences is difficult. The model may work well during training but poorly after the release. A new DS product can undergo 345 iterations before it produces the expected results, or it may not produce any results at all.

At the search and booking stage, it is important to have good infrastructure and be able to experiment with new hypotheses.
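A 3–4% conversion uplift only counts as a success if it can be distinguished from noise. A standard check is a two-proportion z-test on bookings per search; this is our assumption, since the article does not name the test. The sketch below uses only the standard library, and the counts in the test are invented.

```python
import math


def conversion_ab_test(bookings_a: int, searches_a: int,
                       bookings_b: int, searches_b: int) -> tuple[float, float, float]:
    """Two-proportion z-test on conversion (bookings / searches).

    Returns (relative uplift of B over A, z statistic, two-sided p-value).
    """
    p_a = bookings_a / searches_a
    p_b = bookings_b / searches_b
    pooled = (bookings_a + bookings_b) / (searches_a + searches_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / searches_a + 1 / searches_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))
    return (p_b - p_a) / p_a, z, p_value
```

Consider a 3.5% relative uplift, from 2.00% to 2.07% conversion. With 100,000 searches per variant it gives p ≈ 0.27, which is not conclusive. The same uplift on 1,000,000 searches per variant gives p < 0.001. This is one reason fast iteration needs infrastructure that can run large experiments.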

Not everything always goes smoothly

Sometimes, models do not produce the results we expect. For example, some photos of toilets were identified as dining rooms. 

It has also happened that a really poor hotel popped up at the very top of the search results. Because of the sheer volume of calculations, it is quite difficult to check how the models behave in real time. As with hotel and room photos, our Account Managers periodically check the results and make sure that hotels participating in the Top Stays program appear at the top of the lists.


To be continued 

The whole process can be divided into two parts: before booking and after booking. This article discusses DS products that are used before booking. There are many more DS products that help us after booking. We have models and ML tools that the support service uses to identify high-risk bookings, recognize incoming requests, prepare e-mail response templates, and more.

Keep learning from the ETG blog to stay up-to-date with new articles!
