12 February 2025

Improving any video platform search speed with and without Elasticsearch while ensuring fuzzy search, typo tolerance, and fast performance.

Improving search speed without Elasticsearch while ensuring fuzzy search, typo tolerance, and fast performance. e.g. PlayTube

Education

599 Views

0 Share

Advertisement

Advertise With Vidude



We take PlayTube as an example, The sources/search/content.php file begins by checking if a keyword parameter is provided in the request. It then sanitizes the input and initializes a default message for no results. I'll now search for the part of the script where it queries the database to fetch video search results. ​

The search queries in content.php use two different approaches:

  1. Full-Text Search (MATCH...AGAINST) – This is used for title searches but may not support fuzzy matching or typos well.
  2. LIKE-based Search (title LIKE '%keyword%') – This is slower on large datasets because it scans the entire table.

Both methods are inefficient for large databases with over 100K video entries, causing slow searches.

Alternative Search Optimization Without Elasticsearch

To improve speed and support typo tolerance and fuzzy search, I suggest:

1. Use MySQL FULLTEXT with n-gram and Fuzzy Matching

  • Modify the title column in T_VIDEOS to use FULLTEXT with IN NATURAL LANGUAGE MODE.
  • Implement a fuzzy matching function with Levenshtein Distance in PHP to catch typos.

2. Use Indexed Full-Text Search for Speed

  • Ensure FULLTEXT(title, description, tags) is enabled on T_VIDEOS.
  • Change MATCH (title) AGAINST ('$keyword') to use BOOLEAN MODE (IN BOOLEAN MODE), which allows more flexible searches.

3. Optimize LIKE Queries

  • Replace LIKE '%keyword%' with LIKE 'keyword%' where possible (prefix searches are faster).
  • Add an additional INDEX for title and description.

4. Implement a Fuzzy Search Helper in PHP

  • Use Levenshtein Distance or Soundex to find similar words for typo handling.

 

Database Size vs. Performance

Number of VideosExpected Search Speed
200K - 1MFast (~500ms - 1s) with proper indexing
1M - 3MModerate (~2ms - 5s) but still usable
3M - 5M+Slow (~5s - 8s), may require caching
5M+

Struggles (~10s+), alternative storage like Redis or SphinxSearch needed

By default, MySQL FULLTEXT search doesn’t handle long natural language queries like "How to make a cheesecake?" effectively. However, we can make it work similar to Google or YouTube search with these improvements:

1. Natural Language Processing (NLP) Preprocessing

  • Strip unnecessary words (stop words) like "to", "a", "how".
  • Convert "How to make a cheesecake?" into:
    • "make cheesecake" (core keywords)
    • "cheesecake recipe", "cheesecake tutorial" (synonyms)

2. Query Transformation for Better Results

  • Convert user input into multiple search patterns:
    sql
    SELECT * FROM videos WHERE MATCH(title, description) AGAINST ('cheesecake tutorial' IN BOOLEAN MODE) OR title LIKE '%cheesecake%' OR title LIKE '%make cheesecake%' LIMIT 20;
  • This method prioritizes relevant results while keeping fuzzy matching.

3. Fuzzy Search for Typos

  • If a user types "How to make a cheescak?" (typo in cheesecake), use Levenshtein Distance in PHP to correct spelling.

4. Handling Questions in Search

  • If a user searches for "How to make a cheesecake?", we:
    1. Remove stop words (how, to, a).
    2. Convert it into "make cheesecake".
    3. Use fuzzy logic for typos.

 

Here’s a full step-by-step implementation plan to optimize video search in MySQL while supporting fuzzy search, typo tolerance, and natural language queries—without Elasticsearch.

Step 1: Modify Database for Optimized Search

1.1 Add FULLTEXT Index on Videos Table

  • Open your MySQL database and run:
sql
ALTER TABLE videos ADD FULLTEXT(title, description, tags);

This enables efficient search using MATCH() AGAINST() instead of slow LIKE '%keyword%'.

1.2 Enable MySQL Boolean Search Mode

  • Use IN BOOLEAN MODE to allow multi-word searches and better ranking.

1.3 Create a synonyms Table (Optional)

  • Helps handle searches like "How to make a cheesecake?""cheesecake recipe".
sql
CREATE TABLE synonyms ( id INT AUTO_INCREMENT PRIMARY KEY, word VARCHAR(255) NOT NULL, synonym VARCHAR(255) NOT NULL );
  • Insert some synonyms:
sql
INSERT INTO synonyms (word, synonym) VALUES ('make', 'recipe'), ('create', 'recipe'), ('tutorial', 'recipe');

This helps expand searches automatically.


Step 2: Preprocess User Input in PHP

2.1 Remove Stop Words (e.g., "how", "to", "a")

  • Create a PHP function to clean queries:
php
function cleanQuery($query) { $stop_words = ['how', 'to', 'a', 'the', 'is', 'in', 'on', 'at', 'for', 'of', 'by', 'and', 'with']; $query_words = explode(' ', strtolower($query)); $filtered_words = array_diff($query_words, $stop_words); return implode(' ', $filtered_words); }
  • Example:
php
$clean_keyword = cleanQuery("How to make a cheesecake?"); // Output: "make cheesecake"

2.2 Correct Typos with Levenshtein Distance

  • Use PHP’s Levenshtein to find close matches for misspelled words:
php
function correctTypo($input, $dictionary) { $shortest = -1; $closest = $input; foreach ($dictionary as $word) { $lev = levenshtein($input, $word); if ($lev == 0) { return $word; } if ($lev < $shortest || $shortest < 0) { $closest = $word; $shortest = $lev; } } return $closest; } // Example usage: $dictionary = ['cheesecake', 'recipe', 'tutorial', 'make']; $corrected_word = correctTypo('cheescak', $dictionary); // Output: "cheesecake"

Step 3: Optimize Search Query

3.1 Use FULLTEXT Search with Fallback

  • Modify content.php to use FULLTEXT + LIKE in a hybrid search.
php
$keyword = PT_Secure($_GET['keyword']); $clean_keyword = cleanQuery($keyword); // Prepare fuzzy matching (correction) $dictionary = ['cheesecake', 'recipe', 'tutorial', 'make']; $corrected_keyword = correctTypo($clean_keyword, $dictionary); // SQL Query with FULLTEXT + LIKE $sql = "SELECT * FROM videos WHERE MATCH(title, description, tags) AGAINST ('$corrected_keyword' IN BOOLEAN MODE) OR title LIKE '%$corrected_keyword%' OR description LIKE '%$corrected_keyword%' ORDER BY id DESC LIMIT 20"; $get_videos = mysqli_query($db, $sql);

Step 4: Improve Ranking (Smart Search)

4.1 Boost Exact Matches

Modify SQL to prioritize full matches over partial:

sql
SELECT *, MATCH(title) AGAINST ('$corrected_keyword') AS relevance FROM videos WHERE MATCH(title, description, tags) AGAINST ('$corrected_keyword' IN BOOLEAN MODE) ORDER BY relevance DESC, views DESC LIMIT 20;
  • Exact match videos appear first.
  • Most viewed videos rank higher.

Step 5: Cache Frequent Searches

5.1 Use PHP + Redis for Speed

  1. Enable Redis on Your Server:
bash
sudo apt install redis
  1. Store & Retrieve Cached Queries:
php
$redis = new Redis(); $redis->connect('127.0.0.1', 6379); $cache_key = "search_".md5($corrected_keyword); $cached_results = $redis->get($cache_key); if ($cached_results) { $get_videos = json_decode($cached_results, true); } else { // Run SQL search query $result = mysqli_query($db, $sql); $videos = mysqli_fetch_all($result, MYSQLI_ASSOC); // Store results in Redis cache $redis->setex($cache_key, 3600, json_encode($videos)); // Cache for 1 hour }

This reduces database load by caching frequent searches. ?


Step 6: Implement Auto-Suggestions

Modify ajax/search.php:

php
$suggestions_sql = "SELECT title FROM videos WHERE MATCH(title) AGAINST ('$clean_keyword*' IN BOOLEAN MODE) LIMIT 5"; $result = mysqli_query($db, $suggestions_sql); $suggestions = mysqli_fetch_all($result, MYSQLI_ASSOC); echo json_encode($suggestions);

This makes real-time search suggestions like YouTube.

 

If you want faster search performance, Elasticsearch is a great choice because:

Super Fast Search – Faster than MySQL FULLTEXT
Fuzzy Search & Typos Handling – Built-in typo correction
Scalability – Handles millions of videos easily
Advanced Ranking – More accurate search results

Installing Elasticsearch

 Below is a simple example of how to install Elasticsearch via Plesk Docker

? How to Install Elasticsearch via Plesk Docker

1️⃣ Go to Docker in Plesk
2️⃣ Search for elasticsearch
3️⃣ Select the latest stable version (e.g.7.9.3 or 8.17.0)
4️⃣ Click "Run", then configure:

  • Set Port Mapping: 9200 → 9200
  • Set Memory Limit: 2GB+ (for best performance)
  • Set Environment Variables:
    ini
    discovery.type=single-node ES_JAVA_OPTS=-Xms1g -Xmx2g

5️⃣ Start the container, then test if it's running:

bash
curl http://localhost:9200

Expected response:

json
{ "name": "your-server", "cluster_name": "docker-cluster", "version": { "number": "7.9.3" } }

Next Steps: Integrate Elasticsearch with Video Search

Now that Elasticsearch is installed, you need to:

  1. Index your videos into Elasticsearch
  2. Modify content.php to query Elasticsearch instead of MySQL
  3. Ensure fuzzy search and typo correction works

 

Now That Elasticsearch Is Running, Here’s How to Use It for Faster Search

Now that Plesk Elasticsearch x.xx.x is running smoothly, we can replace MySQL search with Elasticsearch, making searches faster, more accurate, and typo-tolerant.

Step 1: Index Your Videos into Elasticsearch

Before we can use Elasticsearch for search, we need to store your videos in Elasticsearch.

Create a New Index for Videos

Run the following cURL command to create an index:

bash
curl -XPUT "http://localhost:9200/videos" -H 'Content-Type: application/json' -d' { "settings": { "number_of_shards": 1, "number_of_replicas": 0 }, "mappings": { "properties": { "id": { "type": "integer" }, "title": { "type": "text", "analyzer": "standard" }, "description": { "type": "text", "analyzer": "standard" }, "tags": { "type": "text", "analyzer": "standard" }, "views": { "type": "integer" }, "published_at": { "type": "date" } } } }'

✅ This creates an Elasticsearch index (videos) to store video data.


Step 2: Import Your Existing Videos

Modify sources/search/content.php to insert videos into Elasticsearch.

Add a Function to Index Videos

Find where videos are added to MySQL (INSERT INTO videos).
Add this Elasticsearch indexing function in PHP:

php
function indexVideoInElasticsearch($video) { $url = 'http://localhost:9200/videos/_doc/' . $video['id']; $data = json_encode($video); $ch = curl_init($url); curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "PUT"); curl_setopt($ch, CURLOPT_POSTFIELDS, $data); curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json')); $response = curl_exec($ch); curl_close($ch); return $response; } // Example: Index a video after adding it to MySQL $video = [ "id" => 123, "title" => "How to Make a Cheesecake?", "description" => "Learn how to make a cheesecake step by step!", "tags" => "cheesecake, dessert, cooking", "views" => 5000, "published_at" => "2025-02-10T10:00:00" ]; indexVideoInElasticsearch($video);

✅ Every time a new video is added, it’s also indexed in Elasticsearch.


Step 3: Modify Video Search to Use Elasticsearch

Now, instead of searching MySQL, modify sources/search/content.php to search Elasticsearch.

Replace MySQL Search Query

Find this old MySQL query:

php
$sql = "SELECT * FROM videos WHERE title LIKE '%$keyword%' ORDER BY id ASC LIMIT 20";

Replace it with this Elasticsearch query:

<div class="flex items-center text-token-text-secondary px-4 py-2 text-xs font-sans justify-between rounded-t-[5px] h-9 bg-token-sidebar-surface-primary dark:bg-token-main-surface-secondary select-none&q

0
 
0

5 Comments

DamarisSom

17 days ago
I love how you took a complex topic and made it so digestible. This is exactly the type of content I enjoy reading. Keep up the amazing work! 🔥
0 0 Reply

professional Inc

17 days ago
Wow, I actually learned something new today! This post made me see things in a different light, and I’m so glad I stumbled upon it. 🎯
0 0 Reply

luciletrudel5

18 days ago
Really impressed by how thoroughly this was covered! You left no stone unturned, and it made for such a rewarding read. 🔥
0 0 Reply

MoseOKeeff

18 days ago
Great work on this post! Can't wait for more! 🚀
0 0 Reply

Mythos Car Rentals

18 days ago
Really impressed by how thoroughly this was covered! You left no stone unturned, and it made for such a rewarding read. 🔥
0 0 Reply
Show more