Regular Expressions (Regex) for Beginners: A Practical Guide

Learn regex from scratch — syntax, character classes, quantifiers, anchors, and 10 practical patterns for validating emails, URLs, phone numbers, and more.

ToolNest Team

September 10, 2025

What Is Regex?

A regular expression (regex or regexp) is a sequence of characters that defines a search pattern. Regex is built into almost every programming language and text editor. It's used for:

  • Validation — Does this string match the expected format?
  • Search and replace — Find all occurrences of a pattern and replace them
  • Extraction — Pull specific data out of a larger string
  • Splitting — Split a string on complex patterns
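As a rough illustration of the four uses listed above, here is a minimal JavaScript sketch. The sample string and variable names are made up for this example, and the patterns use syntax that is explained in the sections that follow.

```javascript
// Hypothetical sample data, used only for illustration.
const orderLine = "Order 4521: 3 apples,  2 pears;1 melon";

// Validation: does the string start with "Order" followed by digits?
const looksLikeOrder = /^Order \d+/.test(orderLine);

// Search and replace: swap every run of digits for "#".
const masked = orderLine.replace(/\d+/g, "#");

// Extraction: pull out the order number with a capturing group.
const orderMatch = orderLine.match(/^Order (\d+)/);
const orderNumber = orderMatch ? orderMatch[1] : null;

// Splitting: break the item list on a comma or semicolon plus optional spaces.
const items = orderLine.slice(orderLine.indexOf(":") + 2).split(/[,;]\s*/);

console.log(looksLikeOrder); // true
console.log(masked);         // "Order #: # apples,  # pears;# melon"
console.log(orderNumber);    // "4521"
console.log(items);          // ["3 apples", "2 pears", "1 melon"]
```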

Regex looks intimidating at first, but a handful of concepts is enough to handle most everyday use cases.

Basic Syntax

Literal characters — Match themselves exactly:

Pattern: cat
Matches: "cat" in "the cat sat on the mat"

The dot (.) — Matches any single character (except newline by default):

Pattern: c.t
Matches: "cat", "cut", "cot", "c@t"
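To try literals and the dot yourself, a small JavaScript sketch might look like this. The candidate words are invented for the example, and the ^ and $ anchors (covered later) are added only so that each word is compared as a whole.

```javascript
const sentence = "the cat sat on the mat";

// A literal pattern matches itself; search() returns where it was found.
const literalIndex = sentence.search(/cat/);

// The dot stands for exactly one character, but not a newline by default.
const candidates = ["cat", "cut", "c@t", "ct", "coat", "c\nt"];
const dotMatches = candidates.filter((word) => /^c.t$/.test(word));

console.log(literalIndex); // 4
console.log(dotMatches);   // ["cat", "cut", "c@t"]
```

"ct" and "coat" fail because the dot must consume exactly one character, and "c\nt" fails because the dot skips newlines unless told otherwise.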

Character Classes

Square brackets define a set of characters to match:

[abc]     — matches a, b, or c
[a-z]     — matches any lowercase letter
[A-Z]     — matches any uppercase letter
[0-9]     — matches any digit
[a-zA-Z]  — matches any letter
[^abc]    — matches anything EXCEPT a, b, or c (negation)

Shorthand character classes:

\d   — any digit [0-9]
\D   — any non-digit [^0-9]
\w   — word character [a-zA-Z0-9_]
\W   — non-word character
\s   — whitespace (space, tab, newline)
\S   — non-whitespace
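The classes above can be explored with a short JavaScript sketch like the one below. The sample string is made up, and the g flag (explained in the Flags section) is used so that match() returns every hit rather than only the first.

```javascript
// Hypothetical sample text for illustration.
const sample = "Room 42B, floor_3 (east wing)";

const digits = sample.match(/\d/g);              // every digit
const uppercase = sample.match(/[A-Z]/g);        // every uppercase letter
const punctuation = sample.match(/[^\w\s]/g);    // neither word character nor whitespace
const noVowels = "regex".replace(/[aeiou]/g, ""); // remove any character in the set

console.log(digits);      // ["4", "2", "3"]
console.log(uppercase);   // ["R", "B"]
console.log(punctuation); // [",", "(", ")"]
console.log(noVowels);    // "rgx"
```

Note that the underscore in "floor_3" does not show up as punctuation, because \w includes it.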

Quantifiers

Quantifiers control how many times a character or group must appear:

*     — 0 or more
+     — 1 or more
?     — 0 or 1 (optional)
{3}   — exactly 3
{2,5} — 2 to 5
{3,}  — 3 or more

Examples:

\d+       — one or more digits: "42", "1000"
[a-z]{3}  — exactly 3 lowercase letters: "cat", "dog"
colou?r   — "color" or "colour" (u is optional)
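The quantifier examples above could be checked in JavaScript roughly as follows. The test strings are invented, and the sketch borrows \b, ^, and $ from the next section to keep matches from spilling into neighboring text.

```javascript
// Hypothetical sample text for illustration.
const text = "color colour colouur 7 42 1000";

const digitRuns = text.match(/\d+/g);          // + : one or more digits
const spellings = text.match(/\bcolou?r\b/g);  // ? : the "u" is optional, but only once

// {3} : exactly three lowercase letters, nothing more, nothing less.
const threeLetters = ["cat", "dog", "do", "bird"].filter((w) => /^[a-z]{3}$/.test(w));

const twoToFive = /^\d{2,5}$/.test("123456");  // {2,5} : six digits is too many
const threeOrMore = /^\d{3,}$/.test("123456"); // {3,}  : no upper limit

console.log(digitRuns);    // ["7", "42", "1000"]
console.log(spellings);    // ["color", "colour"]
console.log(threeLetters); // ["cat", "dog"]
console.log(twoToFive);    // false
console.log(threeOrMore);  // true
```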

Anchors

Anchors don't match characters — they match positions:

^  — start of string (or line with m flag)
$  — end of string (or line with m flag)
\b — word boundary

Examples:

^hello    — "hello" at the start of the string
world$    — "world" at the end
^hello$   — string is exactly "hello", nothing else
\bcat\b   — "cat" as a whole word, not "category" or "catfish"
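A quick way to see anchors at work is a JavaScript sketch like this one. The input strings are made up for the example.

```javascript
const greeting = "hello world";

const startsWithHello = /^hello/.test(greeting); // "hello" sits at the start
const endsWithWorld = /world$/.test(greeting);   // "world" sits at the end
const exactlyHello = /^hello$/.test(greeting);   // fails: there is more than "hello"

// \b only matches where a word character meets a non-word character.
const wholeWords = "cat category catfish bobcat cat.".match(/\bcat\b/g);

console.log(startsWithHello);   // true
console.log(endsWithWorld);     // true
console.log(exactlyHello);      // false
console.log(wholeWords.length); // 2 (the first "cat" and the one before the period)
```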

Groups and Alternation

Parentheses create a capturing group, and | inside a group means "or" (alternation):

(ab)+    — one or more repetitions of "ab": "ab", "abab", "ababab"
(cat|dog) — matches "cat" OR "dog"

Non-capturing group (for grouping without capturing):

(?:cat|dog)+s    — "cats", "dogs", "catcats"
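The practical difference between capturing and non-capturing groups shows up in the match result. The following JavaScript sketch, with invented input strings, is one way to see it.

```javascript
// A group lets a quantifier apply to several characters at once.
const repeated = /^(ab)+$/.test("ababab");

// A capturing group stores what it matched at index 1 of the result.
const pet = "I have a dog".match(/(cat|dog)/);

// With a repeated capturing group, only the last repetition is kept.
const capturing = "catdogs".match(/(cat|dog)+s/);

// A non-capturing group matches the same text but stores nothing extra.
const nonCapturing = "catdogs".match(/(?:cat|dog)+s/);

console.log(repeated);            // true
console.log(pet[1]);              // "dog"
console.log(capturing[0]);        // "catdogs"
console.log(capturing[1]);        // "dog"
console.log(nonCapturing[0]);     // "catdogs"
console.log(nonCapturing.length); // 1 (the full match only)
```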

Flags

Flags modify regex behavior:

/pattern/g   — global: find all matches (not just first)
/pattern/i   — case-insensitive: "Cat" matches /cat/i
/pattern/m   — multiline: ^ and $ match line boundaries
/pattern/gi  — global + case-insensitive
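To compare the flags side by side, a small JavaScript sketch along these lines may help. The strings are made up for the example.

```javascript
const pets = "cat Cat cat";

const firstOnly = pets.replace(/cat/, "dog");   // no flag: first match only
const everyLower = pets.replace(/cat/g, "dog"); // g: all lowercase matches
const anyCase = pets.replace(/cat/gi, "dog");   // g + i: all matches, any case

// Without m, ^ and $ refer to the whole string; with m, to each line.
const twoLines = "a\nb";
const withoutM = twoLines.match(/^\w$/g);
const withM = twoLines.match(/^\w$/gm);

console.log(firstOnly);  // "dog Cat cat"
console.log(everyLower); // "dog Cat dog"
console.log(anyCase);    // "dog dog dog"
console.log(withoutM);   // null
console.log(withM);      // ["a", "b"]
```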

10 Practical Regex Patterns

1. Email address (simplified):

/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/

2. US phone number:

/^\+?1?[-.\s]?\(?[0-9]{3}\)?[-.\s]?[0-9]{3}[-.\s]?[0-9]{4}$/

Matches: (555) 123-4567, 555-123-4567, 5551234567, +1 555 123 4567
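One way to sanity-check the phone pattern is to run it over a few samples in JavaScript, as sketched below. The sample numbers beyond the four listed above are made up, and the last one shows that the pattern is fairly loose.

```javascript
const usPhone = /^\+?1?[-.\s]?\(?[0-9]{3}\)?[-.\s]?[0-9]{3}[-.\s]?[0-9]{4}$/;

const phoneSamples = [
  "(555) 123-4567",  // listed above
  "555-123-4567",    // listed above
  "5551234567",      // listed above
  "+1 555 123 4567", // listed above
  "555-1234",        // too short: no area code
  "55512345678",     // eleven digits without a leading 1
  "(555 123-4567",   // unbalanced parenthesis
];

const phoneResults = phoneSamples.map((sample) => usPhone.test(sample));

console.log(phoneResults);
// [true, true, true, true, false, false, true]
```

The final true is worth noticing: because each parenthesis is optional on its own, the pattern does not require them to be balanced.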

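3. URL (simplified; http and https only, and the [a-z]{2,6} part rejects top-level domains longer than six letters):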

/https?:\/\/(www\.)?[-a-zA-Z0-9@:%._+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_+.~#?&/=]*)/

4. Date (YYYY-MM-DD; checks the format only, so an impossible date such as 2025-02-31 still passes):

/^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/
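The date pattern only checks the shape of the string, so it accepts values such as 2025-02-31. If real calendar validity matters, one possible approach is to pair the regex with JavaScript's Date object. The helper name isRealDate below is hypothetical and the sketch is a starting point, not a complete date validator.

```javascript
const isoDate = /^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;

// Hypothetical helper: the regex checks the shape, Date checks the calendar.
function isRealDate(value) {
  if (!isoDate.test(value)) return false;
  const [year, month, day] = value.split("-").map(Number);
  const date = new Date(0);
  date.setUTCFullYear(year, month - 1, day);
  // An impossible day rolls over into the next month, so compare the parts.
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
}

console.log(isoDate.test("2025-02-31")); // true  (format looks fine)
console.log(isRealDate("2025-02-31"));   // false (February has no 31st)
console.log(isRealDate("2024-02-29"));   // true  (leap year)
console.log(isRealDate("2025-13-01"));   // false (rejected by the regex)
```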

5. IP address (IPv4):

/^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$/

6. Hex color code:

/^#([0-9A-Fa-f]{6}|[0-9A-Fa-f]{3})$/

7. Postal code (US ZIP):

/^\d{5}(-\d{4})?$/

8. Strong password:

/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%]).{12,}$/

(The lookaheads require at least one lowercase letter, one uppercase letter, one digit, and one special character from !@#$%. The final .{12,} enforces a minimum length of 12.)
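A single pass/fail pattern cannot tell a user which requirement was missed. One possible workaround, sketched here in JavaScript, is to keep a separate small pattern per rule. The rule table and the missingRules helper are hypothetical names for this example.

```javascript
const strongPassword = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%]).{12,}$/;

// Hypothetical rule table mirroring each part of the pattern above.
const passwordRules = [
  ["a lowercase letter", /[a-z]/],
  ["an uppercase letter", /[A-Z]/],
  ["a digit", /\d/],
  ["one of !@#$%", /[!@#$%]/],
  ["at least 12 characters", /^.{12,}$/],
];

// Returns the labels of every rule the password does not satisfy.
function missingRules(password) {
  return passwordRules
    .filter(([, rule]) => !rule.test(password))
    .map(([label]) => label);
}

console.log(strongPassword.test("Tr1cky!Passw0rd")); // true
console.log(strongPassword.test("alllowercase1!x")); // false
console.log(missingRules("alllowercase1!x"));        // ["an uppercase letter"]
console.log(missingRules("Short1!a"));               // ["at least 12 characters"]
```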

9. Extract content between tags:

/<h1>(.*?)<\/h1>/g

10. Remove extra whitespace:

/\s+/g → replace with " "
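Patterns 9 and 10 are typically used with matchAll() and replace(). The JavaScript sketch below shows one way to do that; the HTML snippet and messy string are invented, and the trim() call is an extra step that is not part of pattern 10 itself.

```javascript
// Hypothetical HTML fragment for illustration.
const html = "<h1>Intro</h1><p>text</p><h1>Setup  Guide</h1>";

// matchAll() needs the g flag; index 1 of each match is the captured content.
const headings = Array.from(html.matchAll(/<h1>(.*?)<\/h1>/g), (m) => m[1]);

// Collapse every run of whitespace to one space, then trim the ends.
const messy = "  too   many \t spaces\n here ";
const tidy = messy.replace(/\s+/g, " ").trim();

console.log(headings); // ["Intro", "Setup  Guide"]
console.log(tidy);     // "too many spaces here"
```

For anything beyond simple, well-formed snippets, a real HTML parser is usually a safer choice than a regex.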

Greedy vs Lazy Matching

By default, quantifiers are greedy — they match as much as possible.

Pattern: <.+>
String:  <div>Hello</div>
Match:   <div>Hello</div>  (the whole thing!)

Adding ? makes them lazy — they match as little as possible:

Pattern: <.+?>
String:  <div>Hello</div>
Matches: <div> and </div> separately
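The same comparison can be run in JavaScript, as in the sketch below. The third pattern, a negated character class, is an alternative that is not covered above; it often gives the same result as the lazy version without relying on backtracking.

```javascript
const markup = "<div>Hello</div>";

const greedy = markup.match(/<.+>/g);      // .+ grabs as much as it can
const lazy = markup.match(/<.+?>/g);       // .+? stops at the first ">"
const negated = markup.match(/<[^>]+>/g);  // "anything except >" between the brackets

console.log(greedy);  // ["<div>Hello</div>"]
console.log(lazy);    // ["<div>", "</div>"]
console.log(negated); // ["<div>", "</div>"]
```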

Common Pitfalls

  1. Forgetting to escape special characters: ., *, +, ?, (, ), [, ], {, }, \, ^, $, and | all have special meanings. Use \. to match a literal dot.

  2. Greedy vs lazy — If your match is too long, add ? after the quantifier.

  3. Catastrophic backtracking — Some regex patterns on certain inputs can take exponential time. Avoid nested quantifiers like (a+)+.

  4. Forgetting anchors for full-string match: \d+ matches any string that contains digits, including "abc123xyz". Use ^\d+$ to ensure the entire string is digits.
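Pitfalls 1 and 4 are easy to reproduce in JavaScript. In the sketch below, escapeRegExp is a hypothetical helper (a common idiom, not a built-in used here) for turning arbitrary text into a pattern that matches it literally.

```javascript
// Pitfall 1: an unescaped dot matches any character.
const unescapedDot = /1.5/.test("125");
const escapedDot = /1\.5/.test("125");

// Hypothetical helper: put a backslash in front of every special character.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const priceLabel = "$4.99 (sale)";
const pricePattern = new RegExp(escapeRegExp(priceLabel));
const foundLabel = pricePattern.test("now $4.99 (sale) only");

// Pitfall 4: without anchors, a partial match is enough.
const containsDigits = /\d+/.test("abc123xyz");
const onlyDigits = /^\d+$/.test("abc123xyz");

console.log(unescapedDot);   // true  ("2" satisfies the dot)
console.log(escapedDot);     // false
console.log(foundLabel);     // true
console.log(containsDigits); // true
console.log(onlyDigits);     // false
```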

Use our free Regex Tester to test and debug your regular expressions with live match highlighting.
