4.4. Knuth Morris Pratt (KMP ์•Œ๊ณ ๋ฆฌ์ฆ˜)

2024. 2. 20. 19:08ใ†CS - Roadmap.sh/4. String Search and Manipulations

CS - 4. String Search and Manipulations - 4.4. Knuth morris pratt 

 

Knuth Morris Pratt

๐Ÿ’กKnuth morris pratt is a string searching algorithm that uses a precomputed array to find the substring in a string. This array is known as the prefix function. The prefix function is the longest prefix that is also a suffix of a substring. The prefix function is used to skip the characters that are already matched. The algorithm is as follows:

  • Compute the prefix function of the substring.
  • Traverse through the string and substring simultaneously.
  • If the characters match, increment the index of both the string and substring.
  • If the characters don’t match, increment the index of the string by the value of the prefix function at the index of the substring.

 

KMP ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๋ฏธ๋ฆฌ ๊ณ„์‚ฐ๋œ ๋ฐฐ์—ด์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ฌธ์ž์—ด์˜ ํ•˜์œ„ ๋ฌธ์ž์—ด์„ ์ฐพ๋Š” ๋ฌธ์ž์—ด ๊ฒ€์ƒ‰ ์•Œ๊ณ ๋ฆฌ์ฆ˜์ž…๋‹ˆ๋‹ค. ์ด ๋ฐฐ์—ด์„ ์ ‘๋‘์‚ฌ ํ•จ์ˆ˜๋ผ๊ณ  ํ•ฉ๋‹ˆ๋‹ค. ์ ‘๋‘์‚ฌ ํ•จ์ˆ˜๋Š” ํ•˜์œ„ ๋ฌธ์ž์—ด์˜ ์ ‘๋ฏธ์‚ฌ์ด๊ธฐ๋„ ํ•œ ๊ฐ€์žฅ ๊ธด ์ ‘๋‘์‚ฌ์ž…๋‹ˆ๋‹ค. ์ ‘๋‘์‚ฌ ํ•จ์ˆ˜๋Š” ์ด๋ฏธ ์ผ์น˜ํ•˜๋Š” ๋ฌธ์ž๋ฅผ ๊ฑด๋„ˆ๋›ฐ๋Š” ๋ฐ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค. ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค:

 

  • ํ•˜์œ„ ๋ฌธ์ž์—ด์˜ ์ ‘๋‘์‚ฌ ํ•จ์ˆ˜๋ฅผ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค.
  • ๋ฌธ์ž์—ด๊ณผ ๋ถ€๋ถ„ ๋ฌธ์ž์—ด์„ ๋™์‹œ์— ์ˆœํšŒํ•ฉ๋‹ˆ๋‹ค.
  • ๋ฌธ์ž๊ฐ€ ์ผ์น˜ํ•˜๋ฉด ๋ฌธ์ž์—ด๊ณผ ๋ถ€๋ถ„ ๋ฌธ์ž์—ด์˜ ์ธ๋ฑ์Šค๋ฅผ ๋ชจ๋‘ ์ฆ๊ฐ€์‹œํ‚ต๋‹ˆ๋‹ค.
  • ๋ฌธ์ž๊ฐ€ ์ผ์น˜ํ•˜์ง€ ์•Š์œผ๋ฉด ๋ฌธ์ž์—ด์˜ ์ธ๋ฑ์Šค์—์„œ ๋ถ€๋ถ„ ๋ฌธ์ž์—ด์˜ ์ธ๋ฑ์Šค์— ์žˆ๋Š” ์ ‘๋‘์‚ฌ ํ•จ์ˆ˜์˜ ๊ฐ’๋งŒํผ ๋ฌธ์ž์—ด์˜ ์ธ๋ฑ์Šค๋ฅผ ์ฆ๊ฐ€์‹œํ‚ต๋‹ˆ๋‹ค.

Knuth-Morris-Pratt (KMP) ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๋ฌธ์ž์—ด ๊ฒ€์ƒ‰ ์•Œ๊ณ ๋ฆฌ์ฆ˜์œผ๋กœ, ์ฃผ์–ด์ง„ ํ…์ŠคํŠธ์—์„œ ํŒจํ„ด์ด ์ผ์น˜ํ•˜๋Š” ์œ„์น˜๋ฅผ ์ฐพ๋Š”๋ฐ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค. ๊ธฐ๋ณธ์ ์ธ ์•„์ด๋””์–ด๋Š” ๊ฐ„๋‹จํ•œ ๋ฌธ์ž์—ด ๊ฒ€์ƒ‰ ์•Œ๊ณ ๋ฆฌ์ฆ˜๊ณผ ๋‹ค๋ฅด๊ฒŒ, ํŒจํ„ด์˜ ๊ฐ ์œ„์น˜์—์„œ ๋ถˆ์ผ์น˜๊ฐ€ ๋ฐœ์ƒํ•˜๋ฉด ํŒจํ„ด์„ ๊ฐ€๋Šฅํ•œ ํ•œ ์ตœ๋Œ€ํ•œ ์ด๋™์‹œํ‚ค๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ด๋ฅผ ์œ„ํ•ด "๋ถ€๋ถ„ ์ผ์น˜" ํ…Œ์ด๋ธ”(๋˜๋Š” "์‹คํŒจ ํ•จ์ˆ˜")์„ ๋ฏธ๋ฆฌ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค.

 

์ด๋Š” ์›Œ๋“œํ”„๋กœ์„ธ์„œ๊ฐ™์€ ์ˆ˜์ฒœ, ์ˆ˜๋ฐฑ๋งŒ ๋‹จ์–ด์—์„œ ํŠน์ • ๋ฌธ์ž์—ด ํŒจํ„ด์„ ์ฐพ์„๋•Œ ํšจ์œจ์ ์œผ๋กœ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. KMP ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด ํ…์ŠคํŠธ ๋‚ด์—์„œ ๊ฐ€์žฅ ๋น ๋ฅด๊ฒŒ ์›ํ•˜๋Š” ๋ฌธ์ž์—ด์„ ์ฐพ์•„๋‚ด๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜ ์ด๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.

 

In Python
def kmp_search(text, pattern):
    pattern_length = len(pattern)  
    text_length = len(text)  
    lps = [0]*pattern_length  # ๋ถ€๋ถ„ ์ผ์น˜ ํ…Œ์ด๋ธ” ์ดˆ๊ธฐํ™”
    j = 0  # ํŒจํ„ด ๋ฌธ์ž์—ด์˜ ์ธ๋ฑ์Šค

    # ๋ถ€๋ถ„ ์ผ์น˜ ํ…Œ์ด๋ธ” ๊ณ„์‚ฐ(์•„๋ž˜ ํ•จ์ˆ˜์— ์ •์˜ํ•จ)
    compute_lps_array(pattern, pattern_length, lps)

    i = 0  # ํ…์ŠคํŠธ ๋ฌธ์ž์—ด์˜ ์ธ๋ฑ์Šค
    while i < text_length:
        if pattern[j] == text[i]:  # ๋ฌธ์ž๊ฐ€ ์ผ์น˜ํ•˜๋ฉด ๋‘˜ ๋‹ค ์ฆ๊ฐ€
            i += 1
            j += 1

        if j == pattern_length:  # ํŒจํ„ด ๋ฌธ์ž์—ด์„ ๋๊นŒ์ง€ ๋น„๊ตํ–ˆ์œผ๋ฉด ํŒจํ„ด ๋ฐœ๊ฒฌ
            print("ํŒจํ„ด์ด ๋ฐœ๊ฒฌ๋œ ์œ„์น˜: " + str(i-j))
            j = lps[j-1]  # ๋‹ค์Œ ๋น„๊ต๋ฅผ ์œ„ํ•ด ์ธ๋ฑ์Šค ๊ฐ์†Œ

        # ํŒจํ„ด์ด ๋ถˆ์ผ์น˜ํ•˜๊ณ  ํ…์ŠคํŠธ๊ฐ€ ์•„์ง ๋๋‚˜์ง€ ์•Š์•˜๋‹ค๋ฉด
        elif i < text_length and pattern[j] != text[i]:
            if j != 0:  # ํŒจํ„ด์˜ ์ค‘๊ฐ„์—์„œ ๋ถˆ์ผ์น˜๊ฐ€ ๋ฐœ์ƒํ•˜๋ฉด
                j = lps[j-1]  # lps๊ฐ’์„ ์ด์šฉํ•˜์—ฌ j๋ฅผ ๊ฐ์†Œ
            else:  # ํŒจํ„ด์˜ ์ฒซ ๋ฒˆ์งธ ๋ฌธ์ž์—์„œ ๋ถˆ์ผ์น˜๊ฐ€ ๋ฐœ์ƒํ•˜๋ฉด
                i += 1  # ํ…์ŠคํŠธ ๋ฌธ์ž์—ด์˜ ์ธ๋ฑ์Šค๋งŒ ์ฆ๊ฐ€


# ๋ถ€๋ถ„ ์ผ์น˜ ํ…Œ์ด๋ธ”(LPS ๋ฐฐ์—ด) ๊ณ„์‚ฐ ํ•จ์ˆ˜
def compute_lps_array(pattern, pattern_length, lps):
    length = 0  # ์ด์ „์— ๊ณ„์‚ฐ๋œ ๊ฐ€์žฅ ๊ธด ์ ‘๋‘์‚ฌ/์ ‘๋ฏธ์‚ฌ์˜ ๊ธธ์ด
    lps[0] = 0  # lps[0]์€ ํ•ญ์ƒ 0
    i = 1  # lps ์ธ๋ฑ์Šค

    # ํŒจํ„ด์˜ ๋‘ ๋ฒˆ์งธ ๋ฌธ์ž๋ถ€ํ„ฐ lps ๋ฐฐ์—ด์„ ์ฑ„์›Œ๋‚˜๊ฐ
    while i < pattern_length:
        if pattern[i] == pattern[length]:  # ํŒจํ„ด์˜ ๋ฌธ์ž๊ฐ€ ์ผ์น˜ํ•˜๋ฉด
            length += 1  # ๊ธธ์ด ์ฆ๊ฐ€
            lps[i] = length  # lps ๋ฐฐ์—ด์— ๊ธธ์ด ๊ฐ’ ํ• ๋‹น
            i += 1
        else:
            if length != 0:  # ์ด์ „ ์ธ๋ฑ์Šค์—์„œ ์ผ์น˜๊ฐ€ ๋ฐœ์ƒํ–ˆ์œผ๋ฉด
                length = lps[length-1]  # ์ด์ „ lps ๊ฐ’์„ ์‚ฌ์šฉ
            else:  # ์ด์ „ ์ธ๋ฑ์Šค์—์„œ ์ผ์น˜๊ฐ€ ์—†์—ˆ์œผ๋ฉด
                lps[i] = 0  # lps ๊ฐ’์€ 0
                i += 1

 

 

 

 

 

๋ฐฉ๊ธˆ ์œ„์— ์“ด ๊ธ€์„ ๊ฐ€์ง€๊ณ  KMP ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ ์šฉํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

 

 

 

ํ™œ์šฉ

 

https://www.acmicpc.net/problem/1786(๋ฐฑ์ค€ 1786 ์ฐพ๊ธฐ)

์‹ค์ œ๋กœ ctrl + F ๋ฅผ ํ†ตํ•ด์„œ ์ฐพ๊ธฐ๋ฅผ ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ๊ตฌํ˜„ํ•˜๋Š” ๋ฌธ์ œ.

728x90