4.5 Boyer Moore Algorithm(๋ณด์ด์–ด-๋ฌด์–ด ์•Œ๊ณ ๋ฆฌ์ฆ˜)

2024. 2. 21. 23:56ใ†CS - Roadmap.sh/4. String Search and Manipulations

CS - 4. String Search and Manipulations - 4.5. Boyer Moore Algorithm

Boyer Moore Algorithm

๐Ÿ’กBoyer Moore algorithm is a string searching algorithm that is used to find the index of a substring in a string. It is a very efficient algorithm that is used in many applications. It is used in text editors, compilers, and many other applications.

๋ณด์ด์–ด ๋ฌด์–ด ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๋ฌธ์ž์—ด์—์„œ ํ•˜์œ„ ๋ฌธ์ž์—ด์˜ ์ธ๋ฑ์Šค๋ฅผ ์ฐพ๋Š” ๋ฐ ์‚ฌ์šฉ๋˜๋Š” ๋ฌธ์ž์—ด ๊ฒ€์ƒ‰ ์•Œ๊ณ ๋ฆฌ์ฆ˜์ž…๋‹ˆ๋‹ค. ๋งŽ์€ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์—์„œ ์‚ฌ์šฉ๋˜๋Š” ๋งค์šฐ ํšจ์œจ์ ์ธ ์•Œ๊ณ ๋ฆฌ์ฆ˜์ž…๋‹ˆ๋‹ค. ํ…์ŠคํŠธ ํŽธ์ง‘๊ธฐ, ์ปดํŒŒ์ผ๋Ÿฌ ๋ฐ ๊ธฐํƒ€ ์—ฌ๋Ÿฌ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์—์„œ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.

 

 

๋ณด์ด์–ด-๋ฌด์–ด ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๋ฌธ์ž์—ด ๊ฒ€์ƒ‰ ์•Œ๊ณ ๋ฆฌ์ฆ˜ ์ค‘ ํ•˜๋‚˜๋กœ, ํŠนํžˆ ๊ธด ๋ฌธ์žฅ์—์„œ ์งง์€ ํŒจํ„ด์„ ์ฐพ์•„๋‚ด๋Š” ๋ฐ ๋งค์šฐ ํšจ์œจ์ ์ž…๋‹ˆ๋‹ค. 1977๋…„์— Robert Boyer์™€ J Strother Moore์— ์˜ํ•ด ๊ฐœ๋ฐœ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

๋ณด์ด์–ด-๋ฌด์–ด ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ํ•ต์‹ฌ ๊ฐœ๋…์€ '์˜ค๋ฅธ์ชฝ์—์„œ ์™ผ์ชฝ์œผ๋กœ' ํƒ์ƒ‰ํ•˜๋Š” ๊ฒƒ์ด๋ฉฐ, ์ด๋Š” ๋Œ€๋ถ€๋ถ„์˜ ๋‹ค๋ฅธ ๋ฌธ์ž์—ด ๊ฒ€์ƒ‰ ์•Œ๊ณ ๋ฆฌ์ฆ˜๊ณผ๋Š” ๋ฐ˜๋Œ€ ๋ฐฉํ–ฅ์ž…๋‹ˆ๋‹ค. ๋˜ํ•œ, ์ด ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๋ถˆ์ผ์น˜๊ฐ€ ๋ฐœ์ƒํ•  ๊ฒฝ์šฐ ์ตœ๋Œ€ํ•œ ๋งŽ์ด ๊ฑด๋„ˆ๋›ฐ๋Š” ๊ฒƒ์„ ๋ชฉํ‘œ๋กœ ํ•˜๋ฉฐ, ์ด๋ฅผ ํ†ตํ•ด ๊ฒ€์ƒ‰ ์†๋„๋ฅผ ๋†’์ž…๋‹ˆ๋‹ค.

์ด ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๋‘ ๊ฐ€์ง€ ์ฃผ์š” ํœด๋ฆฌ์Šคํ‹ฑ, ์ฆ‰ '๋‚˜์œ ๋ฌธ์ž ํœด๋ฆฌ์Šคํ‹ฑ'๊ณผ '์ข‹์€ ์ ‘๋ฏธ์‚ฌ ํœด๋ฆฌ์Šคํ‹ฑ'์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

๋‚˜์œ ๋ฌธ์ž ํœด๋ฆฌ์Šคํ‹ฑ: ํŒจํ„ด์—์„œ ๋ถˆ์ผ์น˜๊ฐ€ ๋ฐœ์ƒํ•œ ๋ฌธ์ž๊ฐ€ ํŒจํ„ด ๋‚ด์— ์กด์žฌํ•˜์ง€ ์•Š์œผ๋ฉด ํŒจํ„ด ์ „์ฒด๋ฅผ ์˜ค๋ฅธ์ชฝ์œผ๋กœ ์ด๋™ํ•ฉ๋‹ˆ๋‹ค. ๋งŒ์•ฝ ํŒจํ„ด ๋‚ด์— ์กด์žฌํ•œ๋‹ค๋ฉด, ๋ถˆ์ผ์น˜๊ฐ€ ๋ฐœ์ƒํ•œ ๋ฌธ์ž๋ฅผ ํŒจํ„ด์˜ ๊ฐ€์žฅ ์˜ค๋ฅธ์ชฝ์— ์žˆ๋Š” ๋™์ผํ•œ ๋ฌธ์ž์— ๋งž์ถ”์–ด ์ด๋™ํ•ฉ๋‹ˆ๋‹ค.
์ข‹์€ ์ ‘๋ฏธ์‚ฌ ํœด๋ฆฌ์Šคํ‹ฑ: ํŒจํ„ด์—์„œ ์ผ์น˜ํ•˜๋Š” ์ ‘๋ฏธ์‚ฌ๊ฐ€ ๋ฐœ๊ฒฌ๋˜๋ฉด, ์ด ์ ‘๋ฏธ์‚ฌ๊ฐ€ ํŒจํ„ด ๋‚ด์—์„œ ๋‹ค์‹œ ๋ฐœ๊ฒฌ๋  ์œ„์น˜๋กœ ํŒจํ„ด์„ ์ด๋™ํ•ฉ๋‹ˆ๋‹ค.
์ด ๋‘ ๊ฐ€์ง€ ํœด๋ฆฌ์Šคํ‹ฑ์€ ๊ฒฐํ•ฉ๋˜์–ด ์‚ฌ์šฉ๋˜๋ฉฐ, ๋‘ ์ด๋™ ์ค‘์—์„œ ๋” ๋ฉ€๋ฆฌ ์ด๋™ํ•˜๋Š” ์ชฝ์„ ์„ ํƒํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ๋ณด์ด์–ด-๋ฌด์–ด ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๋ฌธ์ž์—ด ๊ฒ€์ƒ‰ ๊ณผ์ •์„ ๋งค์šฐ ๋น ๋ฅด๊ฒŒ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๊ทธ๋Ÿฌ๋‚˜ ์ด ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด ํšจ์œจ์ ์ด๊ธฐ ์œ„ํ•ด์„œ๋Š” ์ „์ฒ˜๋ฆฌ ๊ณผ์ •์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. ์ด ์ „์ฒ˜๋ฆฌ ๊ณผ์ •์—์„œ '๋‚˜์œ ๋ฌธ์ž ํ…Œ์ด๋ธ”'๊ณผ '์ข‹์€ ์ ‘๋ฏธ์‚ฌ ํ…Œ์ด๋ธ”'์„ ์ƒ์„ฑํ•˜๊ณ  ์ด๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ๋ฌธ์ž์—ด ๊ฒ€์ƒ‰์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ์ „์ฒ˜๋ฆฌ ๊ณผ์ • ๋•Œ๋ฌธ์— ์งง์€ ๋ฌธ์ž์—ด์ด๋‚˜ ๋‹จ์ผ ๊ฒ€์ƒ‰์—๋Š” ๋‹ค๋ฅธ ์•Œ๊ณ ๋ฆฌ์ฆ˜๋“ค์ด ๋” ํšจ์œจ์ ์ผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

 

 

 

In Python
def generate_bad_char_shift(term):
    skip_list = {}
    for i in range(0, len(term)-1):
        skip_list[term[i]] = len(term)-i-1
    return skip_list

def search_boyer_moore(text, pattern):
    bad_char_shift = generate_bad_char_shift(pattern)
    i = 0
    while i < len(text)-len(pattern)+1:
        j = len(pattern)
        while j > 0 and pattern[j-1] == text[i+j-1]:
            j -= 1
        if j > 0:
            bad_char_shift_value = bad_char_shift.get(text[i+j-1], len(pattern))
            i += bad_char_shift_value
        else:
            return i
    return -1

 

ํ•˜๋‚˜ ์ฐพ์„์ˆ˜ ์žˆ๋Š”๋ฐ, ์ด๊ฒŒ ์ข€ ์—ฌ๋Ÿฌ๊ฐœ๋ฅผ ์ฐพ์œผ๋ ค๋ฉด ํ•จ์ˆ˜๋ฅผ ์ˆ˜์ •ํ•ด์•ผํ•œ๋‹ค.

 

 

def search_boyer_moore(text, pattern):
    bad_char_shift = generate_bad_char_shift(pattern)  # ๋‚˜์œ ๋ฌธ์ž ํœด๋ฆฌ์Šคํ‹ฑ์— ์‚ฌ์šฉํ•  skip_list ์ƒ์„ฑ
    i = 0  # ๊ฒ€์ƒ‰ ์‹œ์ž‘ ์œ„์น˜
    positions = []  # ํŒจํ„ด์ด ๋ฐœ๊ฒฌ๋œ ์œ„์น˜๋ฅผ ์ €์žฅํ•  ๋ฆฌ์ŠคํŠธ
    while i < len(text)-len(pattern)+1:  # ํ…์ŠคํŠธ๋ฅผ ์˜ค๋ฅธ์ชฝ์œผ๋กœ ์ด๋™ํ•˜๋ฉด์„œ
        j = len(pattern)  # ํŒจํ„ด์˜ ๋งˆ์ง€๋ง‰ ๋ฌธ์ž๋ถ€ํ„ฐ ๋น„๊ต ์‹œ์ž‘
        while j > 0 and pattern[j-1] == text[i+j-1]:  # ํŒจํ„ด์˜ ๋ฌธ์ž์™€ ํ…์ŠคํŠธ์˜ ๋ฌธ์ž๊ฐ€ ์ผ์น˜ํ•˜๋Š” ๋™์•ˆ
            j -= 1  # ํŒจํ„ด์„ ์™ผ์ชฝ์œผ๋กœ ์ด๋™
        if j > 0:  # ๋ถˆ์ผ์น˜ ๋ฐœ์ƒ ์‹œ
            # ๋‚˜์œ ๋ฌธ์ž ํœด๋ฆฌ์Šคํ‹ฑ์— ๋”ฐ๋ผ ํ…์ŠคํŠธ์˜ ์ด๋™ ๊ฑฐ๋ฆฌ ๊ณ„์‚ฐ
            bad_char_shift_value = bad_char_shift.get(text[i+j-1], len(pattern))
            i += bad_char_shift_value  # ํ…์ŠคํŠธ๋ฅผ ์˜ค๋ฅธ์ชฝ์œผ๋กœ ์ด๋™
        else:  # ํŒจํ„ด์˜ ๋ชจ๋“  ๋ฌธ์ž๊ฐ€ ์ผ์น˜ํ•˜๋Š” ๊ฒฝ์šฐ
            positions.append(i)  # ํ˜„์žฌ ์œ„์น˜๋ฅผ ๋ฆฌ์ŠคํŠธ์— ์ถ”๊ฐ€
            i += 1  # ๋‹ค์Œ ์œ„์น˜๋กœ ์ด๋™
    return positions  # ๋ชจ๋“  ์œ„์น˜ ๋ฐ˜ํ™˜

 

์ด๋Ÿฐ์‹์œผ๋กœ ๋ณด์ด์–ด-๋ฌด์–ด ์•Œ๊ณ ๋ฆฌ์ฆ˜์œผ๋กœ ๋ฌธ์ž์—ด์„ ์ฐพ์„์ˆ˜ ์žˆ๋‹ค.

 

 

Appendix

๋ณด์ด์–ด-๋ฌด์–ด ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๋ฌธ์ž์—ด ํƒ์ƒ‰ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๋Š” ๋ฐ ํ™œ์šฉ๋ฉ๋‹ˆ๋‹ค. ํŠนํžˆ, ๊ธด ํ…์ŠคํŠธ์—์„œ ์งง์€ ํŒจํ„ด์„ ์ฐพ๋Š” ๊ฒฝ์šฐ์— ํšจ์œจ์ ์ž…๋‹ˆ๋‹ค. ์ด๋Š” ๋ณด์ด์–ด-๋ฌด์–ด ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด ๋ถˆ์ผ์น˜๊ฐ€ ๋ฐœ์ƒํ•  ๊ฒฝ์šฐ ๊ฐ€๋Šฅํ•œ ํ•œ ๋งŽ์ด ๊ฑด๋„ˆ๋›ฐ์–ด ํƒ์ƒ‰ ์†๋„๋ฅผ ๋†’์ด๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.

 

KMP ๋ž‘ ๋น„๊ตํ•œ๋‹ค๋ฉด.

 

๋ฐฉํ–ฅ์„ฑ

- KMP ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ํ…์ŠคํŠธ๋ฅผ ์™ผ์ชฝ์—์„œ ์˜ค๋ฅธ์ชฝ์œผ๋กœ ์Šค์บ”ํ•ฉ๋‹ˆ๋‹ค.

- ๋ณด์ด์–ด-๋ฌด์–ด ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ์˜ค๋ฅธ์ชฝ์—์„œ ์™ผ์ชฝ์œผ๋กœ ์Šค์บ”ํ•ฉ๋‹ˆ๋‹ค.


ํšจ์œจ์„ฑ

๋ณด์ด์–ด-๋ฌด์–ด ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ์ผ๋ฐ˜์ ์œผ๋กœ KMP ์•Œ๊ณ ๋ฆฌ์ฆ˜๋ณด๋‹ค ๋” ๋น ๋ฅด๋ฉฐ, ํŠนํžˆ ๋Œ€์ƒ ํ…์ŠคํŠธ๊ฐ€ ์•„์ฃผ ํด ๋•Œ ๋”์šฑ ํšจ์œจ์ ์ž…๋‹ˆ๋‹ค. ์ด๋Š” ๋ณด์ด์–ด-๋ฌด์–ด ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด ๋ถˆ์ผ์น˜๋ฅผ ๋ฐœ๊ฒฌํ•˜๋ฉด ์ตœ๋Œ€ํ•œ ๋งŽ์ด ๊ฑด๋„ˆ๋›ฐ๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค. ๋ฐ˜๋ฉด, KMP ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ํ…์ŠคํŠธ๋ฅผ ํ•œ ๋ฒˆ๋งŒ ํ†ต๊ณผํ•˜๋ฉฐ, ํŒจํ„ด์— ๋Œ€ํ•œ ๋ถ€๋ถ„์ ์ธ ์ผ์น˜๋ฅผ ํ™œ์šฉํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๋ถˆํ•„์š”ํ•œ ๋น„๊ต๋ฅผ ์ตœ์†Œํ™”ํ•ฉ๋‹ˆ๋‹ค.


์ „์ฒ˜๋ฆฌ

๋‘ ์•Œ๊ณ ๋ฆฌ์ฆ˜ ๋ชจ๋‘ ํŒจํ„ด์„ ์ „์ฒ˜๋ฆฌํ•˜๋Š” ๋‹จ๊ณ„๋ฅผ ํฌํ•จํ•ฉ๋‹ˆ๋‹ค. KMP ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ํŒจํ„ด์˜ ์ ‘๋‘์‚ฌ์™€ ์ ‘๋ฏธ์‚ฌ๊ฐ€ ์ผ์น˜ํ•˜๋Š” ๊ธธ์ด๋ฅผ ๊ณ„์‚ฐํ•˜๋Š” ์ „์ฒ˜๋ฆฌ ํ…Œ์ด๋ธ”์„ ๋งŒ๋“ญ๋‹ˆ๋‹ค. ๋ฐ˜๋ฉด์— ๋ณด์ด์–ด-๋ฌด์–ด ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ '๋‚˜์œ ๋ฌธ์ž ํœด๋ฆฌ์Šคํ‹ฑ'๊ณผ '์ข‹์€ ์ ‘๋ฏธ์‚ฌ ํœด๋ฆฌ์Šคํ‹ฑ'์„ ์ด์šฉํ•œ ์ „์ฒ˜๋ฆฌ ํ…Œ์ด๋ธ”์„ ๋งŒ๋“ญ๋‹ˆ๋‹ค.

728x90