Fixing common errors in your novel with Find and Replace and Regex

Pretty much any author can use find and replace. But how many can really use it effectively, even to save money? I can.

This article includes simple find and replace suggestions, but it also uses Regular Expressions, which most authors have never even heard of. Regular Expressions, often called Regex for short, are really a techie’s tool for finding a pattern of characters. With them you can fix a lot of issues yourself without having to wait for an editor to catch them.

If you are using Microsoft Word, you may not even know you can use regular expressions. They are there, though not exactly fully featured, in the Advanced Find when you click More. You have a lot of search options, and the search syntax in Word is different depending on whether you have Use Wildcards checked.

Sigil, an open source eBook authoring tool, has pretty much full support for regular expressions. Because the book in Sigil is HTML, there are often HTML tags that you can use in a search, such as </p>, which ends a paragraph.

So if you fix as many errors as you can with Find and Replace and regex, then you are making the editor’s job easier. The editor won’t have to spend time fixing them. If you pay your editor hourly, then they will spend fewer hours and you will spend less money. Also, the editor will be able to focus on content and real problems rather than stupid grammatical issues that can be automatically fixed by a computer.

Here is the list I have created. For each item, I note whether you can fix it with Find and Replace or whether you have to do a manual replace.

  1. Find every paragraph that does NOT end with punctuation so you can easily fix it. (Sorry, this doesn’t fix missing punctuation between sentences in a paragraph.)
    Regex: [A-Za-z]$
    Sigil: [A-Za-z]</p>
    Word: ^$^p
    

    Replace: Manual, because you as the author must determine the punctuation and because some things that shouldn’t have punctuation, like chapter headings, might be returned in the search.

  2. Sometimes there are one or more needless spaces at the end of a paragraph.
    Regex: [ \t]+$
    Sigil: [ \t]+</p>
    Word: ^w^p
    

    Replace: Empty value. Just replace it with nothing and the problem is fixed.

  3. Sometimes there are one or more needless spaces at the start of a paragraph.
    Regex: ^\s+
    Sigil: <p>\s+
    Word: ^p^w
    

    Replace: Empty value. Just replace it with nothing and the problem is fixed.

  4. All sentences that do not start with a capital letter:
    Regex: [A-Za-z][.?!]["”]*\s+[a-z]
    Sigil: [A-Za-z][.?!]["”]*\s+[a-z]
    Word: Check the "Use Wildcards" checkbox: [A-z][\!\.\?] [a-z]
    

    Replace: Manual

  5. All end quotes followed by a capital letter, so you can check whether it should be a lowercase letter.
    Example needing a fix: “Go away.” She said.
    After it was fixed: “Go away,” she said.

    Regex: [^,]["”] [A-Z]
    Sigil: [^,]["”] [A-Z]
    Word: Check the "Use Wildcards" checkbox: [A-z][\!,]["”] [A-Z]
    

    Replace: Manual

  6. Find all opening quotes followed by a missing closing quote or by embedded quotes. This requires the quotes to be smart quotes. If you aren’t using smart quotes, you don’t know if a quote is opening or closing.
    Important! This is not always incorrect. When a speaker speaks for multiple paragraphs, a final quote is not used until the last paragraph.
    Example of missing quote: “Hello, she said.
    Example of embedded quotes (should be single quotes): “The word “death” means separation.”

    Regex: “[^”]+(“|$)
    Sigil: “[^”]+(“|</p>)
    Word: Check the "Use Wildcards" checkbox: ["“][!"”]@^13
    

    Replace: Manual. You have to manually verify that the dialog doesn’t cross paragraphs.

  7. Find all closing quotes without opening quotes. This requires the quotes to be smart quotes. If you aren’t using smart quotes, you don’t know if a quote is opening or closing.
    Example 1: “Hello,” she said. How have you been?”
    Example 2: “hello,” she said.”

    Regex: (^|”)[^“]+”
    Sigil: (<p>|”)[^“]+”
    Word: Check the "Use Wildcards" checkbox: [”^13][!“]@”
    

    Fix: Manual

  8. Find any quotes that aren’t smart quotes:
    Regex: ["']
    Sigil: ^[^<]*<p[^>]*>[^>]*(["'])[^>]*
    Word: "
    

    Fix: Manual, except in Word. In Word, just replace all quotes with the same quote and they will all be switched to smart quotes (unless you turned that setting off).

  9. Comma after a conjunction:
    Regex: \b(but|and|so|which|yet|or|except),
    Sigil: \b(but|and|so|which|yet|or|except),
    Word: ”
    Fix: Manual. Either move the comma to before the conjunction or delete it.
  10. Find character names you write inconsistently. I use three letters. The first letter, a consistent middle consonant, and the last letter. If your misspellings change the first or last letter, you’ll need to figure out your own.
    Example: Aiden
    Adien
    Aeden
    Adan
    Adin
    Adn

    Regex: A\w*d\w*n
    Sigil: A\w*d\w*n
    Word: Check the "Use Wildcards" checkbox: A([aei]@)[d]([aei]@)n
    

    Fix: Aiden (or your correct spelling)
    Example 2: Neihan
    Neihan
    Niehan
    Nihan
    Nehan
    Neihen
    Niehen
    Nihen
    Nehen

    Regex: N[ie]{1,2}h[ae]n
    Sigil: N[ie]{1,2}h[ae]n
    Word: Check the "Use Wildcards" checkbox: N([ie]@)h[ae]n
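
For anyone who keeps a plain-text copy of the manuscript and is comfortable running a script, the patterns from items 2, 3, and 10 can also be tried outside Word or Sigil. What follows is only a minimal Python sketch under some assumptions: the text has one paragraph per line, the function names are made up for illustration, and word boundaries (\b) have been added to the name pattern.

```python
import re

# Items 2 and 3: spaces or tabs at the end or start of a paragraph.
# Assumes plain text with one paragraph per line.
TRAILING_SPACE = re.compile(r"[ \t]+$", re.MULTILINE)
LEADING_SPACE = re.compile(r"^[ \t]+", re.MULTILINE)

# Item 10: the name pattern from the list, with word boundaries added.
NAME_PATTERN = re.compile(r"\bA\w*d\w*n\b")


def strip_paragraph_whitespace(text):
    """Replace leading and trailing paragraph whitespace with nothing."""
    text = TRAILING_SPACE.sub("", text)
    return LEADING_SPACE.sub("", text)


def name_spellings(text):
    """List every distinct spelling the name pattern finds, for review."""
    return sorted(set(NAME_PATTERN.findall(text)))
```

Listing the spellings before replacing anything is deliberate. The name pattern is broad and also matches ordinary words such as Abandon or Addition, so a blind Replace All would damage them.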
    

Expect me to continue to update this article. I guess the next step is to create a tool that will find these sentences for me (or for you, if I give you the tool). I already have a tool; it is just time to use it.
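
As a rough idea of what such a tool could look like, here is a hedged Python sketch. It is not the tool mentioned above. It assumes a plain-text manuscript with one paragraph per line, and the names CHECKS and find_issues are hypothetical. It only reports matches for the "manual" items (1, 4, 6, and 9), so you still decide each fix yourself.

```python
import re

# Patterns copied from items 1, 4, 9, and 6 of the list.
CHECKS = {
    "no end punctuation": re.compile(r'[A-Za-z]$'),
    "lowercase sentence start": re.compile(r'[A-Za-z][.?!]["”]*\s+[a-z]'),
    "comma after conjunction": re.compile(r'\b(but|and|so|which|yet|or|except),'),
    "unclosed or embedded quote": re.compile(r'“[^”]+(“|$)'),
}


def find_issues(text):
    """Return (paragraph number, check name, matched text) for each hit."""
    issues = []
    for number, paragraph in enumerate(text.splitlines(), start=1):
        for name, pattern in CHECKS.items():
            match = pattern.search(paragraph)
            if match:
                issues.append((number, name, match.group(0)))
    return issues
```

Each check reports only the first hit in a paragraph, which keeps the report short. Chapter headings and multi-paragraph dialogue will still show up as false positives, as noted in items 1 and 6.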

10 Comments

  1. CM said:

    January 23, 2014 at 12:46 pm

    Looking forward to the help with punctuation at the end of sentences. The help you have given so far has proven to be extremely useful in both Word and Sigil. Excellent work. Hope the end of sentence punctuation help comes soon.

    • J. Abram Barneck said:

      January 23, 2014 at 2:14 pm

      What exact help are you looking for? What is the punctuation problem you want to solve?

      Are you looking to fix missing punctuation at the end of a sentence? Then you would do this:

      Sigil: [^.!?] [A-Z]
      Word: [!.!?] [A-Z]

      “This is an example of a missing punctuation at the end of the sentence There should have been a punctuation.”
      “This is an example of a sentence that will result in a false positive. I love Michelle and Aiden and Lincoln.”

      You will have so many false positives that this search becomes useless. However, you can add a bunch of “exceptions” which are actually negative look-aheads in Regex terms.

      Sigil: [^.!?] (?!Michelle|Aiden|Lincoln)[A-Z]
      Word: Unknown

      Notice that you can add as many exceptions as you want to regex. Just add a pipe character “|” and then the proper name. You may have 100 proper names that you have to add. You may also want to add Mr. and Mrs. and Dr. and such.

      However, while this will help alleviate false positives, it will actually make you miss possible problematic sentences like this one:

      “I should end this sentence with a punctuation Michelle will be mad.”

      If your goal is perfection, then this problem cannot be solved with find and replace or Regex. If your goal is just to fix as many issues as you can before sending to the editor so your editor can spend less time on stupid grammar issues and more time helping your novel’s content be the best it can be, then this is still valid.
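
The exception list described in this reply can be built from a list of names instead of being typed by hand. This is a minimal Python sketch, assuming a plain-text manuscript; the function name missing_period_pattern is made up for illustration.

```python
import re


def missing_period_pattern(names):
    """Build [^.!?] (?!Name1|Name2|...)[A-Z] from a list of proper names."""
    if not names:
        # With no exceptions, fall back to the plain search.
        return re.compile(r"[^.!?] [A-Z]")
    exceptions = "|".join(re.escape(name) for name in names)
    return re.compile(rf"[^.!?] (?!{exceptions})[A-Z]")
```

With Michelle, Aiden, and Lincoln as exceptions, the pattern finds "e T" in the first example sentence, skips the false positive, and, as explained above, also misses the sentence where Michelle follows the missing period.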

  2. CM said:

    January 25, 2014 at 12:27 pm

    Yes that is what I was meaning. Issues such as “He walked down the street to deliver the paper After he was done he noticed it was starting to drizzle.” See how the punctuation (period) would be missing between “paper” and “After” where one sentence ends and another one goes forward when it should be “deliver the paper. After he was done”.

  3. Jackie said:

    February 17, 2014 at 3:23 pm

    I love you. 😉 I was having a serious issue with missing quotation marks, and this is a 600-page book, so you have saved me an astounding amount of time. I learned that regular expressions could help me with this problem, so I was starting from the very beginning and trying to school myself in regex 101. I am no programmer, but I think I have potential, as I’m quite the researcher and actually enjoyed learning about regex. But there’s a lot to learn, and your expressions have fast tracked this process for me. THANK YOU SO MUCH!!

  4. rl said:

    November 24, 2014 at 1:15 pm

    hello, I’m looking to fix three errors in an epub conversion using regex, and I’m having a little difficulty.

    The first is to eliminate the / that appears in random words. Example, th/ere i/s sl/ashes in these sen/tenc/es

    The second is to eliminate any question marks that are at the end of each paragraph. The paragraph is two sentences. This is the last one?

    And the third is to change any capital O’s to zeros.

    Any help you could offer me would be great

    • J. Abram Barneck said:

      November 24, 2014 at 5:13 pm

      For the slash, does your novel have any slashes that you should keep? My guess is no. In this case, you don’t need regex, just replace all / with nothing. Even if you do have slashes to keep, find them in your original source. There will likely only be a few, and you can re-add them. If you have a lot of slashes in your original source, then regex is not your tool. Regex can’t tell the difference between a correct/right slash and a b/a/d or wro/ng slash. The only way to tell the difference is that one slash is surrounded by words, whereas the other is not. You would need a natural language parser for that, which requires coding.

    • J. Abram Barneck said:

      November 24, 2014 at 5:19 pm

      Well, if you are in Sigil, your paragraph will be HTML tagged using the p tag:

      <p>The paragraph is two sentences. This is the last one?</p>

      You want to find this:

      \?</p>
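
To actually remove the question mark, the replacement would be just </p>. Outside Sigil, the same find and replace might look like this minimal Python sketch (the function name is hypothetical):

```python
import re


def drop_final_question_marks(html):
    """Replace a question mark right before </p> with just </p>."""
    return re.sub(r"\?</p>", "</p>", html)
```

Question marks in the middle of a paragraph are left alone, because the pattern only matches one that sits directly before the closing tag.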

  5. Gregg Bell said:

    August 8, 2015 at 1:37 pm

    Thanks so much for this J. Abram! Have been looking for something like this for a long time. Really appreciate you sharing it!

  6. Jörg Malek said:

    April 2, 2018 at 1:05 pm

    You are the man!
    The issue: Multiple dialogs spanning across paragraphs in an epub with over 1000 pages *sigh*.
    I was googling for hours, tearing my hairs off, bouncing my head against the wall and so on – and then your hint (#6):
    Sigil: “[^”]+(“|</p>)
    This saved me – thank you so much for sharing 🙂

    • J. Abram Barneck said:

      April 2, 2018 at 1:39 pm

      Glad it helped you!
