
Gammon Forum


Word replacement in a C program



Posted by Nick Gammon   Australia  (23,173 posts)  Bio   Forum Administrator
Date Wed 20 Oct 2004 05:31 AM (UTC)

Amended on Wed 20 Oct 2004 05:33 AM (UTC) by Nick Gammon

Message
After some encouragement by Ksilyan, I have been learning about lex - a program for doing lexical analysis.

After some preliminary research, it seemed to be an ideal tool for something I wanted to do whilst fixing up the SMAUG source code. In particular, I was thinking of the problem of converting it to C++: the code has the word "class" sprinkled liberally through it (ie. player class), but class is a C++ keyword, so this gave heaps of compile errors.

Now I know you can edit the files, or use Perl to do a "change all" of "class" to (say) "mudclass", but the problems are:


  • You only want to change whole words, eg. class_table should stay the same

  • You don't want to change literals, eg. "Choose your class" should stay the same

  • You don't want to change multi-line comments, eg.

    
      /*
      Here we choose a player's class 
      */
      


    should stay the same.


  • You don't want to change single-line comments, eg. // choose class, should stay the same.
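As a rough illustration of the whole-word rule in the list above, here is a minimal C sketch. The helper names is_ident_char and count_whole_word are made up for this example and are not part of fixup.l. The sketch deliberately knows nothing about strings or comments, which is why a plain search like this is not enough on its own.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if c can appear inside a C identifier. */
static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Hypothetical helper: count whole-word occurrences of word in text.
   A hit only counts when the characters on both sides are not
   identifier characters, so "class_table" is not a match for "class". */
static size_t count_whole_word(const char *text, const char *word)
{
    size_t len = strlen(word);
    size_t count = 0;
    const char *p = text;

    assert(len > 0);

    while ((p = strstr(p, word)) != NULL) {
        int left_ok  = (p == text) || !is_ident_char(p[-1]);
        int right_ok = !is_ident_char(p[len]);

        if (left_ok && right_ok)
            count++;
        p++;    /* keep searching just past the start of this hit */
    }
    return count;
}
```

For the text "int class; class_table[class]" and the word "class" this counts two hits, skipping the one inside class_table.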


The program lex, which seems to be standard under Red Hat Linux and can no doubt be obtained from the Cygwin download, lets this be done in a simple way. I post the method below, as it took a bit of work to get it perfect. :)

You can copy the lexer source below and paste it into a file called "fixup.l" (that's a trailing L for Lexer), and then run the lex program as suggested in the comments at the top of the file.

You could then run something like this:


./fixup class mudclass < update.c > update.c.new


Then do a "diff" to check the changes were made OK. eg.


diff update.c update.c.new


The slightly tricky part of the code is the provision of three "states" - INITIAL, quote and comment.

The INITIAL state is the default state for the lexer.

The "quote" state is entered when a quoted string is detected. Inside the quote state the target word is not changed. Also a quote-within-a-quote (namely \") does not terminate the quoted string.

The "comment" state is entered for a multi-line comment, and is only terminated when the closing comment is found.
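To make the three states more concrete, here is a hand-written C sketch of roughly the same state machine. It is not the code that lex generates. The names replace_word and is_ident_char, and the fixed-size output buffer, are assumptions made for this illustration. Inside a string it treats a backslash plus the following character as one unit.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum state { INITIAL, QUOTE, COMMENT };

/* Hypothetical helper: true if c can appear inside a C identifier. */
static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Hypothetical function: copy src to out, replacing whole-word
   occurrences of from with to, but only in the INITIAL state
   (outside string literals and comments).
   Returns 0 on success, -1 if out (capacity cap) is too small. */
static int replace_word(const char *src, const char *from, const char *to,
                        char *out, size_t cap)
{
    enum state st = INITIAL;
    size_t used = 0;
    const char *p = src;

    assert(cap > 0);

    while (*p != '\0') {
        const char *emit = p;   /* text to copy for this step */
        size_t n = 1;           /* input bytes consumed this step */
        size_t emit_n;

        if (st == COMMENT) {
            if (p[0] == '*' && p[1] == '/') { n = 2; st = INITIAL; }
        } else if (st == QUOTE) {
            if (p[0] == '\\' && p[1] != '\0')
                n = 2;                          /* escaped character */
            else if (p[0] == '"' || p[0] == '\n')
                st = INITIAL;                   /* end of string */
        } else if (p[0] == '/' && p[1] == '*') {
            n = 2; st = COMMENT;                /* begin multi-line comment */
        } else if (p[0] == '/' && p[1] == '/') {
            n = strcspn(p, "\n");               /* single-line comment */
        } else if (p[0] == '"') {
            st = QUOTE;                         /* begin string */
        } else if (is_ident_char(p[0])) {
            n = 0;
            while (is_ident_char(p[n]))
                n++;                            /* take the whole identifier */
            if (n == strlen(from) && strncmp(p, from, n) == 0)
                emit = to;                      /* target word: substitute */
        }

        emit_n = (emit == p) ? n : strlen(to);
        if (used + emit_n + 1 > cap)
            return -1;
        memcpy(out + used, emit, emit_n);
        used += emit_n;
        p += n;
    }
    out[used] = '\0';
    return 0;
}
```

Called with from set to "class" and to set to "mudclass", the input `int class; /* class */` comes out as `int mudclass; /* class */`.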


%x comment
%x quote
%{
 /*
 To compile, save this file as fixup.l and run this:

 lex -ofixup.c fixup.l && gcc fixup.c -lfl -o fixup 

 To run (to change "foo" to "bar" in a C program):

 ./fixup foo bar < input.c > output.c

 */

 #include <stdio.h>    /* printf, fprintf */
 #include <string.h>   /* strcmp */

 char * sFrom;   /* word to search for */
 char * sTo;     /* word to replace it with */
%}

%%

"/*"            ECHO; BEGIN (comment);   /* begin multi-line comment */
<comment>"*/"   ECHO; BEGIN (INITIAL);   /* end multi-line comment */

"\""            ECHO; BEGIN (quote);     /* begin quotes */
<quote>{         /* inside quote state */
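         ["\n]  ECHO; BEGIN (INITIAL);   /* end quotes */
         \\.    ECHO;                    /* any escaped character inside quotes */
       }         /* end quote state */

"'"(\\.|[^\\'\n])"'"  ECHO;              /* character literal, so a quote inside one does not start a string */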

"//".*       ECHO;                       /* single-line comment */

    /* identifier */

[a-zA-Z0-9_]+ { if (strcmp (yytext, sFrom) == 0)
                 printf ("%s", sTo);
                else
                  ECHO; 
              }
%%

int main ( int argc, char ** argv)
  {
  char * sProgram = argv [0];

  if (argc != 3)
    {
    fprintf (stderr, "Usage: %s target_word replacement_word\n", sProgram);
    return 1;
    }

  sFrom = argv [1];
  sTo = argv [2];

  yylex();
  return 0;
  }

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon   Australia  (23,173 posts)  Bio   Forum Administrator
Date Reply #1 on Wed 20 Oct 2004 06:22 AM (UTC)

Amended on Wed 20 Oct 2004 06:24 AM (UTC) by Nick Gammon

Message
Another useful tip - how to do this to a batch of files. Say you want to process all .c files in your directory. This example will assume that "fixup" is in your path, otherwise amend the fixup part to point to it.

Warning - make a backup first in case something goes wrong! :)

The commands below are based around the bash shell, which is standard in Linux, and also under Cygwin.


Fixup all .c files


 for i in *.c; do fixup class mudclass < "$i" > "$i.new"; done


The above code will process all .c files, creating new files ending in .c.new


Check diffs


 for i in *.c; do diff "$i" "$i.new"; done


You might want to confirm the new files are OK.


Remove originals


  rm *.c


This deletes the original files, as they were before the changes.


Rename new files to old


  for i in *.c.new; do mv "$i" "${i%.new}"; done


This renames the .c.new files back to their original .c names.




If you want to also process .h files you could change each command like this:


 for i in *.{c,h}; do fixup class mudclass < "$i" > "$i.new"; done


Note the {c,h} in the above to handle both suffixes. Use a similar thing in the other commands.


Posted by Samson   USA  (683 posts)  Bio
Date Reply #2 on Wed 20 Oct 2004 11:11 AM (UTC)
Message
Would it not be possible to have the script look for the combination of ->class ? That's class with a hyphen and greater-than symbol in front. Off the top of my head I think most if not all references to it will be in that form. Less cumbersome to search and replace that than it is to worry about the other exceptions :)

Posted by Nick Gammon   Australia  (23,173 posts)  Bio   Forum Administrator
Date Reply #3 on Wed 20 Oct 2004 09:11 PM (UTC)
Message
Well, for one thing, in tables.c there are quite a few references the other way around:


    CREATE( class, struct class_type, 1 );

    /* Setup defaults for additions to class structure */
    class->attr_second = 0;
    class->attr_deficient = 0;
    xCLEAR_BITS(class->affected);
    class->resist = 0;
    class->suscept = 0;


Then in skills.c there are some cases without any extra punctuation:


            int class;

            argument = one_argument( argument, arg3 );
            class = atoi( arg3 );


Then there were other examples of different words, like new - another C++ keyword...


        CREATE(new, NEIGHBOR_DATA, 1);
        new->next = NULL;
        new->prev = NULL;
        new->address = NULL;
        new->name = fread_string(fp);


Once you start having to search for all the cases, like class-> and then ->class and, most problematic, "class" on its own (which will then also be found in quoted strings such as "Enter your class"), you get to the stage I got to last time: needing a way to automate it, so that it makes all changes which are:


  • Whole words

  • Not in comments

  • Not in quotes


Posted by Samson   USA  (683 posts)  Bio
Date Reply #4 on Sun 24 Oct 2004 01:43 AM (UTC)
Message
I see what you mean. Makes me wish I had this thing when I did my conversion to C++. It would have made life a whole lot simpler.

Posted by Nick Gammon   Australia  (23,173 posts)  Bio   Forum Administrator
Date Reply #5 on Sun 24 Oct 2004 11:13 PM (UTC)

Amended on Sun 24 Oct 2004 11:14 PM (UTC) by Nick Gammon

Message
The lexer file (above) can be downloaded from:


(2 Kb)

In case you don't have lex installed, the generated program (which does the actual fixing up) can be downloaded from:


(37 Kb)

To compile from the C source, just type:


gcc fixup.c -o fixup


The generated source has had the following added to it so you don't need to link against the "fl" library:


int yywrap ()
  {
  return 1;
  }

