Word replacement in a C program
|
It is now over 60 days since the last post. This thread is closed.
| Posted by
| Nick Gammon
Australia (23,173 posts) Bio
Forum Administrator |
| Date
| Wed 20 Oct 2004 05:31 AM (UTC) Amended on Wed 20 Oct 2004 05:33 AM (UTC) by Nick Gammon
|
| Message
| After some encouragement from Ksilyan, I have been learning about lex - a program for doing lexical analysis.
After some preliminary research, it seemed to be an ideal tool for something I wanted to do whilst fixing up the SMAUG source code. In particular, I was thinking of the problem of converting it to C++: the word "class" is sprinkled liberally through it (i.e. player class), but class is a C++ keyword, so it gave heaps of compile errors.
Now I know you can edit the files, or use Perl to do a "change all" of "class" to (say) "mudclass", but the problem is:
- You only want to change whole words, eg. class_table should stay the same
- You don't want to change literals, eg. "Choose your class" should stay the same
- You don't want to change multi-line comments, eg.
/*
Here we choose a player's class
*/
should stay the same.
- You don't want to change single-line comments, eg. // choose class, should stay the same.
The program lex, which seems to be standard under Red Hat Linux and can no doubt be obtained from the Cygwin download, lets this be done in a simple way. I post the method below, as it took a bit of work to get it perfect. :)
You can copy the lexer source below and paste it into a file called "fixup.l" (that's a trailing L for Lexer), and then run the lex program as suggested in the comments inside it.
You could then run something like this:
./fixup class mudclass < update.c > update.c.new
Then do a "diff" to check the changes were made OK. eg.
diff update.c update.c.new
The slightly tricky part of the code is the provision of three "states" - INITIAL, quote and comment.
The INITIAL state is the default state for the lexer.
The "quote" state is entered when a quoted string is detected. Inside the quote state the target word is not changed. Also a quote-within-a-quote (namely \") does not terminate the quoted string.
The "comment" state is entered for a multi-line comment, and is only terminated when the closing comment is found.
%x comment
%x quote
%{
/*
To compile, save this file as fixup.l and run this:
  lex -ofixup.c fixup.l && gcc fixup.c -lfl -o fixup
To run (to change "foo" to "bar" in a C program):
  ./fixup foo bar < input.c > output.c
*/
#include <stdio.h>   /* printf, fprintf */
#include <string.h>  /* strcmp */
char * sFrom; /* word to search for */
char * sTo; /* word to replace it with */
%}
%%
"/*" ECHO; BEGIN (comment); /* begin multi-line comment */
<comment>"*/" ECHO; BEGIN (INITIAL); /* end multi-line comment */
"\"" ECHO; BEGIN (quote); /* begin quotes */
<quote>{ /* inside quote state */
  ["\n] ECHO; BEGIN (INITIAL); /* end quotes */
  "\\\"" ECHO; /* escaped quote inside quotes */
} /* end quote state */
"//".* ECHO; /* single-line comment */
 /* identifier */
[a-zA-Z0-9_]+ { if (strcmp (yytext, sFrom) == 0)
                  printf ("%s", sTo);
                else
                  ECHO;
              }
%%
int main ( int argc, char ** argv)
{
  char * sProgram = argv [0];
  if (argc != 3)
    {
    fprintf (stderr, "Usage: %s target_word replacement_word\n", sProgram);
    return 1;
    }
  sFrom = argv [1];
  sTo = argv [2];
  yylex();
  return 0;
}
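For anyone without lex to hand, the same three-state idea can be sketched in plain C. This is only an illustration of the INITIAL / quote / comment logic described above, not the generated scanner: the names replace_word, emit and is_ident_char are made up for this sketch, and it works on an in-memory string rather than stdin. Like the lexer rules it mirrors, it only knows about \" inside strings, so things like a string ending in \\ or a '"' character literal are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Scanner states, named after the lexer's INITIAL, quote and comment. */
enum state { ST_INITIAL, ST_QUOTE, ST_COMMENT };

static int is_ident_char(int c) { return isalnum(c) || c == '_'; }

/* Append n bytes of s to out (capacity cap, NUL included); truncates if full. */
static void emit(char *out, size_t cap, const char *s, size_t n)
{
    size_t used = strlen(out);
    if (n > cap - 1 - used)
        n = cap - 1 - used;
    memcpy(out + used, s, n);
    out[used + n] = '\0';
}

/* Hypothetical helper: copy `in` to `out`, replacing the whole word `from`
   with `to`, but only outside string literals and comments. */
void replace_word(const char *in, const char *from, const char *to,
                  char *out, size_t cap)
{
    enum state st = ST_INITIAL;
    size_t i = 0, flen = strlen(from);

    assert(cap > 0);
    out[0] = '\0';
    while (in[i] != '\0') {
        size_t n = 1;                       /* bytes consumed this step */
        if (st == ST_COMMENT) {
            if (in[i] == '*' && in[i + 1] == '/') { n = 2; st = ST_INITIAL; }
        } else if (st == ST_QUOTE) {
            if (in[i] == '\\' && in[i + 1] == '"')
                n = 2;                      /* escaped quote: stay in quote */
            else if (in[i] == '"' || in[i] == '\n')
                st = ST_INITIAL;            /* end of quoted string */
        } else if (in[i] == '/' && in[i + 1] == '*') {
            n = 2; st = ST_COMMENT;         /* begin multi-line comment */
        } else if (in[i] == '/' && in[i + 1] == '/') {
            n = strcspn(in + i, "\n");      /* single-line comment, to end of line */
        } else if (in[i] == '"') {
            st = ST_QUOTE;                  /* begin quoted string */
        } else if (is_ident_char((unsigned char)in[i])) {
            n = 0;                          /* measure the whole identifier */
            while (is_ident_char((unsigned char)in[i + n]))
                n++;
            if (n == flen && strncmp(in + i, from, n) == 0) {
                emit(out, cap, to, strlen(to));
                i += n;
                continue;
            }
        }
        emit(out, cap, in + i, n);          /* everything else is echoed */
        i += n;
    }
}
```

Calling replace_word with "class" and "mudclass" on a line such as x = class; /* class */ changes the first occurrence only and echoes the comment untouched.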
|
- Nick Gammon
www.gammon.com.au, www.mushclient.com | | Top |
|
| Posted by
| Nick Gammon
Australia (23,173 posts) Bio
Forum Administrator |
| Date
| Reply #1 on Wed 20 Oct 2004 06:22 AM (UTC) Amended on Wed 20 Oct 2004 06:24 AM (UTC) by Nick Gammon
|
| Message
| Another useful tip - how to do this to a batch of files. Say you want to process all .c files in your directory. This example will assume that "fixup" is in your path, otherwise amend the fixup part to point to it.
Warning - make a backup first in case something goes wrong! :)
The commands below are based around the bash shell, which is standard in Linux, and also under Cygwin.
Fixup all .c files
for i in *.c; do fixup class mudclass < "$i" > "$i.new"; done
The above code will process all .c files, creating new files ending in .c.new
Check diffs
for i in *.c; do diff "$i" "$i.new"; done
You might want to confirm the new files are OK.
Remove originals
This deletes the original files before the changes.
Rename new files to old
for i in *.c.new; do mv "$i" "${i%.new}"; done
This renames the .c.new files back to their original .c names.
If you want to also process .h files you could change each command like this:
for i in *.{c,h}; do fixup class mudclass < "$i" > "$i.new"; done
Note the {c,h} in the above to handle both suffixes. Use a similar thing in the other commands. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | | Top |
|
| Posted by
| Samson
USA (683 posts) Bio
|
| Date
| Reply #2 on Wed 20 Oct 2004 11:11 AM (UTC) |
| Message
| | Would it not be possible to have the script look for the combination of ->class ? That's class with a hyphen and greater-than symbol in front. Off the top of my head I think most if not all references to it will be in that form. Less cumbersome to search and replace that than it is to worry about the other exceptions :) | | Top |
|
| Posted by
| Nick Gammon
Australia (23,173 posts) Bio
Forum Administrator |
| Date
| Reply #3 on Wed 20 Oct 2004 09:11 PM (UTC) |
| Message
| Well, for one thing, in tables.c there are quite a few references the other way around:
CREATE( class, struct class_type, 1 );
/* Setup defaults for additions to class structure */
class->attr_second = 0;
class->attr_deficient = 0;
xCLEAR_BITS(class->affected);
class->resist = 0;
class->suscept = 0;
Then in skills.c there are some cases without any extra punctuation:
int class;
argument = one_argument( argument, arg3 );
class = atoi( arg3 );
Then there were other examples of different words, like new - another C++ keyword...
CREATE(new, NEIGHBOR_DATA, 1);
new->next = NULL;
new->prev = NULL;
new->address = NULL;
new->name = fread_string(fp);
Once you have to search for all the cases, like class-> and ->class and, most problematic, "class" on its own (which will then also be found in quoted strings such as "Enter your class"), you get to the stage I got to last time: needing a way to automate it so that the only changes made are:
- Whole words
- Not in comments
- Not in quotes
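The "whole words" condition on its own can be sketched as a small C check. This is only an illustration, and the name count_whole_word is made up for it: an occurrence counts only if the characters on either side are not identifier characters, which is why class->, ->class and a bare class all qualify while class_table does not. It deliberately ignores the comment and quote conditions, which need the state handling in the lexer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static int is_ident_char(int c) { return isalnum(c) || c == '_'; }

/* Hypothetical helper: count occurrences of `word` in `text` that stand
   alone as an identifier (no identifier character on either side). */
size_t count_whole_word(const char *text, const char *word)
{
    size_t wlen = strlen(word), count = 0;
    const char *p = text;

    assert(wlen > 0);
    while ((p = strstr(p, word)) != NULL) {
        int left_ok  = (p == text) || !is_ident_char((unsigned char)p[-1]);
        int right_ok = !is_ident_char((unsigned char)p[wlen]);
        if (left_ok && right_ok)
            count++;
        p++;                 /* keep searching after this position */
    }
    return count;
}
```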
|
- Nick Gammon
www.gammon.com.au, www.mushclient.com | | Top |
|
| Posted by
| Samson
USA (683 posts) Bio
|
| Date
| Reply #4 on Sun 24 Oct 2004 01:43 AM (UTC) |
| Message
| | I see what you mean. Makes me wish I had this thing when I did my conversion to C++. It would have made life a whole lot simpler. | | Top |
|
| Posted by
| Nick Gammon
Australia (23,173 posts) Bio
Forum Administrator |
| Date
| Reply #5 on Sun 24 Oct 2004 11:13 PM (UTC) Amended on Sun 24 Oct 2004 11:14 PM (UTC) by Nick Gammon
|
| Message
| The lexer file (above) can be downloaded from:
(2 Kb)
In case you don't have lex installed, the generated program (which does the actual fixing up) can be downloaded from:
(37 Kb)
To compile from the C source, just type:
The generated source has had the following added to it so you don't need to link against the "fl" library:
int yywrap ()
{
  return 1;
}
|
- Nick Gammon
www.gammon.com.au, www.mushclient.com | | Top |
|
The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).