Question
RUTA entity extraction: regex
Hi,
I have the following entity extraction model:
DECLARE VarA;
DECLARE VarB;
NUM{REGEXP("([0-9]{8}|[0-9]{6})") -> MARK(VarA)}
"-"
NUM{REGEXP("....") -> MARK(VarB),MARK(EntityType,1,3), UNMARK(VarA), UNMARK(VarB)};
This should mark entities that have the following format: a number (8 or 6 digits), a dash ("-"), and a number (4 digits).
It works fine when the entity is inside a paragraph, but it doesn't work when the entity stands alone on a line of text (see attachment). Adding any character before or after the examined string makes it work again (see the other screenshot).
What is missing?
Thanks, Miloslaw
Hi,
Please use the script below for the pattern described above:
PACKAGE uima.ruta.example;
Document{->RETAINTYPE(SPACE)};
"[0-9]{6,8}[-]?[0-9]{4}" -> EntityType;
Let me know if it works.
Thanks
Vamsi
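
A note on the pattern in the script above: it appears to be somewhat looser than the format described in the question, since {6,8} also accepts 7 digits and [-]? makes the dash optional. The sketch below checks this in plain Python rather than in Ruta, on the assumption that these particular constructs behave the same way in Java regular expressions. The STRICT pattern and the sample strings are hypothetical and are not part of either post. Whole-string matching is used here as a simplification; it says nothing about how the rule behaves on a line that contains only the entity.

```python
import re

# Pattern taken verbatim from the script in the answer.
LOOSE = re.compile(r"[0-9]{6,8}[-]?[0-9]{4}")

# Hypothetical stricter variant (an assumption, not from the thread):
# exactly 8 or 6 digits, a mandatory dash, then exactly 4 digits.
STRICT = re.compile(r"(?:[0-9]{8}|[0-9]{6})-[0-9]{4}")

# Made-up sample strings for illustration only.
SAMPLES = ["12345678-1234", "123456-1234", "1234567-1234", "1234567890"]


def full_matches(pattern, samples):
    """Return the samples that the pattern matches in their entirety."""
    return [s for s in samples if pattern.fullmatch(s)]


if __name__ == "__main__":
    print("loose :", full_matches(LOOSE, SAMPLES))
    print("strict:", full_matches(STRICT, SAMPLES))
```

If the stricter behaviour is what is wanted, the same pattern string could presumably be placed in the rule in place of the original one, but that has not been tested in Ruta here.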