moryh
Posted: March 26, 2019
Last activity: April 13, 2019

NLP With Ruta Script

I have created a Decision Data rule for entity extraction and am performing NLP with a RUTA script in Pega. My requirement is to extract a policy number from an email.

S represents an alphanumeric character; A represents an alphabetic character (a letter, as in every example below).

The policy number has one of these formats:
1) With hyphens: SS-SSSSSSS-AAA
2) With spaces instead of hyphens: SS SSSSSSS AAA
3) With no separators: SSSSSSSSSAAA
4) Optionally, any of the above can be prefixed with a 1, so 1SS-SSSSSSS-AAA, 1SS SSSSSSS AAA and 1SSSSSSSSSAAA are also valid.

So a policy number has three parts: the first is 2 characters long (SS), the second is 7 characters long (SSSSSSS) and the third is 3 characters long (AAA). An optional fourth part, "1", may be prefixed to the whole number.
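For reference, the format described above can be written as a single regular expression. This is a minimal Python sketch under two assumptions the post does not state outright: the last three characters are letters (as in every example), and the same separator is used in both positions. POLICY_RE and find_policy_numbers are hypothetical names. The constructs it uses (fixed-width lookbehind, a backreference, case-insensitive matching) also exist in Java's java.util.regex, but the pattern has not been tried inside RUTA.

```python
import re

# Sketch of the stated policy-number format as one regular expression.
# Assumptions: part 3 is letters only (as in the examples), and the same
# separator ("-", " ", or none) appears in both positions.
POLICY_RE = re.compile(
    r"(?<![A-Z0-9])"  # not glued to a preceding letter/digit
    r"1?"             # optional leading "1"
    r"[A-Z0-9]{2}"    # part 1: two alphanumerics (SS)
    r"([- ]?)"        # separator: hyphen, space, or nothing
    r"[A-Z0-9]{7}"    # part 2: seven alphanumerics (SSSSSSS)
    r"\1"             # the same separator again
    r"[A-Z]{3}"       # part 3: three letters (AAA)
    r"(?![A-Z0-9])",  # not glued to a following letter/digit
    re.IGNORECASE,
)


def find_policy_numbers(text: str) -> list[str]:
    """Return every policy-number-shaped substring, optional "1" included."""
    # finditer + group(0): findall would return only the separator group.
    return [m.group(0) for m in POLICY_RE.finditer(text)]


if __name__ == "__main__":
    email = "Please update 123-456ABC7-GHI and 1ABCD123EFGHI today."
    print(find_policy_numbers(email))
```

Because "1?" is tried before the two-character first part, 123-456ABC7-GHI is read as prefix "1" plus "23", and a 12-character string starting with 1 (such as 1BCD123EFGHI) still matches without the prefix through backtracking. One caveat of this sketch: the space-separated form allows all-letter parts, so an ordinary phrase like "to address the" also matches; requiring at least one digit, or checking for a nearby word such as "policy", would be an extra assumption to add.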

I have written a script for this, but it does not work for some of the combinations in which the policy number is prefixed with 1.

Below is the code from the script:

PACKAGE uima.ruta.example;
Document{-> RETAINTYPE(SPACE)};

DECLARE VarA;
DECLARE VarC;
DECLARE VarE;


("1")? W{REGEXP(".{2}")} ("-"|SPACE)? ((W* NUM* W* NUM* W* NUM* W*)|(NUM* W* NUM* W* NUM* W* NUM*)){REGEXP(".{7}")} ("-"|SPACE)? W{REGEXP(".{3}")->MARK(EntityType,1,6)};


(W* NUM*){REGEXP(".{2}")} ("-"|SPACE)? ((W* NUM* W* NUM* W* NUM* W*)|(NUM* W* NUM* W* NUM* W* NUM*)){REGEXP(".{7}")} ("-"|SPACE)? W{REGEXP(".{3}")->MARK(EntityType,1,5)};

((W|NUM)(NUM|W)*){REGEXP("(?i)\\b[1]{0,1}[A-Z0-9]{2}[A-Z0-9]{7}[A-Z]{3}\\b" )->MARK(EntityType)};

Valid policy numbers: AB-CD123EF-GHI, 1AB-CD123EF-GHI, ABCD123EFGHI, 23-456ABC7-GHI, 123-456ABC7-GHI, 1A3-456ABC7-GHI, 12A-456ABC7-GHI, etc.

I am not able to handle these combinations: 123-456ABC7-GHI, 1A3-456ABC7-GHI and 12A-456ABC7-GHI.
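A plausible explanation for these three failures (an interpretation, not confirmed in this thread): RUTA matches rules against seed tokens rather than single characters, and its default seeding turns a run of letters into a W token and a run of digits into a NUM token. "123" is therefore one NUM token, "1A3" is NUM W NUM and "12A" is NUM W, so the first rule's ("1")? W{REGEXP(".{2}")} never sees a standalone "1" followed by a two-letter W, which is the shape 1AB-CD123EF-GHI does have. The Python sketch below only imitates that splitting to make the token boundaries visible; TOKEN_RE and seed_tokens are hypothetical names, the OTHER label stands in for RUTA's own punctuation and special-character types, and whitespace is dropped for brevity.

```python
import re

# Rough, assumed imitation of RUTA's default seed tokens:
# a run of letters -> W, a run of digits -> NUM, any other visible
# character -> OTHER. Whitespace is simply skipped in this sketch.
TOKEN_RE = re.compile(r"(?P<W>[A-Za-z]+)|(?P<NUM>[0-9]+)|(?P<OTHER>\S)")


def seed_tokens(text: str) -> list[tuple[str, str]]:
    """Return (token_type, covered_text) pairs in document order."""
    # Only one named alternative matches, so m.lastgroup is its name.
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


if __name__ == "__main__":
    for policy in ["1AB-CD123EF-GHI", "123-456ABC7-GHI",
                   "1A3-456ABC7-GHI", "12A-456ABC7-GHI"]:
        # The first few tokens are the ones the optional "1" depends on.
        print(policy, seed_tokens(policy)[:3])
```

For 1AB-CD123EF-GHI the first two tokens are NUM "1" and W "AB", while 123-456ABC7-GHI starts with a single NUM "123". Under this reading, the prefix and the first part would need to be matched on their combined covered text rather than on the assumption that the "1" arrives as its own token.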

Please help me write a correct script that covers all possible combinations. Thanks in advance.

Pega Intelligent Virtual Assistant Conversational Channels