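I am using Pig to generate a list of recently visited URLs. Each URL contains a string of digits that identifies the product page visited. I am trying to use the REGEX_EXTRACT_ALL() function to extract only that digit string, which is 6 to 8 digits long. The digits appear directly after jobs2/view/ and usually end with +&cd, but sometimes they end with ).
Here are some sample URLs: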
(http://a.com/search?q=cache:QD7vZRHkPQoJ:ca.xyz.com/jobs2/view/17069404+&cd=1&hl=zh-CN&ct=clnk&gl=ca)
(http://a.com/search?q=cache:G9323j2oNbAJ:ca.xyz.com/jobs2/view/5977065+&cd=1&hl=zh-CN&ct=clnk&gl=ca)
(http://a.com/search?q=cache:aNspmG11qAJ:hk.xyz.com/jobs2/view/16988928+&cd=2&hl=zh-TW&ct=clnk&gl=hk)
(http://a.com/search?q=cache:aNspmG11AJ:hk.xyz.com/jobs2/view/16988928+&cd=2&hl=zh-TW&ct=clnk&gl=hk)
(http://a.com/search?q=cache:aNspmG11qAJ:hk.xyz.com/jobs2/view/16988928+&cd=2&hl=zh-TW&ct=clk&gl=hk)
This is the regular expression I am currently using:
J = FOREACH jpage GENERATE FLATTEN(REGEX_EXTRACT_ALL(TEXTCOLUMN, '\/view\/(\d+)\+\&')) AS (output:chararray)
I have also tried other forms, such as:
'[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]', 'view.([0-9]+)', 'view\/([\d]+)\+',
'[0-9][0-9][0-9]+', and
'[0-9][0-9][0-9]*'; none of them worked.
Can anyone help here, or suggest another way to approach this?
Much appreciated,
MM
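The reason for the "Unexpected character 'D'" error is that inside a Pig string literal you need a double backslash instead of a single one. For example, just replace [\d+] with [\\d+].
Here is a solution; please verify it against all of your input strings.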
input.txt
http://a.com/search?q=cache:QD7vZRHkPQoJ:ca.xyz.com/jobs2/view/17069404+&cd=1&hl=zh-CN&ct=clnk&gl=ca
http://a.com/search?q=cache:G9323j2oNbAJ:ca.xyz.com/jobs2/view/5977065+&cd=1&hl=zh-CN&ct=clnk&gl=ca
http://a.com/search?q=cache:aNspmG11qAJ:hk.xyz.com/jobs2/view/16988928+&cd=2&hl=zh-TW&ct=clnk&gl=hk
http://a.com/search?q=cache:aNspmG11AJ:hk.xyz.com/jobs2/view/16988928+&cd=2&hl=zh-TW&ct=clnk&gl=hk
http://a.com/search?q=cache:aNspmG11qAJ:hk.xyz.com/jobs2/view/16988928+&cd=2&hl=zh-TW&ct=clk&gl=hk
http://a.com/search?q=cache:aNspmG11qAJ:hk.xyz.com/jobs2/view/16988928)=2&hl=zh-TW&ct=clk&gl=hk
http://webcache.googleusercontent.com/search?q=cache:http://my.linkedin.com/jobs2/view/9919248
Updated Pig script:
A = LOAD 'input.txt' AS line;
B = FOREACH A GENERATE REGEX_EXTRACT(line, '.*/view/(\\d+)([+|&|cd|)?]+)?', 1);
DUMP B;
Output:
(17069404)
(5977065)
(16988928)
(16988928)
(16988928)
(16988928)
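To sanity-check the pattern above outside Pig, here is a minimal Python sketch. It assumes that Python's re module treats this particular pattern the same way as the Java regex engine Pig runs on. The function name extract_id and the urls list are made up for illustration. The pattern is written as a raw string, so it uses single backslashes; the doubled backslashes are only needed inside a Pig string literal.

```python
import re

# The answer's pattern as a Python raw string (single backslash here;
# Pig needs \\d only because of its own string-literal escaping).
PATTERN = re.compile(r".*/view/(\d+)([+|&|cd|)?]+)?")


def extract_id(url):
    """Return the digits right after /view/ (capture group 1), or None."""
    m = PATTERN.match(url)
    return m.group(1) if m else None


# A few lines taken from input.txt above.
urls = [
    "http://a.com/search?q=cache:QD7vZRHkPQoJ:ca.xyz.com/jobs2/view/17069404+&cd=1&hl=zh-CN&ct=clnk&gl=ca",
    "http://a.com/search?q=cache:G9323j2oNbAJ:ca.xyz.com/jobs2/view/5977065+&cd=1&hl=zh-CN&ct=clnk&gl=ca",
    "http://a.com/search?q=cache:aNspmG11qAJ:hk.xyz.com/jobs2/view/16988928)=2&hl=zh-TW&ct=clk&gl=hk",
    "http://webcache.googleusercontent.com/search?q=cache:http://my.linkedin.com/jobs2/view/9919248",
]

for u in urls:
    print(extract_id(u))
```

Only capture group 1 is read, which corresponds to the index 1 passed to REGEX_EXTRACT in the script.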
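I'm not familiar with Pig, but this regular expression does what you want:
(?<=/jobs2/view/)\d+
Because it uses a (non-consuming) lookbehind, the entire match, not just a group within the match, is your number.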
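As a rough illustration of the lookbehind idea, here is a short Python sketch, again assuming Python's re module matches this pattern the same way as Java's regex engine. The {6,8} bound and the trailing (?!\d) are additions that are not part of the answer; they are there only to reflect the 6-to-8-digit requirement from the question. The names ID_AFTER_VIEW and find_id are hypothetical.

```python
import re

# (?<=/jobs2/view/) asserts the prefix without consuming it, so the whole
# match (group 0) is only the digits.
# Assumption: \d{6,8}(?!\d) is added here to enforce the 6-8 digit length.
ID_AFTER_VIEW = re.compile(r"(?<=/jobs2/view/)\d{6,8}(?!\d)")


def find_id(url):
    """Return the 6-8 digit ID following /jobs2/view/, or None."""
    m = ID_AFTER_VIEW.search(url)
    return m.group(0) if m else None


print(find_id("http://a.com/search?q=cache:x:ca.xyz.com/jobs2/view/17069404+&cd=1"))
print(find_id("http://a.com/search?q=cache:x:hk.xyz.com/jobs2/view/16988928)=2&hl=zh-TW"))
```

In a Pig string literal the backslashes would need to be doubled, as the other answer notes.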