long parsing: A-Parser crashes

John

A-Parser Enterprise License
I run a task with this preset:
Code:
eyJwcmVzZXQiOiJTdXBlciBzdWdnZXN0IiwidmFsdWUiOnsicGFyc2VycyI6W1si
U0U6Okdvb2dsZTo6U3VnZ2VzdCIsImRlZmF1bHQiLHsidHlwZSI6Im92ZXJyaWRl
IiwiaWQiOiJmb3JtYXRyZXN1bHQiLCJ2YWx1ZSI6InJlc3VsdHNbe3N1Z2dlc3R9
XFxuXSJ9LHsidHlwZSI6Im9wdGlvbnMiLCJpZCI6InBhcnNlTGV2ZWwiLCJ2YWx1
ZSI6MTB9XSxbIlNFOjpCaW5nOjpTdWdnZXN0IiwiZGVmYXVsdCIseyJ0eXBlIjoi
b3ZlcnJpZGUiLCJpZCI6ImZvcm1hdHJlc3VsdCIsInZhbHVlIjoicmVzdWx0c1t7
c3VnZ2VzdH1cXG5dIn0seyJ0eXBlIjoib3B0aW9ucyIsImlkIjoicGFyc2VMZXZl
bCIsInZhbHVlIjoxMH1dLFsiU0U6OllhaG9vOjpTdWdnZXN0IiwiZGVmYXVsdCIs
eyJ0eXBlIjoib3ZlcnJpZGUiLCJpZCI6ImZvcm1hdHJlc3VsdCIsInZhbHVlIjoi
cmVzdWx0c1t7c3VnZ2VzdH1cXG5dIn0seyJ0eXBlIjoib3B0aW9ucyIsImlkIjoi
cGFyc2VMZXZlbCIsInZhbHVlIjoxMH1dLFsiU0U6OllhbmRleDo6U3VnZ2VzdCIs
ImRlZmF1bHQiLHsidHlwZSI6Im92ZXJyaWRlIiwiaWQiOiJmb3JtYXRyZXN1bHQi
LCJ2YWx1ZSI6InJlc3VsdHNbe3N1Z2dlc3R9XFxuXSJ9LHsidHlwZSI6Im9wdGlv
bnMiLCJpZCI6InBhcnNlTGV2ZWwiLCJ2YWx1ZSI6MTB9XSxbIlNFOjpZYW5kZXg6
OldvcmRTdGF0IiwiWWFuZGV4IFdvcmRTdGF0Iix7InR5cGUiOiJvdmVycmlkZSIs
ImlkIjoiZm9ybWF0cmVzdWx0IiwidmFsdWUiOiJrZXlzW3trZXl9XFxuXSJ9LHsi
dHlwZSI6Im9wdGlvbnMiLCJpZCI6InBhcnNlTGV2ZWwiLCJ2YWx1ZSI6MTB9XV0s
InJlc3VsdHNGb3JtYXQiOiJwYXJzZXIxKHtwcmVzZXR9KXBhcnNlcjIoe3ByZXNl
dH0pcGFyc2VyMyh7cHJlc2V0fSlwYXJzZXI0KHtwcmVzZXR9KXBhcnNlcjUoe3By
ZXNldH0pIiwicmVzdWx0c1NhdmVUbyI6ImZpbGUiLCJyZXN1bHRzRmlsZU5hbWUi
OiJ7ZGF0ZX1fe3RpbWV9LnR4dCIsInJlc3VsdHNVbmlxdWUiOiJubyIsInF1ZXJ5
Rm9ybWF0Ijoie3F1ZXJ5fSIsInVuaXF1ZVF1ZXJpZXMiOmZhbHNlLCJzYXZlRmFp
bGVkUXVlcmllcyI6ZmFsc2UsImRvTG9nIjoibm8iLCJrZWVwVW5pcXVlIjoiTm8i
LCJtb3JlT3B0aW9ucyI6ZmFsc2UsInJlc3VsdHNQcmVwZW5kIjoiIiwicmVzdWx0
c0FwcGVuZCI6IiIsInF1ZXJ5QnVpbGRlcnMiOltdLCJyZXN1bHRzQnVpbGRlcnMi
OltdLCJjb25maWdPdmVycmlkZXMiOltdfSwicGFyc2Vyc0NvbmZQcmVzZXRzIjp7
IlNFOjpZYW5kZXg6OldvcmRTdGF0Ijp7IllhbmRleCBXb3JkU3RhdCI6eyJxdWVy
eWZvcm1hdCI6IntxdWVyeX0iLCJwYWdlY291bnQiOjQwLCJyZW1vdmVwbHVzIjpm
YWxzZSwidXNlbG9naW5zIjp0cnVlLCJkb19nemlwIjp0cnVlLCJmaXJzdHNsZWVw
IjoiNjAiLCJ0aW1lb3V0IjoiNjAiLCJ1c2Vwcm94eSI6dHJ1ZSwiYW50aWdhdGVw
cmVzZXQiOiJBbnRpZ2F0ZSBZYW5kZXggV29yZFN0YXQiLCJleHRyYXF1ZXJ5Ijoi
IiwiZ2VvIjpbIjIyNSJdLCJ1c2VzZXNzaW9ucyI6dHJ1ZSwicHJveHlyZXRyaWVz
IjoiMTAiLCJ1c2VhbnRpZ2F0ZSI6dHJ1ZSwicmVxdWVzdGRlbGF5IjoiMyIsInBy
b3h5YmFubmVkY2xlYW51cCI6IjMwMCIsInJhd2RhdGEiOjAsImZvcm1hdHJlc3Vs
dCI6IntxdWVyeX0gLSB7dG90YWxjb3VudH0sIHVwZGF0ZWQ6IHt1cGRhdGVkYXRl
fVxcbmtleXdvcmRzOlxcbmtleXNbe2tleX06IHtjb3VudH1cXG5dXFxuYWRkaXRp
b25hbCBrZXl3b3JkczpcXG5zZWFyY2hbe2tleX06IHtjb3VudH1cXG5dIiwibWF4
X3NpemUiOiIxMDQ4NTc2In19fX0=

and this list of queries:
Code:
granta lada
лада гранта

A-Parser runs for a couple of minutes and then crashes (the web interface is unreachable and the process is gone from the process list):
Code:
root@aparser:~/aparser# ps ax | grep aparser
19946 pts/0    S+    0:00 grep --color=auto aparser
End of aparser.log:
Code:
root@aparser:~/aparser# tail aparser.log
Mar  1 06:35:02.00733 long parsing(5.71137809753418): SE::Yandex::WordStat::0, query: картинки лада гранта люкс
Mar  1 06:35:02.56121 long parsing(10.1585340499878): SE::Yandex::WordStat::0, query: лада гранта люкс фото салона
Mar  1 06:35:02.74863 long parsing(10.1621220111847): SE::Yandex::WordStat::0, query: лада гранта комплектация люкс цена
Mar  1 06:35:02.87846 long parsing(21.7180509567261): SE::Yandex::WordStat::0, query: лада гранта люкс обзор
Mar  1 06:35:07.95806 long parsing(20.2776300907135): SE::Yandex::WordStat::0, query: лада гранта автомат люкс фото
Mar  1 06:35:07.96338 long parsing(15.2062871456146): SE::Yandex::WordStat::0, query: лада гранта люкс тюнинг
Mar  1 06:35:12.10881 long parsing(20.6793630123138): SE::Yandex::WordStat::0, query: лада гранта люкс 2013 фото
Mar  1 06:35:12.1107 long parsing(15.3106338977814): SE::Yandex::WordStat::0, query: лада гранта люкс комплектация
Mar  1 06:35:12.96979 long parsing(38.6633038520813): SE::Yandex::WordStat::0, query: лада гранта люкс плюс фото
Mar  1 06:35:13.11205 long parsing(10.1619091033936): SE::Yandex::WordStat::0, query: краш тест лада гранта люкс
System info:
Code:
root@aparser:~/aparser# uname -a
Linux aparser 3.2.0-24-virtual #37-Ubuntu SMP Wed Apr 25 12:51:49 UTC 2012 i686 i686 i386 GNU/Linux
 
I suspect the queries are looping; Unique queries needs to be enabled.
In this case long parsing is a false warning.
 
I enabled unique queries on both the query and the result of the WordStat parser; here is the preset:
Code:
eyJwcmVzZXQiOiJTdXBlciBzdWdnZXN0IiwidmFsdWUiOnsicGFyc2VycyI6W1si
U0U6Okdvb2dsZTo6U3VnZ2VzdCIsImRlZmF1bHQiLHsidHlwZSI6Im92ZXJyaWRl
IiwiaWQiOiJmb3JtYXRyZXN1bHQiLCJ2YWx1ZSI6InJlc3VsdHNbe3N1Z2dlc3R9
XFxuXSJ9LHsidHlwZSI6Im9wdGlvbnMiLCJpZCI6InBhcnNlTGV2ZWwiLCJ2YWx1
ZSI6MTB9XSxbIlNFOjpCaW5nOjpTdWdnZXN0IiwiZGVmYXVsdCIseyJ0eXBlIjoi
b3ZlcnJpZGUiLCJpZCI6ImZvcm1hdHJlc3VsdCIsInZhbHVlIjoicmVzdWx0c1t7
c3VnZ2VzdH1cXG5dIn0seyJ0eXBlIjoib3B0aW9ucyIsImlkIjoicGFyc2VMZXZl
bCIsInZhbHVlIjoxMH1dLFsiU0U6OllhaG9vOjpTdWdnZXN0IiwiZGVmYXVsdCIs
eyJ0eXBlIjoib3ZlcnJpZGUiLCJpZCI6ImZvcm1hdHJlc3VsdCIsInZhbHVlIjoi
cmVzdWx0c1t7c3VnZ2VzdH1cXG5dIn0seyJ0eXBlIjoib3B0aW9ucyIsImlkIjoi
cGFyc2VMZXZlbCIsInZhbHVlIjoxMH1dLFsiU0U6OllhbmRleDo6U3VnZ2VzdCIs
ImRlZmF1bHQiLHsidHlwZSI6Im92ZXJyaWRlIiwiaWQiOiJmb3JtYXRyZXN1bHQi
LCJ2YWx1ZSI6InJlc3VsdHNbe3N1Z2dlc3R9XFxuXSJ9LHsidHlwZSI6Im9wdGlv
bnMiLCJpZCI6InBhcnNlTGV2ZWwiLCJ2YWx1ZSI6MTB9XSxbIlNFOjpZYW5kZXg6
OldvcmRTdGF0IiwiWWFuZGV4IFdvcmRTdGF0Iix7InR5cGUiOiJvdmVycmlkZSIs
ImlkIjoiZm9ybWF0cmVzdWx0IiwidmFsdWUiOiJrZXlzW3trZXl9XFxuXSJ9LHsi
dHlwZSI6Im9wdGlvbnMiLCJpZCI6InBhcnNlTGV2ZWwiLCJ2YWx1ZSI6MTB9LHsi
dHlwZSI6InVuaXF1ZSIsInJlc3VsdCI6WyJrZXlzIiwia2V5Il0sInVuaXF1ZVR5
cGUiOiJzdHJpbmciLCJ1bmlxdWVHbG9iYWwiOnRydWV9LHsidHlwZSI6InVuaXF1
ZSIsInJlc3VsdCI6InF1ZXJ5IiwidW5pcXVlVHlwZSI6InN0cmluZyIsInVuaXF1
ZUdsb2JhbCI6dHJ1ZX1dXSwicmVzdWx0c0Zvcm1hdCI6InBhcnNlcjEoe3ByZXNl
dH0pcGFyc2VyMih7cHJlc2V0fSlwYXJzZXIzKHtwcmVzZXR9KXBhcnNlcjQoe3By
ZXNldH0pcGFyc2VyNSh7cHJlc2V0fSkiLCJyZXN1bHRzU2F2ZVRvIjoiZmlsZSIs
InJlc3VsdHNGaWxlTmFtZSI6IntkYXRlfV97dGltZX0udHh0IiwicmVzdWx0c1Vu
aXF1ZSI6Im5vIiwicXVlcnlGb3JtYXQiOiJ7cXVlcnl9IiwidW5pcXVlUXVlcmll
cyI6ZmFsc2UsInNhdmVGYWlsZWRRdWVyaWVzIjpmYWxzZSwiZG9Mb2ciOiJubyIs
ImtlZXBVbmlxdWUiOiJObyIsIm1vcmVPcHRpb25zIjpmYWxzZSwicmVzdWx0c1By
ZXBlbmQiOiIiLCJyZXN1bHRzQXBwZW5kIjoiIiwicXVlcnlCdWlsZGVycyI6W10s
InJlc3VsdHNCdWlsZGVycyI6W10sImNvbmZpZ092ZXJyaWRlcyI6W119LCJwYXJz
ZXJzQ29uZlByZXNldHMiOnsiU0U6OllhbmRleDo6V29yZFN0YXQiOnsiWWFuZGV4
IFdvcmRTdGF0Ijp7InF1ZXJ5Zm9ybWF0Ijoie3F1ZXJ5fSIsInBhZ2Vjb3VudCI6
NDAsInVzZWxvZ2lucyI6dHJ1ZSwicmVtb3ZlcGx1cyI6ZmFsc2UsImRvX2d6aXAi
OnRydWUsImZpcnN0c2xlZXAiOiI2MCIsInRpbWVvdXQiOiI2MCIsInVzZXByb3h5
Ijp0cnVlLCJhbnRpZ2F0ZXByZXNldCI6IkFudGlnYXRlIFlhbmRleCBXb3JkU3Rh
dCIsImV4dHJhcXVlcnkiOiIiLCJnZW8iOlsiMjI1Il0sInVzZXNlc3Npb25zIjp0
cnVlLCJwcm94eXJldHJpZXMiOiIxMCIsInVzZWFudGlnYXRlIjp0cnVlLCJyZXF1
ZXN0ZGVsYXkiOiIzIiwicHJveHliYW5uZWRjbGVhbnVwIjoiMzAwIiwiZm9ybWF0
cmVzdWx0Ijoie3F1ZXJ5fSAtIHt0b3RhbGNvdW50fSwgdXBkYXRlZDoge3VwZGF0
ZWRhdGV9XFxua2V5d29yZHM6XFxua2V5c1t7a2V5fToge2NvdW50fVxcbl1cXG5h
ZGRpdGlvbmFsIGtleXdvcmRzOlxcbnNlYXJjaFt7a2V5fToge2NvdW50fVxcbl0i
LCJyYXdkYXRhIjowLCJtYXhfc2l6ZSI6IjEwNDg1NzYifX19fQ==

I thought the parse-to-level parameter might be the cause, so I also removed it for the WordStat parser, but that does not help.
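
The preset strings in this thread appear to be base64-encoded JSON (the leading `eyJ` decodes to `{"`), so they can be inspected outside the web interface. Below is a minimal sketch that decodes such a string and prints, per parser, its `parseLevel` option and any `unique` filters, plus the task-level `uniqueQueries` flag. The field names are the ones visible when the presets above are decoded; the overall layout is an assumption about this export format, and the `sample` preset is a hypothetical miniature built only for the demo.

```python
import base64
import json


def decode_preset(encoded: str) -> dict:
    """Decode a base64 preset export into a Python dict."""
    # Exports are wrapped across several lines, so drop all whitespace first.
    compact = "".join(encoded.split())
    return json.loads(base64.b64decode(compact).decode("utf-8"))


def summarize(preset: dict) -> list:
    """List each parser with its parseLevel option and unique filters."""
    lines = []
    # Assumed layout: each entry is [parser name, config preset, *options].
    for name, _config_preset, *options in preset["value"]["parsers"]:
        level = next(
            (o["value"] for o in options
             if o.get("type") == "options" and o.get("id") == "parseLevel"),
            None,
        )
        uniques = [o["result"] for o in options if o.get("type") == "unique"]
        lines.append(f"{name}: parseLevel={level}, unique={uniques}")
    lines.append(f"uniqueQueries={preset['value'].get('uniqueQueries')}")
    return lines


# Hypothetical miniature preset, encoded the same way for the demo.
sample = base64.b64encode(json.dumps({
    "preset": "demo",
    "value": {
        "parsers": [
            ["SE::Yandex::WordStat", "default",
             {"type": "options", "id": "parseLevel", "value": 10},
             {"type": "unique", "result": "query"}],
        ],
        "uniqueQueries": False,
    },
}).encode("utf-8")).decode("ascii")

for line in summarize(decode_preset(sample)):
    print(line)
```

Pasting a real preset string in place of `sample` should show at a glance whether the option that was meant to be changed actually made it into the export.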

A-Parser still crashes, and the log shows the same thing:
Code:
root@aparser:~/aparser# tail aparser.log
Mar  1 20:12:09.07681 long parsing(10.9120697975159): SE::Yandex::WordStat::0, query: lada granta sport цена фото
Mar  1 20:12:09.09425 long parsing(10.7083780765533): SE::Yandex::WordStat::0, query: купить новую lada granta
Mar  1 20:12:09.09578 long parsing(10.814150094986): SE::Yandex::WordStat::0, query: lada granta sport купить
Mar  1 20:12:11.157 long parsing(9.98780608177185): SE::Yandex::WordStat::0, query: lada ваз granta sedan sport
Mar  1 20:12:11.16083 long parsing(9.78026509284973): SE::Yandex::WordStat::0, query: lada granta sport тест
Mar  1 20:12:11.16248 long parsing(10.536908864975): SE::Yandex::WordStat::0, query: lada granta фото салона
Mar  1 20:12:13.22179 long parsing(10.1200680732727): SE::Yandex::WordStat::0, query: технические характеристики lada granta sport
Mar  1 20:12:15.2052 long parsing(10.9457960128784): SE::Yandex::WordStat::0, query: lada granta 106 л +с
Mar  1 20:12:15.29825 long parsing(8.33511090278625): SE::Yandex::WordStat::0, query: технические характеристики lada granta sport
Mar  1 20:12:15.68693 long parsing(8.34540796279907): SE::Yandex::WordStat::0, query: lada granta sport цена фото
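
One way to check the looping hypothesis is to count how often each query shows up in the `long parsing` lines. This is a rough sketch, assuming the line format seen in the excerpt above; the regular expression and the function name `repeated_queries` are my own, not part of A-Parser.

```python
import re
from collections import Counter

# Matches lines such as:
#   Mar  1 20:12:09.07681 long parsing(10.9): SE::Yandex::WordStat::0, query: ...
LONG_PARSING = re.compile(r"long parsing\(([\d.]+)\): (\S+), query: (.*)$")


def repeated_queries(log_lines):
    """Return {query: count} for queries logged more than once."""
    counts = Counter()
    for line in log_lines:
        match = LONG_PARSING.search(line)
        if match:
            counts[match.group(3).strip()] += 1
    return {query: n for query, n in counts.items() if n > 1}


# Three lines copied from the log excerpt above.
sample = [
    "Mar  1 20:12:09.07681 long parsing(10.9120697975159): SE::Yandex::WordStat::0, query: lada granta sport цена фото",
    "Mar  1 20:12:09.09425 long parsing(10.7083780765533): SE::Yandex::WordStat::0, query: купить новую lada granta",
    "Mar  1 20:12:15.68693 long parsing(8.34540796279907): SE::Yandex::WordStat::0, query: lada granta sport цена фото",
]
print(repeated_queries(sample))
```

A query that appears twice in the warnings is not proof of a loop by itself, since the warning is only printed for slow requests, but a steadily growing count for the same queries would point in that direction.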
 
The Unique queries checkbox is in the Queries section, below the query format.
 
Today I forgot to tick the unique queries checkbox and looked at the log after the crash; it contains some errors:
Code:
Mar  4 20:44:52.99166 long parsing(8.37489199638367): SE::Yandex::WordStat::0, query: тест драйв лада гранта 2014
Mar  4 20:46:15.41228 long parsing(8.38189601898193): SE::Yandex::WordStat::0, query: тест драйв лада гранта 2014
Mar  4 20:46:15.45029 task 9 completed
Use of uninitialized value in subroutine entry at core.to_build.pl line 44772.
        (in cleanup) Can't use string ("") as a subroutine ref while "strict refs" in use at core.to_build.pl line 44772.
DBI db handle 0xb3d7892c has uncleared implementors data at core.to_build.pl line 17902.
    dbih_clearcom (dbh 0xb3d7892c, com 0xe8d2cd0, imp DBD::SQLite::db):
      FLAGS 0x100113: COMSET IMPSET Warn PrintError PrintWarn
      PARENT DBI::dr=HASH(0xc620800)
      KIDS 0 (0 Active)
Close unknow watcher iterator-9 at core.to_build.pl line 44795.
Mar  4 20:46:15.53838 cannot commit database iterator-9 files/tasks/9_additional.db
Close unknow watcher iterator-9 at core.to_build.pl line 44795.
Mar  4 20:46:15.53881 cannot commit database iterator-9 files/tasks/9_additional.db
Mar  4 20:46:48.49247 starting task 10
Mar  4 20:46:54.72523 using new iterator
DBD::SQLite::db do failed: Safety level may not be changed inside a transaction at core.to_build.pl line 17785.
Mar  4 20:47:06.94609 task 10 completed
Use of uninitialized value in subroutine entry at core.to_build.pl line 44772.
        (in cleanup) Can't use string ("") as a subroutine ref while "strict refs" in use at core.to_build.pl line 44772.
DBI db handle 0xb3d7d558 has uncleared implementors data at core.to_build.pl line 17902.
    dbih_clearcom (dbh 0xb3d7d558, com 0x105b94c0, imp DBD::SQLite::db):
      FLAGS 0x100113: COMSET IMPSET Warn PrintError PrintWarn
      PARENT DBI::dr=HASH(0xc620800)
      KIDS 0 (0 Active)
Close unknow watcher iterator-10 at core.to_build.pl line 44795.
Mar  4 20:47:06.953 cannot commit database iterator-10 files/tasks/10_additional.db
Close unknow watcher iterator-10 at core.to_build.pl line 44795.
Mar  4 20:47:06.95327 cannot commit database iterator-10 files/tasks/10_additional.db
Mar  4 20:47:20.89532 starting task 11
Mar  4 20:47:21.70957 starting task 12
Mar  4 20:47:22.55028 starting task 13
Mar  4 20:47:23.46461 starting task 14
Mar  4 20:47:24.38234 starting task 15
Mar  4 20:47:29.12323 using new iterator
Mar  4 20:47:29.91099 using new iterator
Mar  4 20:47:31.0511 using new iterator
Mar  4 20:47:31.79487 using new iterator
Mar  4 20:47:32.85295 using new iterator
Mar  4 20:47:45.07239 long parsing(2.72810411453247): SE::Yandex::0, query: "автосалон гусар киров лада калина"
Mar  4 20:47:45.69397 long parsing(0.433759212493896): SE::Yandex::0, query: "рабочая температура двигателя гранты"
Mar  4 20:49:47.80802 long parsing(119.349725008011): SE::Yandex::0, query: "автоцентр лада +в орехово зуево"
pfae b455 died at core.to_build.pl line 52907, <$__ANONIO__> line 84.
Mar  4 20:49:47.8087 exit called
Mar  4 20:49:47.81065 end called
Use of uninitialized value in subroutine entry at core.to_build.pl line 44772, <$__ANONIO__> line 84 during global destruction.
        (in cleanup) Can't use string ("") as a subroutine ref while "strict refs" in use at core.to_build.pl line 44772, <$__ANONIO__> line 84 during global destruction.
Use of uninitialized value in subroutine entry at core.to_build.pl line 44772, <$__ANONIO__> line 84 during global destruction.
Use of uninitialized value in subroutine entry at core.to_build.pl line 44772, <$__ANONIO__> line 84 during global destruction.
Use of uninitialized value in subroutine entry at core.to_build.pl line 44772, <$__ANONIO__> line 84 during global destruction.
Use of uninitialized value in subroutine entry at core.to_build.pl line 44772, <$__ANONIO__> line 84 during global destruction.
Use of uninitialized value in subroutine entry at core.to_build.pl line 44772, <$__ANONIO__> line 84 during global destruction.
Use of uninitialized value in subroutine entry at core.to_build.pl line 44772, <$__ANONIO__> line 84 during global destruction.
Use of uninitialized value in subroutine entry at core.to_build.pl line 44772, <$__ANONIO__> line 84 during global destruction.
        (in cleanup) Can't use string ("") as a subroutine ref while "strict refs" in use at core.to_build.pl line 44772, <$__ANONIO__> line 84 during global destruction.
Use of uninitialized value in subroutine entry at core.to_build.pl line 44772 during global destruction.
        (in cleanup) Can't use string ("") as a subroutine ref while "strict refs" in use at core.to_build.pl line 44772 during global destruction.
Use of uninitialized value in subroutine entry at core.to_build.pl line 44772 during global destruction.
Use of uninitialized value in subroutine entry at core.to_build.pl line 44772 during global destruction.
Use of uninitialized value in subroutine entry at core.to_build.pl line 44772 during global destruction.
        (in cleanup) Can't use string ("") as a subroutine ref while "strict refs" in use at core.to_build.pl line 44772 during global destruction.

You may find this interesting... after all, the whole parser should not crash because of the settings of one individual task.
 
My apologies. I was running too many threads, the parser used up all the memory, and the kernel killed it.
This thread can be closed.
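
For anyone who lands here with the same symptom (the process vanishes and the application log shows nothing fatal), the kernel log is worth checking for OOM-killer activity. A minimal sketch, assuming a Linux host where `dmesg` is available and readable; the search phrases are the ones the kernel typically prints when it kills a process for lack of memory, and the helper names are hypothetical.

```python
import re
import subprocess

# Phrases the Linux kernel typically logs when the OOM killer fires.
OOM_PATTERN = re.compile(r"oom-killer|Out of memory|Killed process", re.IGNORECASE)


def find_oom_events(kernel_log: str, process_name: str = "") -> list:
    """Return kernel log lines that look like OOM-killer activity."""
    hits = [line for line in kernel_log.splitlines() if OOM_PATTERN.search(line)]
    if process_name:
        # Keep only the lines that mention the process we care about.
        hits = [line for line in hits if process_name in line]
    return hits


def read_kernel_log() -> str:
    """Read the kernel ring buffer via dmesg; empty string if unavailable."""
    try:
        result = subprocess.run(
            ["dmesg"], capture_output=True, text=True, check=False
        )
    except OSError:
        return ""
    return result.stdout


# Usage on the affected machine (may require root):
#   for line in find_oom_events(read_kernel_log(), "aparser"):
#       print(line)
```

If this returns matching lines with a timestamp close to the crash, lowering the number of threads or adding memory is the likely fix rather than changing the task preset.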
 